How to create training and test DataSetIterators in deeplearning4j?
I am building a recurrent neural network with deeplearning4j and I need to create the training and test data sets.
All the examples provided in the documentation and the example code use a CSVSequenceRecordReader to read CSV files. A DataSetIterator is then created with the SequenceRecordReaderDataSetIterator constructor and fed into the MultiLayerNetwork.fit() or the MultiLayerNetwork.evaluate() method (depending on whether it is a training or a test data set iterator).
However, in my case, the data set is not stored in a CSV file. I access it online through a third-party library and pre-process it to obtain a List<Data> and a List<Labels>.
How can I:
1) create the DataSetIterator from my two lists?
2) split the DataSetIterator into a training set and a test set?
Edit:
I think my question is too broad. Let me try to narrow it down.
I have started to read this article, which uses a very simple approach to create a data set: it creates two INDArrays and builds a DataSet from them using the DataSet(INDArray first, INDArray second) constructor.
Training works using network.fit(dataSet);, but I can't evaluate the network while training, because the evaluate method requires a data set iterator, not a data set.
Moreover, from what I understand, this approach also means that there is only one huge data set and no mini batches.
I guess I could create mini batches from this big data set with the batchBy(int num) method, but it returns a list of data sets, not a data set iterator. iterateWithMiniBatches() does return a data set iterator, but when I looked at the source file, it returns null and is deprecated. I then looked for an existing DataSetIterator implementation I could use, but there are a lot of them. I tried the BaseDataSetIterator, but its constructor takes a DataSetFetcher rather than a DataSet, which is yet another layer.
Is there an example somewhere that shows how to create a data set without using the default record readers? Or should I just create my own implementation of a record reader?
deeplearning4j
asked Jan 3 at 10:23 by Ben (edited Jan 8 at 21:59)
1 Answer
1) MultiLayerNetwork.evaluate() accepts a ListDataSetIterator as a parameter.
If you have a List<Data> object, you can first map it into a double[] featureVector and a double[] labelVector, and then create a ListDataSetIterator like this:
    // Reshape the flat arrays into {numberOfExamples, numberOfColumns} matrices in row-major ('c') order
    INDArray x = Nd4j.create(featureVector, new int[]{featureVector.length / numberOfFeatures, numberOfFeatures}, 'c');
    INDArray y = Nd4j.create(labelVector, new int[]{labelVector.length / numberOfLabels, numberOfLabels}, 'c');
    final DataSet allData = new DataSet(x, y);
    // asList() yields one DataSet per example
    final List<DataSet> list = allData.asList();
    ListDataSetIterator iterator = new ListDataSetIterator(list);
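The mapping step mentioned above (from a list of examples to a flat double[]) could look roughly like the following. This is only a sketch in plain Java with no DL4J calls: it assumes each Data object has already been converted to a fixed-length double[], and RowMajorFlattener and flatten are hypothetical names, not part of any library.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: flattens per-example rows into one row-major array. */
final class RowMajorFlattener {

    /** Copies each row into a single array, one example after another ('c' order). */
    static double[] flatten(List<double[]> rows, int width) {
        double[] flat = new double[rows.size() * width];
        for (int i = 0; i < rows.size(); i++) {
            double[] row = rows.get(i);
            if (row.length != width) {
                throw new IllegalArgumentException(
                        "row " + i + " has " + row.length + " values, expected " + width);
            }
            System.arraycopy(row, 0, flat, i * width, width);
        }
        return flat;
    }

    public static void main(String[] args) {
        // Two examples with three features each (made-up values).
        List<double[]> features = Arrays.asList(
                new double[] {1.0, 2.0, 3.0},
                new double[] {4.0, 5.0, 6.0});
        int numberOfFeatures = 3;

        double[] featureVector = flatten(features, numberOfFeatures);
        // Same shape computation as in the answer: {rows, columns}.
        int[] shape = {featureVector.length / numberOfFeatures, numberOfFeatures};

        System.out.println(Arrays.toString(featureVector)); // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
        System.out.println(Arrays.toString(shape));         // [2, 3]
    }
}
```

The resulting featureVector and shape are what the Nd4j.create(...) call in the answer takes as its first two arguments; the same helper would apply to the labels.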
2) Just create two separate iterators, one for training and one for testing.
You can then evaluate your net with net.evaluate(testIterator);
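One way to get the inputs for two separate iterators is to split the two lists before building any DataSet. The sketch below is plain Java and makes several assumptions: the examples are independent of one another (so shuffling them is acceptable), an 80/20 split is wanted, and the names TrainTestSplit, shuffledIndices and select are hypothetical. The same shuffled index order is applied to both lists so that data and labels stay aligned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: splits two parallel lists with one shared shuffle. */
final class TrainTestSplit {

    /** Returns a shuffled permutation of 0..size-1; the seed makes it repeatable. */
    static List<Integer> shuffledIndices(int size, long seed) {
        List<Integer> indices = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        return indices;
    }

    /** Picks the elements of source at the given positions, in that order. */
    static <T> List<T> select(List<T> source, List<Integer> positions) {
        List<T> picked = new ArrayList<>(positions.size());
        for (int position : positions) {
            picked.add(source.get(position));
        }
        return picked;
    }

    public static void main(String[] args) {
        // Stand-ins for the List<Data> and List<Labels> from the question.
        List<String> data = Arrays.asList("d0", "d1", "d2", "d3", "d4");
        List<String> labels = Arrays.asList("l0", "l1", "l2", "l3", "l4");

        List<Integer> order = shuffledIndices(data.size(), 42L);
        int trainSize = data.size() * 80 / 100; // integer arithmetic avoids rounding surprises

        List<Integer> trainIdx = order.subList(0, trainSize);
        List<Integer> testIdx = order.subList(trainSize, order.size());

        List<String> trainData = select(data, trainIdx);
        List<String> trainLabels = select(labels, trainIdx);
        List<String> testData = select(data, testIdx);
        List<String> testLabels = select(labels, testIdx);

        System.out.println("train: " + trainData + " / " + trainLabels);
        System.out.println("test:  " + testData + " / " + testLabels);
    }
}
```

Each pair of lists (train and test) can then be turned into its own DataSet and iterator in the way the answer shows.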
answered Jan 11 at 16:47 by fkajzer (edited Jan 11 at 16:56)
Thanks for your answer. One clarification about allData.asList(): according to the API doc, this takes each example of the DataSet and creates a list of DataSets containing one example each (i.e. a minibatch of size one). If I want to create batches of size n, I should rather use batchBy(n).
– Ben, Jan 16 at 20:10
That's nice to know! Feel free to edit my answer.
– fkajzer, Jan 18 at 17:08
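The difference raised in the comments, between a list of one-example batches and a list of batches of size n, can be pictured with plain lists. The sketch below does not reproduce DL4J's batchBy implementation; MiniBatches and partition are hypothetical names, and how batchBy itself treats a final partial batch is not verified here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: groups a list of examples into mini-batches. */
final class MiniBatches {

    /** Splits items into consecutive chunks of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> examples = Arrays.asList(0, 1, 2, 3, 4, 5, 6);

        // Batch size 1: one batch per example, as the comment describes for asList().
        System.out.println(partition(examples, 1).size()); // 7

        // Batch size 3: the last batch is smaller because 7 is not a multiple of 3.
        System.out.println(partition(examples, 3)); // [[0, 1, 2], [3, 4, 5], [6]]
    }
}
```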