How to create training and test DataSetIterators in deeplearning4j?



























I am building a recurrent neural network with deeplearning4j and I need to create the training and test data sets.



All the examples provided in the documentation and the example code use a CSVSequenceRecordReader to read CSV files.



Then a DataSetIterator is created with the SequenceRecordReaderDataSetIterator constructor and fed into the MultiLayerNetwork.fit() or MultiLayerNetwork.evaluate() method (depending on whether it is a training or a test data set iterator).
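
Roughly, the pattern in those examples looks like the sketch below (not my code; the file names, the 0-99 index range, the mini-batch size of 32, numLabelClasses and network are placeholders for whatever the example defines):

    SequenceRecordReader featureReader = new CSVSequenceRecordReader(0, ",");
    featureReader.initialize(new NumberedFileInputSplit("features_%d.csv", 0, 99));
    SequenceRecordReader labelReader = new CSVSequenceRecordReader(0, ",");
    labelReader.initialize(new NumberedFileInputSplit("labels_%d.csv", 0, 99));

    // mini-batch size 32, numLabelClasses output classes, false = classification (not regression)
    DataSetIterator trainIterator = new SequenceRecordReaderDataSetIterator(
            featureReader, labelReader, 32, numLabelClasses, false);

    network.fit(trainIterator);        // training
    // network.evaluate(testIterator)  // evaluation, with a second iterator built the same way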



However, in my case, the data set I have is not stored in a CSV file. I access it online through a third-party library and pre-process it to obtain a List<Data> and a List<Labels> object.



How can I:



1) create the DataSetIterator from my two lists?



2) split the DataSetIterator into a training set and a test set?



Edit:



I think my question is too broad. Let me try to narrow it down.



I have started to read this article which uses a very simple approach to create a data set:



It creates two INDArrays and builds a DataSet from them using the DataSet(INDArray first, INDArray second) constructor.
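
In code, that approach looks something like this (toy values for illustration only; my real features and labels would come from the pre-processed lists):

    INDArray features = Nd4j.create(new double[][]{{0, 0}, {0, 1}, {1, 0}, {1, 1}});
    INDArray labels   = Nd4j.create(new double[][]{{0}, {1}, {1}, {0}});
    DataSet dataSet   = new DataSet(features, labels);   // DataSet(INDArray first, INDArray second)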



Training works using network.fit(dataSet), but I can't evaluate the network while training, because the evaluate method requires a data set iterator, not a data set.



Moreover, from what I understand, using this approach also means that there is only one huge data set, no mini batches.



I also guess that I could create mini-batches from this big data set by using the batchBy(int num) method. But this method returns a list of data sets, not a data set iterator... iterateWithMiniBatches() does return a data set iterator, but when I looked at the source file, it returns null and is deprecated. Then I tried to see if there is an implementation of DataSetIterator I could use, but there are a lot of them. I tried BaseDataSetIterator, but it does not take a DataSet as a constructor parameter, only a DataSetFetcher... yet another layer.



Is there an example somewhere that shows how to create a data set without using the default record readers? Or should I just create my own implementation of a record reader?










deeplearning4j






asked Jan 3 at 10:23 by Ben (edited Jan 8 at 21:59)
























1 Answer
































1)

MultiLayerNetwork.evaluate() accepts a ListDataSetIterator as a parameter (it takes any DataSetIterator).

If you have a List<Data> object, you can first map it into a double[] featureVector and a double[] labelVector and then create a ListDataSetIterator like this:

    // featureVector and labelVector are flat row-major arrays, one example per row
    INDArray x = Nd4j.create(featureVector, new int[]{featureVector.length / numberOfFeatures, numberOfFeatures}, 'c');
    INDArray y = Nd4j.create(labelVector, new int[]{labelVector.length / numberOfLabels, numberOfLabels}, 'c');

    final DataSet allData = new DataSet(x, y);
    final List<DataSet> list = allData.asList();
    ListDataSetIterator iterator = new ListDataSetIterator(list);

For 2) you should just create two separate iterators, one for training and one for testing.

You can then evaluate your net with net.evaluate(testIterator);
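
A minimal sketch of 1) and 2) put together, assuming the allData from above and an already-configured net (the shuffle seed, the 80/20 split and the mini-batch size of 32 are arbitrary choices):

    // shuffle the single-example DataSets from asList(), then split them 80/20
    List<DataSet> examples = allData.asList();
    Collections.shuffle(examples, new Random(123));
    int trainCount = (int) (examples.size() * 0.8);

    // the second constructor argument is the mini-batch size
    DataSetIterator trainIterator =
            new ListDataSetIterator(examples.subList(0, trainCount), 32);
    DataSetIterator testIterator =
            new ListDataSetIterator(examples.subList(trainCount, examples.size()), 32);

    net.fit(trainIterator);                       // train on the first 80%
    Evaluation eval = net.evaluate(testIterator); // evaluate on the held-out 20%
    System.out.println(eval.stats());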






answered Jan 11 at 16:47 by fkajzer (edited Jan 11 at 16:56)


























• Thanks for your answer. One clarification about allData.asList(), though: according to the API doc, it takes each example of the DataSet and creates a list of DataSets containing one example each (i.e. a minibatch of size one). If I want to create batches of size n, I should rather use batchBy(n), as sketched below.
  – Ben, Jan 16 at 20:10











• That's nice to know! Feel free to edit my answer.
  – fkajzer, Jan 18 at 17:08
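
A sketch of what that comment suggests, reusing allData from the answer (the batch size of 32 is an arbitrary example):

    // batchBy(32) partitions allData into DataSets of 32 examples each,
    // which can be wrapped in a ListDataSetIterator directly
    List<DataSet> batches = allData.batchBy(32);
    DataSetIterator iterator = new ListDataSetIterator(batches);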










