How to create training and test DataSetIterators in deeplearning4j?



























I am building a recurrent neural network with deeplearning4j and I need to create the training and test data sets.



All the examples provided in the documentation and the example code use a CSVSequenceRecordReader to read CSV files.



Then a DataSetIterator is created with the SequenceRecordReaderDataSetIterator constructor and fed into the MultiLayerNetwork.fit() or MultiLayerNetwork.evaluate() method (depending on whether it is a training or a test data set iterator).
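
Roughly, the pattern in those examples looks like the sketch below (not my code; the file names, the 0-99 index range, the mini-batch size of 32, numLabelClasses and network are placeholders for whatever the example defines):

    SequenceRecordReader featureReader = new CSVSequenceRecordReader(0, ",");
    featureReader.initialize(new NumberedFileInputSplit("features_%d.csv", 0, 99));
    SequenceRecordReader labelReader = new CSVSequenceRecordReader(0, ",");
    labelReader.initialize(new NumberedFileInputSplit("labels_%d.csv", 0, 99));

    // mini-batch size 32, numLabelClasses output classes, false = classification (not regression)
    DataSetIterator trainIterator = new SequenceRecordReaderDataSetIterator(
            featureReader, labelReader, 32, numLabelClasses, false);

    network.fit(trainIterator);        // training
    // network.evaluate(testIterator)  // evaluation, with a second iterator built the same way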



However, in my case, the data set I have is not stored in a CSV file. I access it online through a third-party library and pre-process it to obtain a List<Data> and a List<Labels> object.



How can I:



1) create the DataSetIterator from my two lists?



2) split the DataSetIterator into a training set and a test set?



Edit:



I think my question is too broad. Let me try to narrow it down.



I have started to read this article which uses a very simple approach to create a data set:



It creates two INDArrays and builds a DataSet from them using the DataSet(INDArray first, INDArray second) constructor.
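
In code, that approach looks something like this (toy values for illustration only; my real features and labels would come from the pre-processed lists):

    INDArray features = Nd4j.create(new double[][]{{0, 0}, {0, 1}, {1, 0}, {1, 1}});
    INDArray labels   = Nd4j.create(new double[][]{{0}, {1}, {1}, {0}});
    DataSet dataSet   = new DataSet(features, labels);   // DataSet(INDArray first, INDArray second)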



Training works using network.fit(dataSet), but I can't evaluate the network while training, because the evaluate method requires a data set iterator, not a data set.



Moreover, from what I understand, using this approach also means that there is only one huge data set, no mini batches.



I also guess that I could create mini-batches from this big data set by using the batchBy(int num) method. But this method returns a list of data sets, not a data set iterator... iterateWithMiniBatches() does return a data set iterator, but when I looked at the source file, it returns null and is deprecated. Then I tried to see if there is an implementation of DataSetIterator I could use, but there are a lot of them. I tried BaseDataSetIterator, but it does not take a DataSet as a constructor parameter, only a DataSetFetcher... yet another layer.



Is there an example somewhere that shows how to create a data set without using the default record readers? Or should I just create my own implementation of a record reader?










deeplearning4j






asked Jan 3 at 10:23 by Ben (edited Jan 8 at 21:59)
























1 Answer
































1)

MultiLayerNetwork.evaluate() accepts a ListDataSetIterator as a parameter (it takes any DataSetIterator).

If you have a List<Data> object, you can first map it into a double[] featureVector and a double[] labelVector and then create a ListDataSetIterator like this:

    // featureVector and labelVector are flat row-major arrays, one example per row
    INDArray x = Nd4j.create(featureVector, new int[]{featureVector.length / numberOfFeatures, numberOfFeatures}, 'c');
    INDArray y = Nd4j.create(labelVector, new int[]{labelVector.length / numberOfLabels, numberOfLabels}, 'c');

    final DataSet allData = new DataSet(x, y);
    final List<DataSet> list = allData.asList();
    ListDataSetIterator iterator = new ListDataSetIterator(list);

For 2) you should just create two separate iterators, one for training and one for testing.

You can then evaluate your net with net.evaluate(testIterator);
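
A minimal sketch of 1) and 2) put together, assuming the allData from above and an already-configured net (the shuffle seed, the 80/20 split and the mini-batch size of 32 are arbitrary choices):

    // shuffle the single-example DataSets from asList(), then split them 80/20
    List<DataSet> examples = allData.asList();
    Collections.shuffle(examples, new Random(123));
    int trainCount = (int) (examples.size() * 0.8);

    // the second constructor argument is the mini-batch size
    DataSetIterator trainIterator =
            new ListDataSetIterator(examples.subList(0, trainCount), 32);
    DataSetIterator testIterator =
            new ListDataSetIterator(examples.subList(trainCount, examples.size()), 32);

    net.fit(trainIterator);                       // train on the first 80%
    Evaluation eval = net.evaluate(testIterator); // evaluate on the held-out 20%
    System.out.println(eval.stats());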






answered Jan 11 at 16:47 by fkajzer (edited Jan 11 at 16:56)


























• Thanks for your answer. One clarification about allData.asList(), though: according to the API doc, it takes each example of the DataSet and creates a list of DataSets containing one example each (i.e. a minibatch of size one). If I want to create batches of size n, I should rather use batchBy(n), as sketched below.
  – Ben, Jan 16 at 20:10











• That's nice to know! Feel free to edit my answer.
  – fkajzer, Jan 18 at 17:08
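
A sketch of what that comment suggests, reusing allData from the answer (the batch size of 32 is an arbitrary example):

    // batchBy(32) partitions allData into DataSets of 32 examples each,
    // which can be wrapped in a ListDataSetIterator directly
    List<DataSet> batches = allData.batchBy(32);
    DataSetIterator iterator = new ListDataSetIterator(batches);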










