Unable to create exactly equal data partitions using createDataPartition in R- getting 1396 and 1398...
I am quite familiar with R, but I have never before needed to split a dataset randomly into two exactly equal partitions using createDataPartition.
library(caret)
index <- createDataPartition(final_ts$SAR, p = 0.5, list = FALSE)
final_test_data <- final_ts[index, ]
final_validation_data <- final_ts[-index, ]
This code creates two datasets with 1396 and 1398 observations respectively.
I am surprised that p = 0.5 does not split the data exactly in half. Does it have something to do with the resulting datasets not being allowed to have an odd number of observations by default?
Thanks in advance!
r data-partitioning
asked Jan 4 at 10:31 by Bharat Ram Ammu (edited Jan 4 at 12:23)
1 Answer
It has to do with the number of cases in each class of the response variable (final_ts$SAR in your case).
For example:
y <- rep(c(0,1), 10)
table(y)
y
0 1
10 10
# even number of cases
Now we split:
idx <- caret::createDataPartition(y, p = 0.5, list = FALSE) # draw the index once
train <- y[idx]
table(train) # we have 10 obs.
train
0 1
5 5
test <- y[-idx] # complement of the same index
table(test) # we have 10 obs.
test
0 1
5 5
If we instead build an example with an odd number of cases per class:
y <- rep(c(0,1), 11)
table(y)
y
0 1
11 11
We have:
idx <- caret::createDataPartition(y, p = 0.5, list = FALSE) # draw the index once
train <- y[idx]
table(train) # we have 12 obs.
train
0 1
6 6
test <- y[-idx] # complement of the same index
table(test) # we have 10 obs.
test
0 1
5 5
More info here.
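The 12/10 split above is what you would expect if the sample size is rounded up separately inside each class. The base R sketch below only imitates that behaviour: `stratified_half` is a hypothetical helper written for illustration, not caret's implementation, and the round-up rule is an assumption about how createDataPartition handles each group.

```r
# Hypothetical helper: take ceiling(p * n_group) indices from each class of y.
# It imitates per-group rounding; it is not caret's actual code.
stratified_half <- function(y, p = 0.5) {
  groups <- split(seq_along(y), y)          # row indices per class
  picked <- lapply(groups, function(idx) {
    n_take <- ceiling(length(idx) * p)      # round up within the group
    idx[sample.int(length(idx), n_take)]    # safe even for a one-row group
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
y_even <- rep(c(0, 1), 10)   # 10 cases per class
y_odd  <- rep(c(0, 1), 11)   # 11 cases per class

idx_even <- stratified_half(y_even)
idx_odd  <- stratified_half(y_odd)

length(idx_even); length(y_even[-idx_even])  # 10 and 10
length(idx_odd);  length(y_odd[-idx_odd])    # 12 and 10
```

With 11 cases per class, half of each class is 5.5, so each class contributes 6 rows to the first part and only 5 remain for the second, even though the total of 22 rows is even.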
Thanks for your answer, but 1396 and 1398 add up to an even number, not an odd one. That is why I asked why it can't split into 1397 each, just as it split the 10 observations into 5 each and not into 4 and 6.
– Bharat Ram Ammu
Jan 4 at 10:54
I meant the number of cases per class, not the number of rows in the data, as in my two examples: first we have 10-10 (even), then 11-11 (odd).
– RLave
Jan 4 at 10:56
Oh, now I see. The explanation was indirect, but I understand why it splits into unequal halves. Can you please help me split the data equally irrespective of the distribution of my response variable? In your example, how can I get 6 0's and 4 1's in train and vice versa in test?
– Bharat Ram Ammu
Jan 4 at 12:07
I don't think this can be done with createDataPartition because by default it tries to balance the class distribution of y.
– RLave
Jan 4 at 12:18
I suggest you ask a different question where you show your data and expected output with a reproducible example.
– RLave
Jan 4 at 12:19
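If exact halves matter more than class balance, one option is to skip createDataPartition and draw the row indices with base R's `sample.int()`. This is a minimal sketch, assuming a data frame shaped like final_ts from the question; the toy data below (including the column `x`) is made up for illustration, since the real data is not shown.

```r
set.seed(42)
# Toy stand-in for final_ts with the same number of rows as in the question.
final_ts <- data.frame(SAR = rbinom(2794, 1, 0.3), x = rnorm(2794))

n <- nrow(final_ts)
index <- sample.int(n, size = n %/% 2)    # exactly floor(n / 2) random rows

final_test_data       <- final_ts[index, ]
final_validation_data <- final_ts[-index, ]

nrow(final_test_data)        # 1397
nrow(final_validation_data)  # 1397
```

The trade-off is that the proportions of SAR are no longer forced to match between the two parts, and with an odd number of rows the parts necessarily differ by one.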
answered Jan 4 at 10:49 by RLave