Unable to create exactly equal data partitions using createDataPartition in R: getting 1396 and 1398...


I am quite familiar with R, but I have never before needed to randomly create exactly equal data partitions using createDataPartition.

library(caret)

index <- createDataPartition(final_ts$SAR, p = 0.5, list = FALSE)
final_test_data <- final_ts[index, ]
final_validation_data <- final_ts[-index, ]

This code creates two datasets with 1396 and 1398 observations, respectively.

I am surprised that p = 0.5 doesn't split the data exactly in half. Does it have something to do with the resulting dataset not having an odd number of observations by default?
Thanks in advance!

edited Jan 4 at 12:23

asked Jan 4 at 10:31

Bharat Ram Ammu

589

Tags: r, data-partitioning


1 Answer

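It has to do with the number of cases in each class of the response variable (final_ts$SAR in your case). createDataPartition samples within each class of y (or, for a numeric y, within each quantile group) and rounds each class's share up, so every class with an odd count contributes one extra row to the partition.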

For example:

y <- rep(c(0, 1), 10)
table(y)  # even number of cases
#> y
#>  0  1
#> 10 10

Now we split:

idx <- caret::createDataPartition(y, p = 0.5, list = FALSE)

train <- y[idx]
table(train)  # we have 10 obs.
#> train
#> 0 1
#> 5 5

test <- y[-idx]  # reuse the same index so train and test do not overlap
table(test)  # we have 10 obs.
#> test
#> 0 1
#> 5 5

If we instead build an example with an odd number of cases in each class:

y <- rep(c(0, 1), 11)
table(y)  # odd number of cases
#> y
#>  0  1
#> 11 11

We have:

idx <- caret::createDataPartition(y, p = 0.5, list = FALSE)

train <- y[idx]
table(train)  # we have 12 obs.
#> train
#> 0 1
#> 6 6

test <- y[-idx]  # reuse the same index so train and test do not overlap
table(test)  # we have 10 obs.
#> test
#> 0 1
#> 5 5
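To see where the extra rows come from without depending on caret, here is a minimal base-R sketch of stratified sampling. It assumes that createDataPartition draws ceiling(p * n) rows from each class; the helper `stratified_partition()` is hypothetical and is not caret's actual implementation.

```r
# Hypothetical helper, not part of caret: draw ceiling(p * n_k) row indices
# from each class k of y, which is assumed to mirror createDataPartition.
stratified_partition <- function(y, p = 0.5) {
  groups <- split(seq_along(y), y)            # row indices grouped by class
  picked <- lapply(groups, function(idx) {
    n_take <- ceiling(length(idx) * p)        # each class's share is rounded up
    idx[sample.int(length(idx), n_take)]      # sample.int avoids sample()'s length-1 pitfall
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
y_even <- rep(c(0, 1), 10)                    # 10-10 cases
idx_even <- stratified_partition(y_even)
length(idx_even)                              # 10 in train, 10 left for test

y_odd <- rep(c(0, 1), 11)                     # 11-11 cases
idx_odd <- stratified_partition(y_odd)
length(idx_odd)                               # 12 in train, 10 left for test
table(y_odd[idx_odd])                         # 6 of each class
```

The sizes are deterministic even though the chosen rows are random: each class of 11 contributes ceiling(5.5) = 6 rows.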

More info here.

answered Jan 4 at 10:49

RLave

5,369

Thanks for your answer, but 1396 + 1398 is an even number, not an odd one. That's why I asked why it can't split into 1397 each, the way it split 10 observations into 5 each rather than 4 and 6.

– Bharat Ram Ammu
Jan 4 at 10:54


I meant the number of cases in each class, not the number of rows in the data, as in my two examples: first we have 10-10 (even), then 11-11 (odd).

– RLave
Jan 4 at 10:56
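The same rounding explains how an even total can still split unevenly. This is a minimal arithmetic sketch, assuming each class's share p * n is rounded up with ceiling(). The class counts for the 2794-row case are hypothetical, because the actual distribution of final_ts$SAR is not shown.

```r
# Size of the partition when each class's share p * n_k is rounded up
# (assumed to match createDataPartition's per-class rounding).
partition_size <- function(class_counts, p = 0.5) {
  sum(ceiling(class_counts * p))
}

partition_size(c(10, 10))             # 10: both classes even
partition_size(c(11, 11))             # 12: both classes odd, each rounds up by 0.5

# Hypothetical class counts for a 2794-row data set (not the real SAR table):
partition_size(c(1000, 1794))         # 1397: even counts split exactly in half
partition_size(c(1001, 1793))         # 1398: two odd counts add one extra row
2794 - partition_size(c(1001, 1793))  # 1396 rows left for the other set
```

In other words, what matters is the parity of each class count, not the parity of the total: any two odd counts that sum to 2794 would produce a 1398/1396 split.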

Oh, now I see. It was an indirect explanation, but I understand why it splits into unequal halves. Could you help me split the data equally regardless of the distribution of my response variable? For instance, in your first example, how could I get six 0s and four 1s in train and the reverse in test?

– Bharat Ram Ammu
Jan 4 at 12:07

I don't think this can be done with createDataPartition because by default it tries to balance the class distribution of y.

– RLave
Jan 4 at 12:18

I suggest you ask a different question where you show your data and expected output with a reproducible example.

– RLave
Jan 4 at 12:19
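If exact halves matter more than preserving the class balance, a plain random split with base R's sample.int() guarantees the sizes, at the cost of no longer stratifying on SAR. This is a minimal sketch: `half_split()` is a hypothetical helper and `demo` is a made-up stand-in for final_ts.

```r
# Hypothetical helper: split the rows of a data frame into two random parts
# of exactly floor(n / 2) and ceiling(n / 2) rows, ignoring the classes of y.
half_split <- function(df) {
  n <- nrow(df)
  idx <- sample.int(n, size = n %/% 2)        # exactly floor(n / 2) row indices
  list(first  = df[idx, , drop = FALSE],
       second = df[-idx, , drop = FALSE])     # the remaining rows, no overlap
}

set.seed(42)
# Made-up stand-in for final_ts: 2794 rows with an imbalanced 0/1 response
demo <- data.frame(SAR = rep(c(0, 1), times = c(1001, 1793)))
parts <- half_split(demo)
nrow(parts$first)                             # 1397
nrow(parts$second)                            # 1397
table(parts$first$SAR)                        # class mix varies from seed to seed
```

Because the rows are drawn without regard to SAR, the class proportions in each half are only approximately equal, which is exactly the trade-off createDataPartition avoids by stratifying.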
