Unable to create exactly equal data partitions using createDataPartition in R- getting 1396 and 1398...





I am quite familiar with R, but I have never before needed to split a dataset randomly into two exactly equal partitions using createDataPartition.

library(caret)

index <- createDataPartition(final_ts$SAR, p = 0.5, list = FALSE)
final_test_data <- final_ts[index, ]
final_validation_data <- final_ts[-index, ]

This code creates two datasets with 1396 and 1398 observations respectively.

I am surprised that p = 0.5 does not split the data into two equal halves. Does it have something to do with the resulting datasets not being allowed an odd number of observations by default?
Thanks in advance!










      r data-partitioning






      asked Jan 4 at 10:31 by Bharat Ram Ammu (edited Jan 4 at 12:23)
























          1 Answer
          It has to do with the number of cases in each class of the response variable (final_ts$SAR in your case). createDataPartition samples within each class, so a class with an odd number of cases cannot be halved exactly.

          For example:

          y <- rep(c(0, 1), 10)
          table(y)
          y
           0  1
          10 10
          # even number of cases in each class

          Now we split, computing the index once so that train and test are complementary:

          idx <- caret::createDataPartition(y, p = 0.5, list = FALSE)

          train <- y[idx]
          table(train) # we have 10 obs.
          train
          0 1
          5 5

          test <- y[-idx]
          table(test) # we have 10 obs.
          test
          0 1
          5 5

          If we instead build an example with an odd number of cases in each class:

          y <- rep(c(0, 1), 11)
          table(y)
          y
           0  1
          11 11

          We have:

          idx <- caret::createDataPartition(y, p = 0.5, list = FALSE)

          train <- y[idx]
          table(train) # we have 12 obs.
          train
          0 1
          6 6

          test <- y[-idx]
          table(test) # we have 10 obs.
          test
          0 1
          5 5

          Each class contributes 6 of its 11 cases to train, so the two parts come out as 12 and 10 rather than 11 and 11.


          More info here.
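
          The arithmetic behind these counts can be sketched in base R. This is only a rough sketch, assuming (as the 12/10 output above suggests) that createDataPartition draws ceiling(n * p) cases from each class or, for a numeric response, from each quantile group. The helper name stratified_sizes and the stratum sizes in the last call are made up for illustration and are not part of caret or of the asker's data.

```r
# Hypothetical helper (not part of caret): sizes of the selected and held-out
# parts when ceiling(n * p) cases are drawn from every stratum.
stratified_sizes <- function(strata_counts, p = 0.5) {
  selected <- sum(ceiling(strata_counts * p))
  c(selected = selected, remaining = sum(strata_counts) - selected)
}

stratified_sizes(c(10, 10))  # selected 10, remaining 10
stratified_sizes(c(11, 11))  # selected 12, remaining 10

# Made-up stratum sizes that add up to 2794 rows. Two strata with an odd
# count are enough to give a 1398 / 1396 split.
stratified_sizes(c(559, 559, 558, 560, 558))
```

          Under that assumption, every stratum with an odd count pushes one extra row into the selected part, which would explain a difference of two rows even though the total number of rows is even.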






          answered Jan 4 at 10:49 by RLave













          • Thanks for your answer, but 1396 and 1398 add up to an even number, not an odd one. That is why I asked why it cannot split into 1397 each, just as it split 10 observations into 5 each rather than 4 and 6.

            – Bharat Ram Ammu
            Jan 4 at 10:54






          • I meant the number of cases in each class, not the number of rows in the data, as in my two examples: first we have 10-10 (even), then 11-11 (odd).

            – RLave
            Jan 4 at 10:56











          • Oh, now I see. It was an indirect explanation, but I understand why it splits into unequal halves. Can you please help me split equally, irrespective of the distribution of my response variable? In your example, how can I split with 6 0's and 4 1's in train and vice versa in test?

            – Bharat Ram Ammu
            Jan 4 at 12:07











          • I don't think this can be done with createDataPartition because by default it tries to balance the class distribution of y.

            – RLave
            Jan 4 at 12:18











          • I suggest you ask a different question where you show your data and expected output with a reproducible example.

            – RLave
            Jan 4 at 12:19
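
          If exactly equal sizes matter more than class balance, one option is to skip createDataPartition and draw row indices directly with base R's sample(). A minimal sketch, assuming a data frame with an even number of rows; the final_ts built here is a made-up stand-in for the asker's data, and the seed is arbitrary:

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Made-up stand-in for the asker's data: 2794 rows with a binary SAR column.
final_ts <- data.frame(SAR = rbinom(2794, 1, 0.3), x = rnorm(2794))

# Draw exactly half of the row numbers, without replacement.
n <- nrow(final_ts)
index <- sample(n, size = n %/% 2)

final_test_data       <- final_ts[index, ]
final_validation_data <- final_ts[-index, ]

nrow(final_test_data)        # 1397
nrow(final_validation_data)  # 1397
```

          The trade-off is that the proportions of SAR are no longer guaranteed to match between the two halves, because the draw ignores the response variable entirely.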


















