Train/Test Split Python


























There are 250 randomly generated data points that are obtained as follows:



[X, y] = getDataSet()  # getDataSet() randomly generates 250 data points


X looks like:



[array([[-2.44141527e-01, 8.39016956e-01],
[ 1.37468561e+00, 4.97114860e-01],
[ 3.08071887e-02, -2.03260255e-01],...


While y looks like:



y is array([[0.],
[0.],
[0.],...


(it also contains 1s)



I'm trying to split [X, y] into training and testing sets. The training set is supposed to be a random selection of 120 of the 250 generated data points. Here is how I'm generating the training set:



import numpy as np

nTrain = 120

maxIndex = len(X)
randomTrainingSamples = np.random.choice(maxIndex, nTrain, replace=False)
trainX = X[randomTrainingSamples, :]  # training samples
trainY = y[randomTrainingSamples, :]  # labels of training samples, nTrain x 1


What I can't figure out is how to get the testing set, which should be the other 130 data points that are not included in the training set:



testX =  # testing samples
testY = # labels of testing samples nTest x 1


Suggestions are much appreciated. Thank you!










      python numpy machine-learning
















      asked Dec 28 '18 at 5:45









      MatthewSpire
























          3 Answers
            You can take the test indices to be every index that was not drawn for training:



            randomTestingSamples = [i for i in range(maxIndex) if i not in randomTrainingSamples]
            testX = X[randomTestingSamples, :] # testing samples
            testY = y[randomTestingSamples, :] # labels of testing samples nTest x 1
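
A vectorised variant of the same idea can be sketched with a boolean mask. This is only an illustration: the data below is a stand-in with the shapes described in the question (the real arrays come from getDataSet()), and the name isTrain is an assumption introduced here.

```python
import numpy as np

# Stand-in data with the question's shapes (assumption: the real X and y
# come from getDataSet()).
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 2))
y = rng.integers(0, 2, size=(250, 1)).astype(float)

nTrain = 120
maxIndex = len(X)
randomTrainingSamples = rng.choice(maxIndex, nTrain, replace=False)

# Mark the training rows, then select everything that is not marked.
isTrain = np.zeros(maxIndex, dtype=bool)
isTrain[randomTrainingSamples] = True

trainX, trainY = X[isTrain], y[isTrain]
testX, testY = X[~isTrain], y[~isTrain]

print(trainX.shape, testX.shape)  # (120, 2) (130, 2)
```

Note that mask indexing returns the rows in their original order rather than in the order of randomTrainingSamples. If only the test indices are needed, np.setdiff1d(np.arange(maxIndex), randomTrainingSamples) should give the same complement.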
















            answered Dec 28 '18 at 6:02









            feed liu












            • I think this worked. Thank you!
              – MatthewSpire
              Dec 28 '18 at 12:51










            • I wanted to let you know this was for an assignment and I have attempted to give credit for your help.
              – MatthewSpire
              Dec 29 '18 at 17:28


















            You can use sklearn.model_selection.train_test_split:



            import numpy as np
            from sklearn.model_selection import train_test_split

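            # Placeholder arrays with the same shapes as the data in the question
            X, y = np.zeros((250, 2)), np.zeros((250, 1))

            # An integer test_size is the absolute number of test samples
            trainX, testX, trainY, testY = train_test_split(X, y, test_size=130)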

            trainX.shape
            # (120, 2)
            testX.shape
            # (130, 2)
            trainY.shape
            # (120, 1)
            testY.shape
            # (130, 1)
















            answered Dec 28 '18 at 5:47









            Chris












            • Cannot use sklearn, otherwise I would have. Thank you!
              – MatthewSpire
              Dec 28 '18 at 12:46










            • @MatthewSpire Only numpy then?
              – Chris
              Dec 28 '18 at 12:48










            • Yes sir. I've already got the training, but I can't seem to figure out how to select the other 130 for testing.
              – MatthewSpire
              Dec 28 '18 at 12:50










            • I think feed liu got it 'cause that code seems to work.
              – MatthewSpire
              Dec 28 '18 at 12:52
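
Given the NumPy-only constraint mentioned in these comments, a small helper with a call shape similar to train_test_split could be sketched as follows. The function name split_train_test, its parameters, and the placeholder data are all assumptions made for illustration; this is not sklearn's implementation.

```python
import numpy as np


def split_train_test(X, y, n_train, seed=None):
    """Hypothetical NumPy-only helper: random split into train and test rows."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))  # every row index exactly once, shuffled
    train_idx, test_idx = order[:n_train], order[n_train:]
    # Return order mirrors train_test_split: the X pair first, then the y pair.
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Placeholder data with the question's shapes.
X = np.arange(500, dtype=float).reshape(250, 2)
y = (np.arange(250) % 2).astype(float).reshape(250, 1)

trainX, testX, trainY, testY = split_train_test(X, y, n_train=120, seed=0)
print(trainX.shape, testX.shape, trainY.shape, testY.shape)
# (120, 2) (130, 2) (120, 1) (130, 1)
```

Because the same index arrays are applied to X and y, each label stays aligned with its sample. Passing a fixed seed makes the split repeatable between runs.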


















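                You can shuffle the indices, then take the first 120 as the training set and the remaining 130 as the test set. Note that np.random.shuffle shuffles in place and returns None, so np.random.permutation is used here to get the shuffled indices back:

                random_index = np.random.permutation(len(X))
                randomTrainingSamples = random_index[:120]
                randomTestSamples = random_index[120:250]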

                trainX = X[randomTrainingSamples, :]
                trainY = y[randomTrainingSamples, :]

                testX = X[randomTestSamples, :]
                testY = y[randomTestSamples, :]
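
The difference between the two NumPy calls matters for this approach, so here is a short sketch of it, followed by a check that the resulting index sets really partition the rows. The seed and the variable names result, overlap, and covered are assumptions added for illustration.

```python
import numpy as np

np.random.seed(0)  # seed chosen only to make the sketch repeatable

idx = np.arange(250)
result = np.random.shuffle(idx)  # shuffles idx in place
print(result)                    # None, so there is nothing to slice

random_index = np.random.permutation(250)  # returns a new shuffled index array
randomTrainingSamples = random_index[:120]
randomTestSamples = random_index[120:]

# The two index sets are disjoint and together cover every row once.
overlap = np.intersect1d(randomTrainingSamples, randomTestSamples)
covered = np.union1d(randomTrainingSamples, randomTestSamples)
print(overlap.size, covered.size)  # 0 250
```

Slicing with random_index[120:] instead of random_index[120:250] keeps the split correct if the number of data points changes.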
















                answered Dec 28 '18 at 5:52









                Ernest S Kirubakaran





























