Using Standardization in sklearn pipeline



























I am using StandardScaler to normalize my dataset; that is, I turn each feature into a z-score by subtracting the mean and dividing by the standard deviation.



I would like to use StandardScaler within sklearn's Pipeline, and I am wondering how exactly the transformation is applied to X_test. In the code below, when I run pipeline.predict(X_test), my understanding is that StandardScaler and SVC() are both run on X_test. But what exactly does StandardScaler use as the mean and the standard deviation: the ones from X_train, or does it compute them from X_test alone? If, for instance, X_test consisted of only 2 samples, the normalization would look a lot different than if I had normalized X_train and X_test together, right?



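from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

steps = [('scaler', StandardScaler()),
         ('model', SVC())]
pipeline = Pipeline(steps)
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)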









      scikit-learn normalization pipeline














      edited Jan 4 at 14:19









      desertnaut











      asked Jan 4 at 7:52









      Tartaglia
























          1 Answer
































          scikit-learn's Pipeline calls fit_transform() on each transformer when pipeline.fit() is called, and only transform() when pipeline.predict() is called. So in your case, StandardScaler is fitted to X_train, and the mean and standard deviation of X_train are then used to scale X_test.
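One way to check this behaviour yourself is to inspect the fitted scaler inside the pipeline. The sketch below uses synthetic data from make_classification as a stand-in for your dataset (the data, sizes, and random seeds are assumptions for illustration only). It compares the scaler's stored statistics with those of X_train, and compares pipeline.predict(X_test) with scaling X_test by hand using the training statistics.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([("scaler", StandardScaler()), ("model", SVC())])
pipeline.fit(X_train, y_train)

# The fitted scaler stores the statistics of X_train only.
scaler = pipeline.named_steps["scaler"]
train_mean_matches = np.allclose(scaler.mean_, X_train.mean(axis=0))
train_std_matches = np.allclose(scaler.scale_, X_train.std(axis=0))

# predict() is equivalent to scaling X_test with those training statistics
# and passing the result to the fitted SVC.
X_test_scaled = (X_test - scaler.mean_) / scaler.scale_
manual_pred = pipeline.named_steps["model"].predict(X_test_scaled)
pipeline_pred = pipeline.predict(X_test)
predictions_match = np.array_equal(manual_pred, pipeline_pred)

print(train_mean_matches, train_std_matches, predictions_match)
```

Note that StandardScaler uses the population standard deviation (ddof=0), which is also NumPy's default for std(), so the two can be compared directly.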



          Scaling with the statistics of X_train alone would indeed give a different result from scaling with the statistics of X_train and X_test combined. How large the difference is depends on how much the distribution of X_train differs from that of the combined data. However, if X_train and X_test are randomly partitioned from the same original dataset and are of a reasonable size, their distributions will probably be similar.
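To get a feel for the size of this effect, here is a small sketch on made-up single-feature data (the distribution parameters and sample sizes are assumptions, not taken from the question). It measures the gap between scaling X_test with training statistics and with combined statistics, and it also shows the very small test set raised in the question: two rows scaled by their own statistics always come out as -1 and +1, which is why refitting the scaler on X_test would be misleading.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical single-feature data drawn from one distribution.
X_train = rng.normal(loc=10.0, scale=2.0, size=(800, 1))
X_test = rng.normal(loc=10.0, scale=2.0, size=(200, 1))

train_scaler = StandardScaler().fit(X_train)
combined_scaler = StandardScaler().fit(np.vstack([X_train, X_test]))

# Largest gap between the two ways of scaling X_test.
gap = np.abs(
    train_scaler.transform(X_test) - combined_scaler.transform(X_test)
).max()

# A test set of only two distinct rows, scaled on its own, always becomes
# -1 and +1, whatever the original values were.
X_tiny = np.array([[9.0], [14.0]])
tiny_self_scaled = StandardScaler().fit_transform(X_tiny)
tiny_train_scaled = train_scaler.transform(X_tiny)

print(f"max gap on X_test: {gap:.4f}")
print("tiny, own statistics:  ", tiny_self_scaled.ravel())
print("tiny, train statistics:", tiny_train_scaled.ravel())
```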



          Regardless, it is important to treat X_test as though it is out of sample, so that performance on it is a (hopefully) reliable estimate of performance on unseen data. Since you don't know the distribution of unseen data, you should pretend you don't know the distribution of X_test either, including its mean and standard deviation.
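The same principle carries over to cross-validation. As a hedged sketch (again on synthetic data, with the dataset size and cv=5 chosen arbitrarily), passing the whole pipeline to cross_val_score means the pipeline is cloned and refitted on each training fold, so the scaler's mean and standard deviation never include the held-out fold.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# The scaler is refitted inside every fold because it is part of the
# estimator being cross-validated.
pipeline = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(pipeline, X, y, cv=5)

print(scores.mean())
```

Scaling X once up front and then cross-validating only the SVC would instead let every fold's statistics leak into the others.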



























          • Very happy to hear that, that makes perfect sense. Thank you so much for the explanation, Chris!
            – Tartaglia, Jan 4 at 19:55











          • @Tartaglia glad to be able to help.
            – Chris, Jan 4 at 20:20












          edited Jan 4 at 17:05

























          answered Jan 4 at 16:59









          Chris







