sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')












73















I am using sklearn and having a problem with the affinity propagation. I have built an input matrix and I keep getting the following error.



ValueError: Input contains NaN, infinity or a value too large for dtype('float64').


I have run



np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True


I tried using



mat[np.isfinite(mat) == True] = 0


to remove the infinite values but this did not work either.
What can I do to get rid of the infinite values in my matrix, so that I can use the affinity propagation algorithm?



I am using anaconda and python 2.7.9.






























  • 1





    I'm voting to close this, as the author says himself that his data was invalid and though everything pointed to it, he didn't validate -- the data equivalent to a typo, which is a closing reason.

    – Marcus Müller
    Sep 6 '15 at 18:55






  • 8





    I had this same issue with my dataset. Ultimately: a data mistake, not a scikit learn bug. Most of the answers below are helpful but misleading. Check check check your data, make sure that when converted to float64 it is both finite and not nan. The error message is apt - this is almost certainly the issue for anyone who finds themselves here.

    – Owen
    Dec 7 '16 at 13:52






  • 1





    For the record and +1 for @Owen, check your input data and make sure you do not have any missing value in any row or grid. You can use the Imputer class to avoid this problem.

    – Alejandro BR
    Jun 20 '18 at 21:29
















python python-2.7 scikit-learn valueerror














edited Jun 21 '18 at 8:05









Jesse de Bruijne











asked Jul 9 '15 at 16:40









Ethan Waldie




















11 Answers


















66














This might happen inside scikit-learn, and it depends on what you're doing. I recommend reading the documentation for the functions you're using. You might be using one that depends on, e.g., your matrix being positive definite, and your input might not fulfil that criterion.



EDIT: How could I miss that:



np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True


is obviously wrong. Right would be:



np.any(np.isnan(mat))


and



np.all(np.isfinite(mat))


You want to check whether any of the elements is NaN, and not whether the return value of the any function is a number...
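To see why the two forms differ, here is a minimal sketch on a made-up 2x2 array. The array `mat` and the choice of 0 as a replacement value are illustrative assumptions, not the asker's data.

```python
import numpy as np

# Hypothetical matrix containing one NaN and one infinity.
mat = np.array([[1.0, np.nan], [np.inf, 4.0]])

# Form from the question: mat.any() / mat.all() collapse the matrix to a
# single bool first, so isnan/isfinite only ever inspect True or False.
wrong_nan = np.isnan(mat.any())
wrong_finite = np.isfinite(mat.all())

# Element-wise form: test every entry, then reduce.
has_nan = np.isnan(mat).any()
all_finite = np.isfinite(mat).all()

# Replacing the non-finite entries needs the negated mask.
cleaned = mat.copy()
cleaned[~np.isfinite(cleaned)] = 0.0

print(wrong_nan, wrong_finite)   # False True
print(has_nan, all_finite)       # True False
```

Note that the mask selecting the bad entries is `~np.isfinite(...)`. The line `mat[np.isfinite(mat) == True] = 0` from the question zeroes the finite entries instead and leaves NaN and infinity in place.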



























  • 3





    The docs don't mention anything about this error. I need a way of getting rid of the infinite values from my NumPy array.

    – Ethan Waldie
    Jul 9 '15 at 17:19






  • 3





    As I said: they may not be in your input array. They might occur in the math that happens between input and magical output. The point is that all this math depends on certain conditions for the input. You have to carefully read the docs to find out whether your input satisfies these conditions.

    – Marcus Müller
    Jul 10 '15 at 7:54











  • @MarcusMüller could you point me to the location of this document where they specify the requirements of the input matrix? I can't seem to find the "docs" you are referring to. Thank you :)

    – user2253546
    Feb 23 '17 at 21:35



















12














I got the same error message when using sklearn with pandas. My solution is to reset the index of my dataframe df before running any sklearn code:



df = df.reset_index()


I encountered this issue many times when I removed some entries in my df, such as



df = df[df.label=='desired_one']
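The answer does not say why resetting the index helps. One plausible mechanism (an assumption, not something the answer confirms) is label alignment: after filtering, the index has gaps, and values built with a fresh default index no longer line up with it. The frame, column names and values below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"label": ["a", "b", "a", "b"], "x": [1.0, 2.0, 3.0, 4.0]})
subset = df[df.label == "a"]          # surviving index labels are 0 and 2

# Values computed from a bare array get a default RangeIndex (0, 1).
scaled = pd.Series(subset["x"].to_numpy() * 10)

# assign() aligns on index labels: label 2 has no partner, so it becomes NaN.
misaligned = subset.assign(scaled=scaled)

# After resetting the index both sides use labels 0 and 1.
aligned = subset.reset_index(drop=True).assign(scaled=scaled)

print(misaligned["scaled"].isna().sum())   # 1
print(aligned["scaled"].isna().sum())      # 0
```

If this is the cause, `drop=True` keeps the old index from being added as an extra column.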































  • This solved my error. brilliant!

    – Aerin
    Jan 31 '18 at 5:41











  • I love you! That's a rare instance of me finding the right solution despite not knowing what's the cause of the error!

    – Alexandr Kapshuk
    Aug 9 '18 at 14:25













  • Calling df.reset_index() adds the old index as an "index" column in the resulting df, which may not be useful in every scenario. If df.reset_index(drop=True) is run instead, it throws the same error.

    – smm
    Sep 18 '18 at 18:19



















9














The dimensions of my input array were skewed, as my input CSV had empty spaces.
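Assuming "empty spaces" means empty fields in the CSV (an interpretation, not stated in the answer), the sketch below shows how such fields surface as NaN when the file is read with pandas. The CSV text is invented for the example.

```python
import io

import pandas as pd

# Hypothetical CSV: the second row lacks a value for b, the third for c.
csv_text = "a,b,c\n1.0,2.0,3.0\n4.0,,6.0\n7.0,8.0,\n"
df = pd.read_csv(io.StringIO(csv_text))

nan_per_column = df.isna().sum()              # how many gaps each column has
rows_with_gaps = df[df.isna().any(axis=1)]    # inspect the incomplete rows
clean = df.dropna()                           # keep only complete rows

print(nan_per_column.tolist())   # [0, 1, 1]
```

Looking at `rows_with_gaps` before calling `dropna()` shows how much data would be discarded.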






























  • For pandas, I just used dropna pandas.pydata.org/pandas-docs/stable/generated/…

    – FindOutIslamNow
    Sep 11 '18 at 7:23



















7














This is the check on which it fails:




  • https://github.com/scikit-learn/scikit-learn/blob/0.17.X/sklearn/utils/validation.py#L51


Which says



def _assert_all_finite(X):
    """Like assert_all_finite, but only for ndarray."""
    X = np.asanyarray(X)
    # First try an O(n) time, O(1) space solution for the common case that
    # everything is finite; fall back to O(n) space np.isfinite to prevent
    # false positives from overflow in sum method.
    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
            and not np.isfinite(X).all()):
        raise ValueError("Input contains NaN, infinity"
                         " or a value too large for %r." % X.dtype)


So make sure that your input contains no NaN values, that all the values are actually floats, and that none of them is Inf.
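A rough way to run the same kind of test yourself and find the offending positions is sketched below. The array `X` is made up. The float32 step rests on the assumption that your estimator converts its input to float32 internally, which only some do.

```python
import numpy as np

# Hypothetical input: one NaN, and one value that float64 can hold
# but float32 cannot.
X = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, 1e40]])

# (row, column) positions that fail the finiteness test as float64.
bad_positions = np.argwhere(~np.isfinite(X))

# Casting down overflows 1e40 to inf, which is the "value too large" case.
with np.errstate(over="ignore"):
    X32 = X.astype(np.float32)
bad_positions_32 = np.argwhere(~np.isfinite(X32))

print(bad_positions.tolist())      # [[1, 0]]
print(bad_positions_32.tolist())   # [[1, 0], [2, 1]]
```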





































    4














    This is my function (based on this) to clean the dataset of nan, Inf, and missing cells (for skewed datasets):



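    import numpy as np
    import pandas as pd

    def clean_dataset(df):
        """Drop rows containing NaN or +/-Inf and return the rest as float64."""
        assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
        df.dropna(inplace=True)  # note: modifies the caller's DataFrame in place
        indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
        return df[indices_to_keep].astype(np.float64)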





























    • Why do you drop the nan two times? First time with dropna then a second time when dropping inf.

      – luca
      Jun 25 '18 at 9:04



















    3














    I had the error after trying to select a subset of rows:



    df = df.reindex(index=my_index)


    Turns out that my_index contained values that were not contained in df.index, so the reindex function inserted some new rows and filled them with nan.
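The effect is easy to reproduce. In this small sketch the frame and the labels are invented; `missing` and `safe` are just illustrative names for checking the labels before selecting.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])
my_index = ["a", "c", "d"]            # "d" does not exist in df.index

# reindex() creates a row for "d" and fills it with NaN.
reindexed = df.reindex(index=my_index)

# Labels that would be invented by reindex().
missing = pd.Index(my_index).difference(df.index)

# Selecting only labels that really exist avoids the NaN rows.
safe = df.loc[df.index.intersection(my_index)]

print(reindexed["x"].isna().tolist())   # [False, False, True]
print(list(missing))                    # ['d']
```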







































      2














      I had the same error, and in my case X and y were dataframes so I had to convert them to matrices first:



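      # to_numpy() and np.float64 replace as_matrix() and np.float,
      # which newer pandas/NumPy releases removed
      X = X.to_numpy().astype(np.float64)
      y = y.to_numpy().astype(np.float64)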





























      • this solution works perfectly for me! Thanks

        – Gartmair
        Nov 2 '17 at 20:46



















      2














      With this version of python 3:



      /opt/anaconda3/bin/python --version
      Python 3.6.0 :: Anaconda 4.3.0 (64-bit)


      Looking at the details of the error, I found the lines of code causing the failure:



      /opt/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
      56 and not np.isfinite(X).all()):
      57 raise ValueError("Input contains NaN, infinity"
      ---> 58 " or a value too large for %r." % X.dtype)
      59
      60

      ValueError: Input contains NaN, infinity or a value too large for dtype('float64').


      From this, I could work out the right way to inspect my data: use the same test that the error message says is failing, np.isfinite(X)



      Then with a quick and dirty loop, I was able to find that my data indeed contains nans:



      print(p[:,0].shape)
      index = 0
      for i in p[:,0]:
          if not np.isfinite(i):
              print(index, i)
          index +=1

      (367340,)
      4454 nan
      6940 nan
      10868 nan
      12753 nan
      14855 nan
      15678 nan
      24954 nan
      30251 nan
      31108 nan
      51455 nan
      59055 nan
      ...


      Now all I have to do is remove the values at these indexes.
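The same search, and the removal step, can also be done without a Python loop. This is a sketch on a tiny stand-in array, since the real `p` is not shown. Dropping whole rows is one option, and whether that is appropriate depends on the data.

```python
import numpy as np

# Stand-in for p: the first column holds one NaN and one infinity.
p = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, 30.0], [np.inf, 40.0]])

# Row numbers whose first-column value is NaN or infinite.
bad_rows = np.flatnonzero(~np.isfinite(p[:, 0]))

# Keep only the rows whose first-column value is finite.
p_clean = p[np.isfinite(p[:, 0])]

print(bad_rows.tolist())   # [1, 3]
print(p_clean.shape)       # (2, 2)
```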





































        1














        I got the same error. It worked after calling df.fillna(-99999, inplace=True) before doing any replacement, substitution, etc.
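For comparison, here is a small sketch that counts the gaps before filling them, and contrasts the sentinel value with a per-column median. The median fill is a swapped-in alternative for illustration, not what this answer used, and the frame is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# Count the gaps first, to see how much data is affected.
nan_counts = df.isna().sum()

# Sentinel fill, as in the answer (returns a copy here instead of inplace).
sentinel_filled = df.fillna(-99999)

# Alternative: fill each column with its own median.
median_filled = df.fillna(df.median())

print(nan_counts.tolist())            # [1, 1]
print(median_filled["a"].tolist())    # [1.0, 2.0, 3.0]
```

Note that `fillna` only touches NaN. Infinite values stay in the frame and would still trigger the error.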

























        • 2





          This is a dirty fix. There is a reason why your array contains nan values; you should find it.

          – Elias Strehle
          Jun 25 '18 at 15:31











        • The data could contain nan, and this gives a way to replace it with values that he/she finds acceptable.

          – user2867432
          Sep 9 '18 at 21:37



















        0














        In my case the problem was that many scikit functions return numpy arrays, which are devoid of pandas index. So there was an index mismatch when I used those numpy arrays to build new DataFrames and then I tried to mix them with the original data.
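A sketch of that situation is below. The frame, the doubling step that stands in for a scikit-learn call, and the column names are all hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=[10, 20, 30])

# Stand-in for an estimator or transformer output: a bare ndarray, no index.
transformed = df[["x"]].to_numpy() * 2.0

# A DataFrame built from it gets labels 0, 1, 2, which match nothing in df,
# so concat() pads both sides with NaN.
bad = pd.concat([df, pd.DataFrame(transformed, columns=["x2"])], axis=1)

# Passing the original index keeps the rows aligned.
good = pd.concat(
    [df, pd.DataFrame(transformed, columns=["x2"], index=df.index)], axis=1
)

print(bad.shape, int(bad.isna().sum().sum()))     # (6, 2) 6
print(good.shape, int(good.isna().sum().sum()))   # (3, 2) 0
```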





































          0














          If you can't find the problem in X, check in y
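A quick way to check both at once, sketched with made-up arrays (`X`, `y` and `report` are illustrative names):

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([0.0, np.nan])           # the problem hides in the target

# Number of NaN/inf entries in each array.
report = {
    name: int((~np.isfinite(arr)).sum())
    for name, arr in {"X": X, "y": y}.items()
}

print(report)   # {'X': 0, 'y': 1}
```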





























            Answer scored 66: answered Jul 9 '15 at 16:43 by Marcus Müller, edited Jul 10 '15 at 7:57








            Answer scored 12: answered Dec 24 '17 at 3:43 by Jun Wang, edited Aug 12 '18 at 20:34













            • This solved my error. brilliant!

              – Aerin
              Jan 31 '18 at 5:41











            • I love you! That's a rare instance of me finding the right solution despite not knowing what's the cause of the error!

              – Alexandr Kapshuk
              Aug 9 '18 at 14:25













            • By doing the df.reset_index() it will add the "index" as a column in the resulting df. Which may not be useful for all scenario. If the df.reset_index(drop=True) ran then it will throw the same error.

              – smm
              Sep 18 '18 at 18:19






































            9







            The dimensions of my input array were skewed because my input CSV had empty spaces.
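To illustrate how an empty CSV field turns into NaN, here is a small sketch. The CSV text and column names are invented for the example; it only assumes the file is read with pandas.read_csv.

```python
import io

import pandas as pd

# Hypothetical CSV text; the second data row has an empty middle field.
csv_text = "a,b,c\n1,2,3\n4,,6\n7,8,9\n"
df = pd.read_csv(io.StringIO(csv_text))

# pandas parses the empty field as NaN, which scikit-learn would reject.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Rows that contain at least one missing value.
bad_rows = df[df.isna().any(axis=1)]
print(bad_rows)
```

Counting missing values per column right after loading usually shows quickly whether the file itself is the source of the NaNs.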
















            answered Jul 14 '15 at 21:09









            Ethan Waldie













            • For pandas, I just used dropna pandas.pydata.org/pandas-docs/stable/generated/…

              – FindOutIslamNow
              Sep 11 '18 at 7:23






































                7







                This is the check on which it fails:


                • https://github.com/scikit-learn/scikit-learn/blob/0.17.X/sklearn/utils/validation.py#L51


                Which says



                def _assert_all_finite(X):
                    """Like assert_all_finite, but only for ndarray."""
                    X = np.asanyarray(X)
                    # First try an O(n) time, O(1) space solution for the common case that
                    # everything is finite; fall back to O(n) space np.isfinite to prevent
                    # false positives from overflow in sum method.
                    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
                            and not np.isfinite(X).all()):
                        raise ValueError("Input contains NaN, infinity"
                                         " or a value too large for %r." % X.dtype)


                So make sure that your input contains no NaN values, that all of its values are actually floats, and that none of them is Inf.
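The same element-wise test can be run by hand before calling sklearn. The sketch below uses a made-up matrix named mat and also shows why the checks quoted in the question return misleading results: they reduce the array to a single boolean first and only then test that boolean.

```python
import numpy as np

# Hypothetical matrix with one NaN and one infinity.
mat = np.array([[1.0, np.nan], [np.inf, 4.0]])

# Reduces first, then tests a single boolean: always False / always True.
print(np.isnan(mat.any()))      # False
print(np.isfinite(mat.all()))   # True

# Tests every element first, then reduces.
has_nan = np.isnan(mat).any()
has_inf = np.isinf(mat).any()
all_finite = np.isfinite(mat).all()
print(has_nan, has_inf, all_finite)
```

Only the second form, with the parenthesis closed before .any() or .all(), inspects the individual entries.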
















                answered Apr 13 '16 at 15:12









                tuxdna































                    4







                    This is my function (based on this) to clean the dataset of NaN, Inf, and missing cells (for skewed datasets):



                    import numpy as np
                    import pandas as pd

                    def clean_dataset(df):
                        assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
                        # Drop rows with missing cells (note: this modifies the caller's df in place).
                        df.dropna(inplace=True)
                        # Keep only rows that contain no NaN or +/-Inf in any column.
                        indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
                        return df[indices_to_keep].astype(np.float64)
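A possible variant, offered as a sketch rather than as the answer's method: convert infinities to NaN first so that a single dropna() covers both cases, and return a new frame instead of modifying the input. The function name drop_non_finite_rows and the sample frame are hypothetical.

```python
import numpy as np
import pandas as pd

def drop_non_finite_rows(df):
    """Return a float64 copy of df without rows containing NaN or +/-Inf."""
    # Turn infinities into NaN so a single dropna() handles both cases.
    cleaned = df.replace([np.inf, -np.inf], np.nan).dropna()
    return cleaned.astype(np.float64)

# Hypothetical example frame.
raw = pd.DataFrame({"a": [1.0, np.inf, 3.0], "b": [4.0, 5.0, np.nan]})
result = drop_non_finite_rows(raw)
print(result)
```

Because nothing is done in place, raw still holds all three rows afterwards, which makes it easier to count how many rows were discarded.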















                    answered Oct 5 '17 at 8:30









                    Boern













                    • Why do you drop the NaNs twice? Once with dropna, and then a second time when dropping Inf.

                      – luca
                      Jun 25 '18 at 9:04






































                        3







                        I had the error after trying to select a subset of rows:



                        df = df.reindex(index=my_index)


                        It turned out that my_index contained values that were not in df.index, so reindex inserted new rows for them and filled those rows with NaN.
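The effect is easy to reproduce. In this sketch the frame, the column name "x" and the contents of my_index are invented; it shows the NaN row that reindex creates and one way to select only labels that really exist.

```python
import pandas as pd

# Hypothetical frame and label list; label 5 does not exist in df.index.
df = pd.DataFrame({"x": [10.0, 20.0, 30.0]}, index=[0, 1, 2])
my_index = [1, 2, 5]

# Row 5 is created and filled with NaN.
reindexed = df.reindex(index=my_index)
print(reindexed["x"].isna().sum())

# Keep only the labels that actually exist in df.
existing = df.index.intersection(my_index)
subset = df.loc[existing]
print(subset)
```

Whether silently skipping unknown labels is acceptable depends on the data; checking which labels are missing first may be the safer choice.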














                        edited May 7 '18 at 12:54

























                        answered Feb 15 '18 at 16:07









                        Elias Strehle































                            2







                            I had the same error, and in my case X and y were dataframes so I had to convert them to matrices first:



                            X = X.as_matrix().astype(np.float)
                            y = y.as_matrix().astype(np.float)
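On current library versions the two lines above no longer run as written: DataFrame.as_matrix was removed in pandas 1.0 and the np.float alias was removed in NumPy 1.24. A sketch of the equivalent conversion with to_numpy, using invented sample data for X and y:

```python
import numpy as np
import pandas as pd

# Hypothetical feature frame and target series.
X = pd.DataFrame({"f1": [1, 2, 3], "f2": [0.5, 1.5, 2.5]})
y = pd.Series([0, 1, 0])

# Convert to plain float64 arrays.
X_arr = X.to_numpy(dtype=np.float64)
y_arr = y.to_numpy(dtype=np.float64)

# The conversion does not remove bad values, so still check for them.
assert np.isfinite(X_arr).all() and np.isfinite(y_arr).all()
```

The conversion only changes the container and dtype; any NaN already present in the frame is carried over into the array.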















                            answered Jul 2 '17 at 10:40









                            tekumara













                            • This solution works perfectly for me! Thanks.

                              – Gartmair
                              Nov 2 '17 at 20:46






































                                2







                                With this version of Python 3:



                                /opt/anaconda3/bin/python --version
                                Python 3.6.0 :: Anaconda 4.3.0 (64-bit)


                                Looking at the details of the error, I found the lines of code causing the failure:



                                /opt/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
                                56 and not np.isfinite(X).all()):
                                57 raise ValueError("Input contains NaN, infinity"
                                ---> 58 " or a value too large for %r." % X.dtype)
                                59
                                60

                                ValueError: Input contains NaN, infinity or a value too large for dtype('float64').


                                From this, I was able to work out the right way to inspect my data: run the same test that fails in the traceback, np.isfinite(X).



                                Then, with a quick and dirty loop, I found that my data does indeed contain NaNs:



                                import numpy as np

                                # p is my input array; scan its first column for non-finite values.
                                print(p[:,0].shape)
                                index = 0
                                for i in p[:,0]:
                                    if not np.isfinite(i):
                                        print(index, i)
                                    index +=1

                                (367340,)
                                4454 nan
                                6940 nan
                                10868 nan
                                12753 nan
                                14855 nan
                                15678 nan
                                24954 nan
                                30251 nan
                                31108 nan
                                51455 nan
                                59055 nan
                                ...


                                Now all I have to do is remove the values at these indexes.
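The same search, and the removal step, can also be done without a Python loop. This is a sketch on a small made-up array standing in for p; it assumes that dropping whole rows is acceptable for the data at hand.

```python
import numpy as np

# Hypothetical stand-in for the array p from the answer.
p = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, np.inf], [5.0, 6.0]])

# Positions in the first column that are NaN or infinite.
bad_in_first_col = np.flatnonzero(~np.isfinite(p[:, 0]))
print(bad_in_first_col)         # [1]

# Keep only rows in which every column is finite.
row_ok = np.isfinite(p).all(axis=1)
p_clean = p[row_ok]
print(p_clean.shape)            # (2, 2)
```

Checking every column at once, rather than only the first, also catches the infinity in the third row of this example.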
















                                answered Aug 10 '17 at 21:13









                                Raphvanns































                                    1







                                    I got the same error. It went away once I called df.fillna(-99999, inplace=True) before doing any replacement, substitution, etc.
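For comparison, here is a sketch that first counts what would be replaced and then fills the gaps either with the sentinel from the answer or with each column's median. The frame is invented, and the median is only one possible choice of fill value, not something the answer proposes.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame with missing values.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 2.0, 4.0]})

# How many values would be replaced, per column.
print(df.isna().sum())

# Sentinel fill, as in the answer (returning a new frame here).
sentinel_filled = df.fillna(-99999)

# Alternative: fill each column with its own median.
median_filled = df.fillna(df.median())
print(median_filled)
```

A sentinel such as -99999 is treated by most estimators as an ordinary number, so it can distort distance-based or linear models more than a value drawn from the column itself.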
















                                    answered Jun 8 '18 at 12:21









                                    Cohen








                                    • 2





                                      This is a dirty fix. There is a reason why your array contains nan values; you should find it.

                                      – Elias Strehle
                                      Jun 25 '18 at 15:31











                                    • The data could contain NaN, and this gives a way to replace it with values that the user finds acceptable.

                                      – user2867432
                                      Sep 9 '18 at 21:37

































                                        0







                                        In my case the problem was that many scikit-learn functions return NumPy arrays, which carry no pandas index. When I built new DataFrames from those arrays and then tried to combine them with the original data, the indexes did not match.
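A small sketch of that mismatch, with an invented frame and a plain array standing in for the output of a scikit-learn call. The column names "x" and "pred" are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index is not 0..n-1 (for example after filtering).
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=[10, 20, 30])

# Stand-in for an array returned by a scikit-learn transform or predict call.
result = np.array([0.1, 0.2, 0.3])

# A default RangeIndex (0, 1, 2) does not match 10, 20, 30.
wrong = pd.concat([df, pd.DataFrame({"pred": result})], axis=1)
print(wrong.isna().sum().sum())

# Reusing the original index keeps the rows aligned.
right = pd.concat([df, pd.DataFrame({"pred": result}, index=df.index)], axis=1)
print(right.isna().sum().sum())
```

In the misaligned case pandas keeps the union of both indexes, so the combined frame has twice as many rows and every row is half NaN.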






                                        share|improve this answer













                                        In my case the problem was that many scikit functions return numpy arrays, which are devoid of pandas index. So there was an index mismatch when I used those numpy arrays to build new DataFrames and then I tried to mix them with the original data.







                                        share|improve this answer












                                        share|improve this answer



                                        share|improve this answer










                                        answered Jun 25 '18 at 9:24









                                        luca























                                            0














                                            If you can't find the problem in X, check y as well.
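As a rough sketch, the same finiteness check can be run on both arrays. The helper `report_non_finite` and the sample `X` and `y` below are hypothetical and only illustrate the idea; they assume both inputs can be converted to float64.

```python
import numpy as np

def report_non_finite(name, arr):
    """Hypothetical helper: count NaN/inf entries in arr and print a summary."""
    arr = np.asarray(arr, dtype=np.float64)
    # isfinite is False for NaN, +inf and -inf; apply it element-wise first,
    # then reduce, rather than reducing the array before the check.
    bad = ~np.isfinite(arr)
    n_bad = int(bad.sum())
    print(f"{name}: {n_bad} non-finite value(s)")
    return n_bad

# Illustrative data: X is clean, y hides a NaN.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([0.0, np.nan])

bad_in_X = report_non_finite("X", X)
bad_in_y = report_non_finite("y", y)
```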
















                                                answered Dec 31 '18 at 19:52









                                                kztd





























