sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

I am using sklearn and having a problem with the affinity propagation. I have built an input matrix and I keep getting the following error.

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I have run

np.isnan(mat.any()) #and gets False

np.isfinite(mat.all()) #and gets True

I tried using

mat[np.isfinite(mat) == True] = 0

to remove the infinite values but this did not work either.
What can I do to get rid of the infinite values in my matrix, so that I can use the affinity propagation algorithm?

I am using anaconda and python 2.7.9.

edited Jun 21 '18 at 8:05

Jesse de Bruijne

2,47861327

asked Jul 9 '15 at 16:40

Ethan Waldie

4971513

1

I'm voting to close this, as the author says himself that his data was invalid and though everything pointed to it, he didn't validate -- the data equivalent to a typo, which is a closing reason.

– Marcus Müller
Sep 6 '15 at 18:55

8

I had this same issue with my dataset. Ultimately: a data mistake, not a scikit learn bug. Most of the answers below are helpful but misleading. Check check check your data, make sure that when converted to float64 it is both finite and not nan. The error message is apt - this is almost certainly the issue for anyone who finds themselves here.

– Owen
Dec 7 '16 at 13:52

1

For the record and +1 for @Owen, check your input data and make sure you do not have any missing value in any row or grid. You can use the Imputer class to avoid this problem.

– Alejandro BR
Jun 20 '18 at 21:29
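
A note on the `Imputer` class mentioned in the comment above: it was removed from later scikit-learn releases in favour of `sklearn.impute.SimpleImputer`. The following is a minimal sketch of mean imputation, assuming a recent scikit-learn; the toy matrix `X` is made up for illustration, and whether the column mean is a sensible fill value depends on your data.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix with one missing value (an assumption for illustration).
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, 30.0]])

# Replace each NaN with the mean of its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_imputed = imputer.fit_transform(X)

print(X_imputed)
```

Imputation hides the missing values rather than explaining them, so it is still worth finding out why they are there.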

python python-2.7 scikit-learn valueerror


11 Answers

This might happen inside scikit-learn, depending on what you're doing. I recommend reading the documentation for the functions you're using. You might be using one that requires, for example, a positive definite matrix, and your input may not fulfil that criterion.

EDIT: How could I miss that:

np.isnan(mat.any()) #and gets False

np.isfinite(mat.all()) #and gets True

is obviously wrong. Right would be:

np.any(np.isnan(mat))

and

np.all(np.isfinite(mat))

You want to check whether any of the elements is NaN, not whether the return value of the any function is a number...
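
To make the difference concrete, here is a small sketch. The matrix `mat` is a made-up example containing one NaN and one infinity. It also shows one possible way to replace the non-finite entries, since the mask in the question (`np.isfinite(mat) == True`) selects the finite values rather than the bad ones.

```python
import numpy as np

# Hypothetical matrix containing one NaN and one infinity.
mat = np.array([[1.0, np.nan],
                [np.inf, 4.0]])

# Wrong: mat.any() / mat.all() collapse the matrix to a single bool first,
# so isnan / isfinite only ever inspect True or False.
wrong_nan_check = np.isnan(mat.any())
wrong_finite_check = np.isfinite(mat.all())

# Right: test every element, then reduce.
has_nan = np.any(np.isnan(mat))
all_finite = np.all(np.isfinite(mat))

# Replace NaN and infinity with 0.0, leaving finite values untouched.
cleaned = np.where(np.isfinite(mat), mat, 0.0)

print(wrong_nan_check, wrong_finite_check, has_nan, all_finite)
print(cleaned)
```

Whether zero is an acceptable replacement is a judgment call about the data; dropping the affected rows or fixing the upstream source may be more appropriate.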

edited Jul 10 '15 at 7:57

answered Jul 9 '15 at 16:43

Marcus Müller

23.4k32468

3

The docs don't mention anything about this error. I need a way of getting rid of the infinite values in my NumPy array.

– Ethan Waldie
Jul 9 '15 at 17:19

3

As I said: they may not be in your input array. They might occur in the math that happens between input and magical output. The point is that all this math depends on certain conditions for the input. You have to carefully read the docs to find out whether your input satisfies these conditions.

– Marcus Müller
Jul 10 '15 at 7:54

@MarcusMüller could you point me to the location of this document where they specify the requirements of the input matrix? I can't seem to find the "docs" you are referring to. Thank you :)

– user2253546
Feb 23 '17 at 21:35


I got the same error message when using sklearn with pandas. My solution is to reset the index of my dataframe df before running any sklearn code:

df = df.reset_index()

I encountered this issue many times when I removed some entries in my df, such as

df = df[df.label=='desired_one']
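
The answer does not say why resetting the index helps. One plausible cause (an assumption, not something stated above) is label alignment: after filtering, the frame keeps its old index labels, and anything built from a plain array gets a fresh 0..n-1 index, so combining the two produces NaN. A sketch with made-up column names:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column names are made up for illustration.
df = pd.DataFrame({"label": ["a", "b", "a", "b"],
                   "value": [1.0, 2.0, 3.0, 4.0]})

# Filtering keeps the original labels, so the index is now [0, 2].
subset = df[df.label == "a"].copy()

# A Series built from a plain array gets a fresh index [0, 1].
doubled = pd.Series(subset["value"].to_numpy() * 2)

# Assignment aligns on labels: label 2 has no partner, so NaN appears.
misaligned = subset.assign(doubled=doubled)

# After resetting the index both sides use [0, 1] and align cleanly.
aligned = subset.reset_index(drop=True).assign(doubled=doubled)

print(misaligned)
print(aligned)
```

Using `drop=True` avoids adding the old index as an extra column.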

edited Aug 12 '18 at 20:34

answered Dec 24 '17 at 3:43

Jun Wang

18136

This solved my error. brilliant!

– Aerin
Jan 31 '18 at 5:41

I love you! That's a rare instance of me finding the right solution despite not knowing what's the cause of the error!

– Alexandr Kapshuk
Aug 9 '18 at 14:25

Calling df.reset_index() adds the old index as an "index" column in the resulting df, which may not be useful in every scenario. If df.reset_index(drop=True) is run instead, it throws the same error.

– smm
Sep 18 '18 at 18:19


The dimensions of my input array were skewed because my input CSV contained empty spaces.
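
A quick way to look for this kind of problem, sketched with pandas (an assumption; the answer does not say how the CSV was loaded). The CSV text below is made up, with one empty cell that pandas parses as NaN.

```python
import io
import pandas as pd

# Hypothetical CSV text with an empty cell in the second data row.
csv_text = "a,b,c\n1,2,3\n4,,6\n7,8,9\n"

df = pd.read_csv(io.StringIO(csv_text))

# Count missing cells per column; pandas parses empty fields as NaN.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Drop incomplete rows (or impute instead, depending on the data).
complete = df.dropna()
print(complete)
```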

answered Jul 14 '15 at 21:09

Ethan Waldie

4971513

For pandas, I just used dropna pandas.pydata.org/pandas-docs/stable/generated/…

– FindOutIslamNow
Sep 11 '18 at 7:23


This is the check on which it fails:

https://github.com/scikit-learn/scikit-learn/blob/0.17.X/sklearn/utils/validation.py#L51

Which says

def _assert_all_finite(X):
    """Like assert_all_finite, but only for ndarray."""
    X = np.asanyarray(X)
    # First try an O(n) time, O(1) space solution for the common case that
    # everything is finite; fall back to O(n) space np.isfinite to prevent
    # false positives from overflow in sum method.
    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
            and not np.isfinite(X).all()):
        raise ValueError("Input contains NaN, infinity"
                         " or a value too large for %r." % X.dtype)

So make sure that your input contains no NaN values, that all of the values are actually floats, and that none of them is Inf.
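
The two-stage logic in that check can be reproduced outside scikit-learn. The helper below, `all_finite`, is a hypothetical name and only a sketch of the same idea, not the library's function. It shows that a sum which overflows to infinity does not by itself trigger the error, because the element-wise test runs as a fallback.

```python
import numpy as np

def all_finite(X):
    """Return True if X has no NaN or infinity (sketch of the quoted check)."""
    X = np.asanyarray(X)
    # Integer and boolean arrays cannot hold NaN or infinity.
    if X.dtype.char not in np.typecodes["AllFloat"]:
        return True
    # Cheap path: a finite sum means every element is finite.
    with np.errstate(over="ignore", invalid="ignore"):
        if np.isfinite(X.sum()):
            return True
    # The sum may have overflowed even though each element is finite,
    # so fall back to the element-wise test.
    return bool(np.isfinite(X).all())

big_but_finite = all_finite([1e308, 1e308])  # sum overflows, elements are fine
with_nan = all_finite([1.0, np.nan])
with_inf = all_finite([1.0, np.inf])
integers = all_finite([1, 2, 3])

print(big_but_finite, with_nan, with_inf, integers)
```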

answered Apr 13 '16 at 15:12

tuxdna

5,53332647


This is my function (based on this) to clean the dataset of nan, Inf, and missing cells (for skewed datasets):

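import numpy as np
import pandas as pd


def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    # Drop rows with missing cells (note: this modifies the caller's frame).
    df.dropna(inplace=True)
    # Keep only rows that contain no NaN or infinite values.
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)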

answered Oct 5 '17 at 8:30

Boern

2,88132950

Why do you drop the NaN values twice? First with dropna, then a second time when dropping inf.

– luca
Jun 25 '18 at 9:04


I had the error after trying to select a subset of rows:

df = df.reindex(index=my_index)

Turns out that my_index contained values that were not contained in df.index, so the reindex function inserted some new rows and filled them with nan.
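
A small sketch of this behaviour, with a made-up frame and a `my_index` that contains one label missing from `df.index`. It also shows one way to detect the unknown labels up front and to select only labels that exist.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])
my_index = ["a", "c", "d"]  # "d" is not in df.index

# reindex silently creates a row for "d" and fills it with NaN.
reindexed = df.reindex(index=my_index)

# Find the offending labels before they cause trouble.
unknown = pd.Index(my_index).difference(df.index)

# Keep only labels that really exist in df.
safe = df.loc[df.index.intersection(my_index)]

print(reindexed)
print(list(unknown))
print(safe)
```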

edited May 7 '18 at 12:54

answered Feb 15 '18 at 16:07

Elias Strehle

359216


I had the same error, and in my case X and y were dataframes so I had to convert them to matrices first:

X = X.as_matrix().astype(np.float)

y = y.as_matrix().astype(np.float)
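
On current library versions the calls above no longer work as written: `DataFrame.as_matrix()` was removed in pandas 1.0 and the `np.float` alias was removed in NumPy 1.24. A sketch of the equivalent conversion, using a made-up `X` and `y`:

```python
import numpy as np
import pandas as pd

# Hypothetical feature frame and target series.
X = pd.DataFrame({"f1": [1, 2, 3], "f2": [0.5, 1.5, 2.5]})
y = pd.Series([0, 1, 0])

# to_numpy() replaces as_matrix(); np.float64 replaces the np.float alias.
X_arr = X.to_numpy().astype(np.float64)
y_arr = y.to_numpy().astype(np.float64)

# The conversion does not remove NaN or infinity, so still verify the result.
print(np.isfinite(X_arr).all(), np.isfinite(y_arr).all())
```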

answered Jul 2 '17 at 10:40

tekumara

5,14774058

this solution works perfectly for me! Thanks

– Gartmair
Nov 2 '17 at 20:46


With this version of python 3:

/opt/anaconda3/bin/python --version

Python 3.6.0 :: Anaconda 4.3.0 (64-bit)

Looking at the details of the error, I found the lines of code causing the failure:

/opt/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
     56             and not np.isfinite(X).all()):
     57         raise ValueError("Input contains NaN, infinity"
---> 58                          " or a value too large for %r." % X.dtype)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

From this, I could see the right way to test my data: use the same test that fails in the traceback, np.isfinite(X).

Then with a quick and dirty loop, I was able to find that my data indeed contains nans:

print(p[:,0].shape)
index = 0
for i in p[:,0]:
    if not np.isfinite(i):
        print(index, i)
    index +=1

(367340,)
4454 nan
6940 nan
10868 nan
12753 nan
14855 nan
15678 nan
24954 nan
30251 nan
31108 nan
51455 nan
59055 nan
...

Now all I have to do is remove the values at these indexes.
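
The same search, and the removal step, can also be done without a Python loop. This is a sketch on a small made-up array standing in for `p`; it drops whole rows, which may or may not be what you want.

```python
import numpy as np

# Hypothetical 2-D array standing in for p from the answer.
p = np.array([[1.0, 5.0],
              [np.nan, 6.0],
              [3.0, 7.0],
              [np.inf, 8.0]])

# Positions in the first column that are NaN or infinite.
bad_rows = np.flatnonzero(~np.isfinite(p[:, 0]))

# Keep only rows whose values are all finite.
p_clean = p[np.isfinite(p).all(axis=1)]

print(bad_rows)
print(p_clean)
```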

answered Aug 10 '17 at 21:13

Raphvanns

37539


I got the same error. It worked after calling df.fillna(-99999, inplace=True) before doing any replacement, substitution, etc.

answered Jun 8 '18 at 12:21

Cohen

369215

2

This is a dirty fix. There is a reason why your array contains nan values; you should find it.

– Elias Strehle
Jun 25 '18 at 15:31

The data could legitimately contain NaN, and this gives a way to replace it with values that he/she finds acceptable.

– user2867432
Sep 9 '18 at 21:37


In my case the problem was that many scikit functions return numpy arrays, which are devoid of pandas index. So there was an index mismatch when I used those numpy arrays to build new DataFrames and then I tried to mix them with the original data.
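
A sketch of that mismatch, using a plain NumPy array as a stand-in for the output of a scikit-learn transformer (the frame and the column name `x_t` are made up). Passing the original index when wrapping the array keeps the rows aligned.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index is not 0..n-1 (for example after filtering).
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=[10, 20, 30])

# Stand-in for an array returned by a scikit-learn transformer.
transformed = df.to_numpy() * 100

# Default RangeIndex (0, 1, 2) does not match df's index (10, 20, 30).
bad = pd.concat([df, pd.DataFrame(transformed, columns=["x_t"])], axis=1)

# Reusing df.index keeps the rows aligned.
good = pd.concat(
    [df, pd.DataFrame(transformed, columns=["x_t"], index=df.index)], axis=1
)

print(bad)
print(good)
```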

answered Jun 25 '18 at 9:24

luca

2,39122538


If you can't find the problem in X, check y.
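
One way to run the same check on both inputs, as a sketch. The helper name `count_non_finite` and the toy arrays are assumptions for illustration.

```python
import numpy as np

def count_non_finite(arr):
    """Count NaN and infinite entries after converting to float64."""
    arr = np.asarray(arr, dtype=np.float64)
    return int((~np.isfinite(arr)).sum())

# Hypothetical data: X is clean, y hides a NaN.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([0.0, np.nan])

for name, arr in (("X", X), ("y", y)):
    print(name, count_non_finite(arr))
```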

answered Dec 31 '18 at 19:52

kztd

72269


11 Answers
11

active

oldest

votes

11 Answers
11

active

oldest

votes

EDIT: How could I miss that:

np.isnan(mat.any()) #and gets False

np.isfinite(mat.all()) #and gets True

is obviously wrong. Right would be:

np.any(np.isnan(mat))

and

np.all(np.isfinite(mat))

You want to check wheter any of the element is NaN, and not whether the return value of the any function is a number...

edited Jul 10 '15 at 7:57

answered Jul 9 '15 at 16:43

Marcus Müller

23.4k32468

3

The docs dont mention anything about this error I need a way of getting rid of the infinite values from my nupy array

– Ethan Waldie
Jul 9 '15 at 17:19

3

As I said: They are maybe not in your input array. They might occur in the math that happens between input and magical output. The point is that all this math depends on certain conditions for the input. You have to carefully read the docs to find out whether your input satisifies these conditions.

– Marcus Müller
Jul 10 '15 at 7:54

@MarcusMüller could you point me to the location of this document where they specify the requirements of the input matrix? I can't seem to find the "docs" you are referring to. Thank you :)

– user2253546
Feb 23 '17 at 21:35

add a comment |

EDIT: How could I miss that:

np.isnan(mat.any()) #and gets False

np.isfinite(mat.all()) #and gets True

is obviously wrong. Right would be:

np.any(np.isnan(mat))

and

np.all(np.isfinite(mat))

You want to check wheter any of the element is NaN, and not whether the return value of the any function is a number...

edited Jul 10 '15 at 7:57

answered Jul 9 '15 at 16:43

Marcus Müller

23.4k32468

3

The docs dont mention anything about this error I need a way of getting rid of the infinite values from my nupy array

– Ethan Waldie
Jul 9 '15 at 17:19

3

As I said: They are maybe not in your input array. They might occur in the math that happens between input and magical output. The point is that all this math depends on certain conditions for the input. You have to carefully read the docs to find out whether your input satisifies these conditions.

– Marcus Müller
Jul 10 '15 at 7:54

@MarcusMüller could you point me to the location of this document where they specify the requirements of the input matrix? I can't seem to find the "docs" you are referring to. Thank you :)

– user2253546
Feb 23 '17 at 21:35

add a comment |

EDIT: How could I miss that:

np.isnan(mat.any()) #and gets False

np.isfinite(mat.all()) #and gets True

is obviously wrong. Right would be:

np.any(np.isnan(mat))

and

np.all(np.isfinite(mat))

You want to check wheter any of the element is NaN, and not whether the return value of the any function is a number...

edited Jul 10 '15 at 7:57

answered Jul 9 '15 at 16:43

Marcus Müller

23.4k32468

EDIT: How could I miss that:

np.isnan(mat.any()) #and gets False

np.isfinite(mat.all()) #and gets True

is obviously wrong. Right would be:

np.any(np.isnan(mat))

and

np.all(np.isfinite(mat))

You want to check wheter any of the element is NaN, and not whether the return value of the any function is a number...

edited Jul 10 '15 at 7:57

answered Jul 9 '15 at 16:43

Marcus Müller

23.4k32468

edited Jul 10 '15 at 7:57

answered Jul 9 '15 at 16:43

Marcus Müller

23.4k32468

answered Jul 9 '15 at 16:43

Marcus Müller

23.4k32468

answered Jul 9 '15 at 16:43

Marcus Müller

23.4k32468

3

The docs dont mention anything about this error I need a way of getting rid of the infinite values from my nupy array

– Ethan Waldie
Jul 9 '15 at 17:19

3

As I said: They are maybe not in your input array. They might occur in the math that happens between input and magical output. The point is that all this math depends on certain conditions for the input. You have to carefully read the docs to find out whether your input satisifies these conditions.

– Marcus Müller
Jul 10 '15 at 7:54

@MarcusMüller could you point me to the location of this document where they specify the requirements of the input matrix? I can't seem to find the "docs" you are referring to. Thank you :)

– user2253546
Feb 23 '17 at 21:35

add a comment |

3

The docs dont mention anything about this error I need a way of getting rid of the infinite values from my nupy array

– Ethan Waldie
Jul 9 '15 at 17:19

3

As I said: They are maybe not in your input array. They might occur in the math that happens between input and magical output. The point is that all this math depends on certain conditions for the input. You have to carefully read the docs to find out whether your input satisifies these conditions.

– Marcus Müller
Jul 10 '15 at 7:54

@MarcusMüller could you point me to the location of this document where they specify the requirements of the input matrix? I can't seem to find the "docs" you are referring to. Thank you :)

– user2253546
Feb 23 '17 at 21:35

The docs dont mention anything about this error I need a way of getting rid of the infinite values from my nupy array

– Ethan Waldie
Jul 9 '15 at 17:19

As I said: They are maybe not in your input array. They might occur in the math that happens between input and magical output. The point is that all this math depends on certain conditions for the input. You have to carefully read the docs to find out whether your input satisifies these conditions.

– Marcus Müller
Jul 10 '15 at 7:54

@MarcusMüller could you point me to the location of this document where they specify the requirements of the input matrix? I can't seem to find the "docs" you are referring to. Thank you :)

– user2253546
Feb 23 '17 at 21:35

add a comment |

I got the same error message when using sklearn with pandas. My solution is to reset the index of my dataframe df before running any sklearn code:

df = df.reset_index()

I encountered this issue many times when I removed some entries in my df, such as

df = df[df.label=='desired_one']

edited Aug 12 '18 at 20:34

answered Dec 24 '17 at 3:43

Jun Wang

18136

This solved my error. brilliant!

– Aerin
Jan 31 '18 at 5:41

I love you! That's a rare instance of me finding the right solution despite not knowing what's the cause of the error!

– Alexandr Kapshuk
Aug 9 '18 at 14:25

By doing the df.reset_index() it will add the "index" as a column in the resulting df. Which may not be useful for all scenario. If the df.reset_index(drop=True) ran then it will throw the same error.

– smm
Sep 18 '18 at 18:19

add a comment |

I got the same error message when using sklearn with pandas. My solution is to reset the index of my dataframe df before running any sklearn code:

df = df.reset_index()

I encountered this issue many times when I removed some entries in my df, such as

df = df[df.label=='desired_one']

edited Aug 12 '18 at 20:34

answered Dec 24 '17 at 3:43

Jun Wang

18136

This solved my error. brilliant!

– Aerin
Jan 31 '18 at 5:41

I love you! That's a rare instance of me finding the right solution despite not knowing what's the cause of the error!

– Alexandr Kapshuk
Aug 9 '18 at 14:25

By doing the df.reset_index() it will add the "index" as a column in the resulting df. Which may not be useful for all scenario. If the df.reset_index(drop=True) ran then it will throw the same error.

– smm
Sep 18 '18 at 18:19

add a comment |

I got the same error message when using sklearn with pandas. My solution is to reset the index of my dataframe df before running any sklearn code:

df = df.reset_index()

I encountered this issue many times when I removed some entries in my df, such as

df = df[df.label=='desired_one']

edited Aug 12 '18 at 20:34

answered Dec 24 '17 at 3:43

Jun Wang

18136

I got the same error message when using sklearn with pandas. My solution is to reset the index of my dataframe df before running any sklearn code:

df = df.reset_index()

I encountered this issue many times when I removed some entries in my df, such as

df = df[df.label=='desired_one']

edited Aug 12 '18 at 20:34

answered Dec 24 '17 at 3:43

Jun Wang

18136

edited Aug 12 '18 at 20:34

answered Dec 24 '17 at 3:43

Jun Wang

18136

answered Dec 24 '17 at 3:43

Jun Wang

18136

answered Dec 24 '17 at 3:43

Jun Wang

18136

This solved my error. brilliant!

– Aerin
Jan 31 '18 at 5:41

I love you! That's a rare instance of me finding the right solution despite not knowing what's the cause of the error!

– Alexandr Kapshuk
Aug 9 '18 at 14:25

By doing the df.reset_index() it will add the "index" as a column in the resulting df. Which may not be useful for all scenario. If the df.reset_index(drop=True) ran then it will throw the same error.

– smm
Sep 18 '18 at 18:19

add a comment |

This solved my error. brilliant!

– Aerin
Jan 31 '18 at 5:41

I love you! That's a rare instance of me finding the right solution despite not knowing what's the cause of the error!

– Alexandr Kapshuk
Aug 9 '18 at 14:25

By doing the df.reset_index() it will add the "index" as a column in the resulting df. Which may not be useful for all scenario. If the df.reset_index(drop=True) ran then it will throw the same error.

– smm
Sep 18 '18 at 18:19

This solved my error. brilliant!

– Aerin
Jan 31 '18 at 5:41

I love you! That's a rare instance of me finding the right solution despite not knowing what's the cause of the error!

– Alexandr Kapshuk
Aug 9 '18 at 14:25

By doing the df.reset_index() it will add the "index" as a column in the resulting df. Which may not be useful for all scenario. If the df.reset_index(drop=True) ran then it will throw the same error.

– smm
Sep 18 '18 at 18:19

add a comment |

The Dimensions of my input array were skewed, as my input csv had empty spaces.

answered Jul 14 '15 at 21:09

Ethan Waldie

4971513

For pandas, I just used dropna pandas.pydata.org/pandas-docs/stable/generated/…

– FindOutIslamNow
Sep 11 '18 at 7:23

add a comment |

The Dimensions of my input array were skewed, as my input csv had empty spaces.

answered Jul 14 '15 at 21:09

Ethan Waldie

4971513

For pandas, I just used dropna pandas.pydata.org/pandas-docs/stable/generated/…

– FindOutIslamNow
Sep 11 '18 at 7:23

add a comment |

The Dimensions of my input array were skewed, as my input csv had empty spaces.

answered Jul 14 '15 at 21:09

Ethan Waldie

4971513

The Dimensions of my input array were skewed, as my input csv had empty spaces.

answered Jul 14 '15 at 21:09

Ethan Waldie

4971513

answered Jul 14 '15 at 21:09

Ethan Waldie

4971513

answered Jul 14 '15 at 21:09

Ethan Waldie

4971513

answered Jul 14 '15 at 21:09

Ethan Waldie

4971513

For pandas, I just used dropna pandas.pydata.org/pandas-docs/stable/generated/…

– FindOutIslamNow
Sep 11 '18 at 7:23

add a comment |

For pandas, I just used dropna pandas.pydata.org/pandas-docs/stable/generated/…

– FindOutIslamNow
Sep 11 '18 at 7:23

For pandas, I just used dropna pandas.pydata.org/pandas-docs/stable/generated/…

– FindOutIslamNow
Sep 11 '18 at 7:23

add a comment |

This is the check on which it fails:

https://github.com/scikit-learn/scikit-learn/blob/0.17.X/sklearn/utils/validation.py#L51

Which says

def _assert_all_finite(X):

    """Like assert_all_finite, but only for ndarray."""

    X = np.asanyarray(X)

    # First try an O(n) time, O(1) space solution for the common case that

    # everything is finite; fall back to O(n) space np.isfinite to prevent

    # false positives from overflow in sum method.

    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())

            and not np.isfinite(X).all()):

        raise ValueError("Input contains NaN, infinity"

                         " or a value too large for %r." % X.dtype)

So make sure that you have non NaN values in your input. And all those values are actually float values. None of the values should be Inf either.

answered Apr 13 '16 at 15:12

tuxdna

5,53332647

add a comment |

This is the check on which it fails:

https://github.com/scikit-learn/scikit-learn/blob/0.17.X/sklearn/utils/validation.py#L51

Which says

def _assert_all_finite(X):

    """Like assert_all_finite, but only for ndarray."""

    X = np.asanyarray(X)

    # First try an O(n) time, O(1) space solution for the common case that

    # everything is finite; fall back to O(n) space np.isfinite to prevent

    # false positives from overflow in sum method.

    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())

            and not np.isfinite(X).all()):

        raise ValueError("Input contains NaN, infinity"

                         " or a value too large for %r." % X.dtype)

So make sure that you have non NaN values in your input. And all those values are actually float values. None of the values should be Inf either.

answered Apr 13 '16 at 15:12

tuxdna

5,53332647

add a comment |

This is the check on which it fails:

https://github.com/scikit-learn/scikit-learn/blob/0.17.X/sklearn/utils/validation.py#L51

Which says

def _assert_all_finite(X):

    """Like assert_all_finite, but only for ndarray."""

    X = np.asanyarray(X)

    # First try an O(n) time, O(1) space solution for the common case that

    # everything is finite; fall back to O(n) space np.isfinite to prevent

    # false positives from overflow in sum method.

    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())

            and not np.isfinite(X).all()):

        raise ValueError("Input contains NaN, infinity"

                         " or a value too large for %r." % X.dtype)

So make sure that you have non NaN values in your input. And all those values are actually float values. None of the values should be Inf either.

answered Apr 13 '16 at 15:12

tuxdna

5,53332647

This is the check on which it fails:

https://github.com/scikit-learn/scikit-learn/blob/0.17.X/sklearn/utils/validation.py#L51

Which says

def _assert_all_finite(X):

    """Like assert_all_finite, but only for ndarray."""

    X = np.asanyarray(X)

    # First try an O(n) time, O(1) space solution for the common case that

    # everything is finite; fall back to O(n) space np.isfinite to prevent

    # false positives from overflow in sum method.

    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())

            and not np.isfinite(X).all()):

        raise ValueError("Input contains NaN, infinity"

                         " or a value too large for %r." % X.dtype)

So make sure that you have non NaN values in your input. And all those values are actually float values. None of the values should be Inf either.

answered Apr 13 '16 at 15:12

tuxdna

5,53332647

answered Apr 13 '16 at 15:12

tuxdna

5,53332647

answered Apr 13 '16 at 15:12

tuxdna

5,53332647

answered Apr 13 '16 at 15:12

tuxdna

5,53332647

add a comment |

This is my function (based on this) to clean the dataset of nan, Inf, and missing cells (for skewed datasets):

import pandas as pd



def clean_dataset(df):

    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"

    df.dropna(inplace=True)

    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(1)

    return df[indices_to_keep].astype(np.float64)

answered Oct 5 '17 at 8:30

Boern

2,88132950

Why do you drop the nan two times? First time with dropna then a second time when dropping inf.

– luca
Jun 25 '18 at 9:04


I had the error after trying to select a subset of rows:

df = df.reindex(index=my_index)

Turns out that my_index contained values that were not contained in df.index, so the reindex function inserted some new rows and filled them with nan.
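A small reproduction of that effect, with made-up labels, plus one way to avoid it by selecting only labels that exist in `df.index` (a sketch, not necessarily the right fix if the missing labels point to a deeper data problem):

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])
my_index = ["a", "c", "z"]  # "z" is not in df.index

# reindex silently creates a row for "z" and fills it with NaN.
reindexed = df.reindex(index=my_index)
print(reindexed)

# Keep only the labels that are really present in df.index.
present = df.index.intersection(my_index)
subset = df.loc[present]

# Labels that were requested but do not exist, worth inspecting.
missing = pd.Index(my_index).difference(df.index)
print(list(missing))
```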

edited May 7 '18 at 12:54

answered Feb 15 '18 at 16:07

Elias Strehle

359216


I had the same error, and in my case X and y were dataframes so I had to convert them to matrices first:

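# as_matrix() was removed in pandas 1.0 and np.float in NumPy 1.24;
# on current versions the equivalent is:
X = X.to_numpy().astype(np.float64)
y = y.to_numpy().astype(np.float64)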
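If the conversion itself fails, or the frame holds object columns with stray strings, one option is to coerce to numeric first and see which rows turn into NaN. This is a sketch under that assumption; the sample frame and the names `X_num`, `bad_rows` and `X_arr` are made up.

```python
import numpy as np
import pandas as pd

# Made-up frame: column "a" holds strings, one of them not a number.
X = pd.DataFrame({"a": ["1.5", "2", "oops"], "b": [1, 2, 3]})

# Values that cannot be parsed become NaN instead of raising.
X_num = X.apply(pd.to_numeric, errors="coerce")

# Rows that contain at least one unparseable or missing value.
bad_rows = X_num.index[X_num.isna().any(axis=1)]
print(list(bad_rows))

# Drop those rows and convert the rest to a float64 array.
X_arr = X_num.drop(index=bad_rows).to_numpy(dtype=np.float64)
print(X_arr)
```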

answered Jul 2 '17 at 10:40

tekumara

5,14774058

this solution works perfectly for me! Thanks

– Gartmair
Nov 2 '17 at 20:46


With this version of python 3:

/opt/anaconda3/bin/python --version

Python 3.6.0 :: Anaconda 4.3.0 (64-bit)

Looking at the details of the error, I found the lines of code causing the failure:

/opt/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
     56             and not np.isfinite(X).all()):
     57         raise ValueError("Input contains NaN, infinity"
---> 58                          " or a value too large for %r." % X.dtype)
     59 
     60 

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

From this, I could see the right way to test my data: use the same check that fails inside sklearn, np.isfinite(X).

Then with a quick and dirty loop, I was able to find that my data indeed contains nans:

print(p[:, 0].shape)
# Print the position and value of every non-finite entry in the first column.
for index, value in enumerate(p[:, 0]):
    if not np.isfinite(value):
        print(index, value)

(367340,)
4454 nan
6940 nan
10868 nan
12753 nan
14855 nan
15678 nan
24954 nan
30251 nan
31108 nan
51455 nan
59055 nan
...

Now all I have to do is remove the values at these indexes.
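The same lookup and the removal can be done without a Python loop. A minimal sketch, with a small made-up array standing in for `p`:

```python
import numpy as np

# Hypothetical stand-in for the array p used above.
p = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, 30.0],
              [np.inf, 40.0]])

# True where the first column holds a finite value.
finite_mask = np.isfinite(p[:, 0])

# Row indexes of the non-finite entries (same information as the loop).
bad_indexes = np.flatnonzero(~finite_mask)
print(bad_indexes)

# Keep only the rows whose first column is finite.
p_clean = p[finite_mask]
print(p_clean.shape)
```

To check every column rather than just the first, `np.isfinite(p).all(axis=1)` gives the corresponding row mask.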

answered Aug 10 '17 at 21:13

Raphvanns

37539


I got the same error. It went away once I called df.fillna(-99999, inplace=True) before doing any replacement, substitution, etc.

answered Jun 8 '18 at 12:21

Cohen

369215

2

This is a dirty fix. There is a reason why your array contains nan values; you should find it.

– Elias Strehle
Jun 25 '18 at 15:31

The data could legitimately contain NaN, and this gives a way to replace it with values that he/she finds acceptable.

– user2867432
Sep 9 '18 at 21:37
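If the NaN values are expected and you want something less arbitrary than a sentinel like -99999, one common option is to count them per column first and then fill with a column statistic. A sketch with made-up data; whether the median is an acceptable replacement depends entirely on your dataset:

```python
import numpy as np
import pandas as pd

# Made-up sample frame with one missing value per column.
df = pd.DataFrame({"height": [1.6, np.nan, 1.8],
                   "weight": [60.0, 70.0, np.nan]})

# See how many values are missing in each column before touching them.
print(df.isna().sum())

# Fill each column's gaps with that column's median (returns a new frame).
filled = df.fillna(df.median())
print(filled)
```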


answered Jun 25 '18 at 9:24

luca

2,39122538


If you can't find the problem in X, check y as well.
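A small helper can report both at once. This is a sketch; `report_non_finite` is a made-up name and the arrays are sample data:

```python
import numpy as np

def report_non_finite(**arrays):
    """Return {name: number of NaN/inf entries} for each array passed in."""
    counts = {}
    for name, values in arrays.items():
        as_float = np.asarray(values, dtype=np.float64)
        counts[name] = int((~np.isfinite(as_float)).sum())
    return counts

# Made-up data: X is clean, y hides a NaN.
X = np.array([[0.0, 1.0], [2.0, 3.0]])
y = np.array([1.0, np.nan])

counts = report_non_finite(X=X, y=y)
print(counts)
```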

answered Dec 31 '18 at 19:52

kztd

72269
