sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
I am using sklearn and having a problem with affinity propagation. I have built an input matrix, and I keep getting the following error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
I have run
np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True
I tried using
mat[np.isfinite(mat) == True] = 0
to remove the infinite values, but this did not work either.
What can I do to get rid of the infinite values in my matrix so that I can use the affinity propagation algorithm?
I am using Anaconda and Python 2.7.9.
python python-2.7 scikit-learn valueerror
1
I'm voting to close this, as the author says himself that his data was invalid and though everything pointed to it, he didn't validate -- the data equivalent to a typo, which is a closing reason.
– Marcus Müller
Sep 6 '15 at 18:55
8
I had this same issue with my dataset. Ultimately: a data mistake, not a scikit-learn bug. Most of the answers below are helpful but misleading. Check check check your data, make sure that when converted to float64 it is both finite and not nan. The error message is apt - this is almost certainly the issue for anyone who finds themselves here.
– Owen
Dec 7 '16 at 13:52
1
For the record and +1 for @Owen, check your input data and make sure you do not have any missing value in any row or grid. You can use the Imputer class to avoid this problem.
– Alejandro BR
Jun 20 '18 at 21:29
asked Jul 9 '15 at 16:40 by Ethan Waldie, edited Jun 21 '18 at 8:05 by Jesse de Bruijne
11 Answers
This might happen inside scikit-learn, and it depends on what you're doing. I recommend reading the documentation for the functions you're using. You might be using one that requires, for example, a positive definite matrix, while your matrix does not fulfil that criterion.
EDIT: How could I miss that:
np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True
is obviously wrong. Right would be:
np.any(np.isnan(mat))
and
np.all(np.isfinite(mat))
You want to check whether any of the elements is NaN, and not whether the return value of the any function is a number...
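As a minimal sketch of why the two checks from the question always pass, and of what the corrected checks report, consider the following. The matrix `mat` is made up for illustration and is not the asker's data; replacing bad entries with 0 is also only an assumption about what is acceptable for a given dataset.

```python
import numpy as np

# Made-up matrix containing one NaN and one inf.
mat = np.array([[1.0, np.nan],
                [np.inf, 4.0]])

# The question's checks reduce the matrix to a single boolean first,
# and then test that boolean, which is never NaN and always finite.
wrong_nan = np.isnan(mat.any())
wrong_finite = np.isfinite(mat.all())

# The corrected checks test every element, then reduce.
has_nan = np.any(np.isnan(mat))
all_finite = np.all(np.isfinite(mat))

# Row/column positions of every NaN or inf entry.
bad_positions = np.argwhere(~np.isfinite(mat))

# Overwrite only the non-finite entries (note the ~ that inverts the mask).
cleaned = mat.copy()
cleaned[~np.isfinite(cleaned)] = 0

print(wrong_nan, wrong_finite, has_nan, all_finite)
print(bad_positions)
```

Note that the mask in the question, `mat[np.isfinite(mat) == True] = 0`, selects the finite entries, so it zeroes the good values and leaves NaN and inf in place.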
3
The docs don't mention anything about this error. I need a way of getting rid of the infinite values from my numpy array.
– Ethan Waldie
Jul 9 '15 at 17:19
3
As I said: They are maybe not in your input array. They might occur in the math that happens between input and magical output. The point is that all this math depends on certain conditions for the input. You have to carefully read the docs to find out whether your input satisfies these conditions.
– Marcus Müller
Jul 10 '15 at 7:54
@MarcusMüller could you point me to the location of this document where they specify the requirements of the input matrix? I can't seem to find the "docs" you are referring to. Thank you :)
– user2253546
Feb 23 '17 at 21:35
I got the same error message when using sklearn with pandas. My solution is to reset the index of my dataframe df before running any sklearn code:
df = df.reset_index()
I encountered this issue many times when I removed some entries in my df, such as
df = df[df.label=='desired_one']
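A small sketch of what filtering does to the index, and of the two ways `reset_index` can renumber it. The DataFrame and its column names are invented for illustration; whether stale index labels are really the cause of the error in a given pipeline is an assumption that has to be checked case by case.

```python
import pandas as pd

# Made-up data; "label" and "value" are illustrative column names.
df = pd.DataFrame({"label": ["desired_one", "other", "desired_one"],
                   "value": [1.0, 2.0, 3.0]})

# Boolean filtering keeps the original row labels, so the index now has gaps.
df = df[df.label == "desired_one"]
print(df.index.tolist())

# reset_index() renumbers the rows and keeps the old labels in a new "index" column.
with_column = df.reset_index()

# reset_index(drop=True) renumbers the rows and discards the old labels.
renumbered = df.reset_index(drop=True)

print(with_column.columns.tolist())
print(renumbered.index.tolist())
```

If the extra "index" column is later passed to an estimator as a feature, it becomes part of the model input, so it is worth deciding explicitly which of the two forms is wanted.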
This solved my error. brilliant!
– Aerin
Jan 31 '18 at 5:41
I love you! That's a rare instance of me finding the right solution despite not knowing what's the cause of the error!
– Alexandr Kapshuk
Aug 9 '18 at 14:25
Calling df.reset_index() adds the old index as an "index" column in the resulting df, which may not be useful in every scenario. If df.reset_index(drop=True) is run instead, it throws the same error.
– smm
Sep 18 '18 at 18:19
The dimensions of my input array were skewed, as my input CSV had empty spaces.
For pandas, I just used dropna: pandas.pydata.org/pandas-docs/stable/generated/…
– FindOutIslamNow
Sep 11 '18 at 7:23
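To illustrate how empty CSV cells turn into NaN and how `dropna` removes the affected rows, here is a hedged sketch. The CSV text is invented; dropping rows is only one option and discards data, so it is an assumption that this is acceptable.

```python
import io

import pandas as pd

# Made-up CSV with two empty cells (row 2, column b and row 3, column c).
csv_text = "a,b,c\n1,2,3\n4,,6\n7,8,\n"

# pandas reads empty fields as NaN by default.
df = pd.read_csv(io.StringIO(csv_text))
print(df.isna().sum())

# Keep only the rows in which every column has a value.
complete = df.dropna()
print(complete)
```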
This is the check on which it fails:
- https://github.com/scikit-learn/scikit-learn/blob/0.17.X/sklearn/utils/validation.py#L51
Which says
def _assert_all_finite(X):
    """Like assert_all_finite, but only for ndarray."""
    X = np.asanyarray(X)
    # First try an O(n) time, O(1) space solution for the common case that
    # everything is finite; fall back to O(n) space np.isfinite to prevent
    # false positives from overflow in sum method.
    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
            and not np.isfinite(X).all()):
        raise ValueError("Input contains NaN, infinity"
                         " or a value too large for %r." % X.dtype)
So make sure that your input contains no NaN values, that all of the values are actually floats, and that none of them is Inf.
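The two-stage logic of the quoted check can be reproduced outside scikit-learn. The function below is a sketch written for illustration (the name `all_finite` is hypothetical and not part of scikit-learn); it also shows one way a "value too large" can arise, namely when a float64 value is cast to a narrower float type.

```python
import numpy as np

def all_finite(X):
    """Hypothetical helper mirroring the quoted check: True if no NaN/inf is present."""
    X = np.asanyarray(X)
    if X.dtype.char not in np.typecodes['AllFloat']:
        return True  # integer and boolean arrays cannot hold NaN or inf
    # Cheap pass: a finite sum implies that every element is finite.
    with np.errstate(over='ignore', invalid='ignore'):
        if np.isfinite(X.sum()):
            return True
    # The sum may have overflowed although all elements are finite,
    # so fall back to the element-wise test.
    return bool(np.isfinite(X).all())

# All elements are finite, but their float64 sum overflows to inf.
large_but_finite = np.full(1000, 1e306)

# 1e40 is representable in float64 but too large for float32, so the cast yields inf.
with np.errstate(over='ignore'):
    too_large_for_float32 = np.array([1e40]).astype(np.float32)

print(all_finite(large_but_finite))
print(all_finite(too_large_for_float32))
print(all_finite(np.array([1.0, np.nan])))
```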
This is my function (based on this) to clean the dataset of nan, Inf, and missing cells (for skewed datasets):
import numpy as np
import pandas as pd

def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    # Drop rows with missing cells (this modifies the caller's DataFrame in place).
    df.dropna(inplace=True)
    # Keep only the rows that contain no NaN or infinite entry.
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)
Why do you drop the nan two times? First time with dropna, then a second time when dropping inf.
– luca
Jun 25 '18 at 9:04
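A possible single-pass variant, sketched under the assumption that infinite values should be treated exactly like missing ones: convert inf to NaN first, then drop once. The function name `clean_dataset_single_pass` and the sample data are hypothetical, and unlike the function above this version does not modify its argument.

```python
import numpy as np
import pandas as pd

def clean_dataset_single_pass(df):
    """Hypothetical variant: return a float64 copy without rows containing NaN or +/-inf."""
    # Turn +/-inf into NaN so that one dropna() call handles both cases.
    cleaned = df.replace([np.inf, -np.inf], np.nan).dropna()
    return cleaned.astype(np.float64)

# Made-up data: row 1 contains inf, row 2 contains NaN.
df = pd.DataFrame({"a": [1.0, np.inf, 3.0, 4.0],
                   "b": [1.0, 2.0, np.nan, 4.0]})

result = clean_dataset_single_pass(df)
print(result)
```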
I had the error after trying to select a subset of rows:
df = df.reindex(index=my_index)
Turns out that my_index contained values that were not contained in df.index, so the reindex function inserted some new rows and filled them with nan.
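The behaviour described above can be reproduced with a few lines. The data and the contents of `my_index` are made up; selecting via the intersection of the two indexes is one possible way to avoid the inserted rows, assuming that silently skipping unknown labels is acceptable.

```python
import pandas as pd

# Made-up frame with row labels 10, 11, 12.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=[10, 11, 12])
my_index = [11, 12, 13]  # 13 does not exist in df.index

# reindex creates a row for label 13 and fills it with NaN.
reindexed = df.reindex(index=my_index)
print(reindexed)

# Labels that would be invented by reindex.
missing = pd.Index(my_index).difference(df.index)
print(missing.tolist())

# Select only the labels that are actually present.
subset = df.loc[df.index.intersection(my_index)]
print(subset)
```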
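I had the same error, and in my case X and y were dataframes, so I had to convert them to matrices first:
# as_matrix() was removed in pandas 1.0 and np.float in NumPy 1.24; to_numpy() and np.float64 replace them
X = X.to_numpy().astype(np.float64)
y = y.to_numpy().astype(np.float64)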
this solution works perfectly for me! Thanks
– Gartmair
Nov 2 '17 at 20:46
With this version of python 3:
/opt/anaconda3/bin/python --version
Python 3.6.0 :: Anaconda 4.3.0 (64-bit)
Looking at the details of the error, I found the lines of code causing the failure:
/opt/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
56 and not np.isfinite(X).all()):
57 raise ValueError("Input contains NaN, infinity"
---> 58 " or a value too large for %r." % X.dtype)
59
60
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
From this, I was able to extract the correct way to test what was going on with my data, using the same test that the error message shows failing: np.isfinite(X)
Then with a quick and dirty loop, I was able to find that my data indeed contains nans:
print(p[:,0].shape)
index = 0
for i in p[:,0]:
    if not np.isfinite(i):
        print(index, i)
    index +=1
(367340,)
4454 nan
6940 nan
10868 nan
12753 nan
14855 nan
15678 nan
24954 nan
30251 nan
31108 nan
51455 nan
59055 nan
...
Now all I have to do is remove the values at these indexes.
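The loop and the removal step can also be written without explicit iteration. This is a sketch on a small made-up array that stands in for `p`; it assumes, as in the answer, that only the first column needs to be inspected.

```python
import numpy as np

# Made-up stand-in for p: rows 1 and 3 have a non-finite value in column 0.
p = np.array([[1.0, 5.0],
              [np.nan, 6.0],
              [3.0, 7.0],
              [np.inf, 8.0]])

# Indexes of the rows whose first column is NaN or inf.
bad_rows = np.flatnonzero(~np.isfinite(p[:, 0]))
print(bad_rows)

# Keep only the rows whose first column is finite.
p_clean = p[np.isfinite(p[:, 0])]
print(p_clean.shape)
```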
I got the same error. It worked with df.fillna(-99999, inplace=True) before doing any replacement, substitution, etc.
2
This is a dirty fix. There is a reason why your array contains nan values; you should find it.
– Elias Strehle
Jun 25 '18 at 15:31
The data could contain nan, and this gives a way to replace it with values that he/she finds acceptable.
– user2867432
Sep 9 '18 at 21:37
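For comparison, here is a sketch that first counts the missing values per column and then fills them in two ways: with the sentinel from the answer and with the column median. The data is made up, and median filling is shown only as one common alternative, not as a recommendation from the thread.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value per column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                   "b": [10.0, 20.0, np.nan]})

# Inspect how many values are missing before deciding how to fill them.
print(df.isna().sum())

# Sentinel fill as in the answer; the sentinel becomes a real value in the model input.
sentinel_filled = df.fillna(-99999)

# Alternative: fill each column with the median of its observed values.
median_filled = df.fillna(df.median())

print(sentinel_filled)
print(median_filled)
```

A large negative sentinel can dominate distance-based methods such as affinity propagation, which is one reason to look at where the missing values come from before filling them.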
In my case the problem was that many scikit functions return numpy arrays, which are devoid of pandas index. So there was an index mismatch when I used those numpy arrays to build new DataFrames and then I tried to mix them with the original data.
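The index mismatch can be sketched as follows. The multiplication by 10 merely stands in for a scikit-learn transformation that returns a plain NumPy array; the column names and data are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0],
                   "label": ["a", "b", "a", "b"]})
subset = df[df.label == "b"]  # keeps the original row labels 1 and 3

# Stand-in for a transformer output: a NumPy array without any index.
scaled = subset[["x"]].to_numpy() * 10.0

# A DataFrame built from the array gets a fresh index 0, 1.
scaled_df = pd.DataFrame(scaled, columns=["x_scaled"])

# concat aligns on row labels, so labels 0, 1 and 3 each get a row with NaN gaps.
misaligned = pd.concat([subset, scaled_df], axis=1)
print(misaligned)

# Reusing the original index keeps the rows aligned.
aligned = pd.concat(
    [subset, pd.DataFrame(scaled, columns=["x_scaled"], index=subset.index)],
    axis=1,
)
print(aligned)
```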
If you can't find the problem in X, check in y
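A small helper makes it easy to check both inputs at once. The function `report_non_finite` is hypothetical and written for this illustration; it assumes that the arrays can be converted to float64.

```python
import numpy as np

def report_non_finite(**arrays):
    """Hypothetical helper: map each array name to its number of NaN/inf entries."""
    counts = {}
    for name, values in arrays.items():
        as_float = np.asarray(values, dtype=np.float64)
        counts[name] = int((~np.isfinite(as_float)).sum())
    return counts

# Made-up data: X is clean, y contains one NaN.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([0.0, np.nan])

counts = report_non_finite(X=X, y=y)
print(counts)
```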
Answer recommending np.any(np.isnan(mat)): answered Jul 9 '15 at 16:43 by Marcus Müller, edited Jul 10 '15 at 7:57
Answer recommending df.reset_index(): answered Dec 24 '17 at 3:43 by Jun Wang, edited Aug 12 '18 at 20:34
Answer about empty spaces in the input csv: answered Jul 14 '15 at 21:09 by Ethan Waldie
Answer quoting _assert_all_finite: answered Apr 13 '16 at 15:12 by tuxdna
Answer with the clean_dataset function: answered Oct 5 '17 at 8:30 by Boern
I had the error after trying to select a subset of rows:
df = df.reindex(index=my_index)
Turns out that my_index contained values that were not contained in df.index, so the reindex function inserted some new rows and filled them with nan.
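A small sketch of this effect, using made-up data (the frame and the labels in my_index are assumptions chosen for illustration), along with one way to spot the labels that reindex would have to invent rows for:

```python
import numpy as np
import pandas as pd

# Made-up data: the frame has labels 0..2, the requested index also asks for 5.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=[0, 1, 2])
my_index = [0, 2, 5]

reindexed = df.reindex(index=my_index)
print(reindexed)  # the row labelled 5 did not exist, so it is filled with NaN

# Labels that reindex would have to create new rows for.
missing = pd.Index(my_index).difference(df.index)
print(list(missing))

# Selecting only labels that really exist avoids the silent NaN rows.
subset = df.loc[df.index.intersection(my_index)]
```

Checking that missing is empty before reindexing turns the silent NaN rows into an explicit, early failure.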
edited May 7 '18 at 12:54
answered Feb 15 '18 at 16:07
Elias Strehle
I had the same error, and in my case X and y were dataframes so I had to convert them to matrices first:
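import numpy as np
# as_matrix() was removed in pandas 1.0 and np.float in NumPy 1.24; use to_numpy() and np.float64
X = X.to_numpy().astype(np.float64)
y = y.to_numpy().astype(np.float64)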
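Converting to float64 does not by itself remove NaN or inf, so it can help to run the same finiteness test that scikit-learn applies on the converted arrays. A minimal sketch with made-up data; X and y are stand-ins for the frames in this answer, and to_numpy() is the current pandas replacement for as_matrix():

```python
import numpy as np
import pandas as pd

# Stand-in data; the None becomes NaN once the column is stored as float64.
X = pd.DataFrame({"f1": [1, 2, 3], "f2": [0.5, None, 1.5]})
y = pd.Series([0, 1, 0])

X_arr = X.to_numpy(dtype=np.float64)
y_arr = y.to_numpy(dtype=np.float64)

# The conversion succeeded, but a non-finite value may still be present.
x_ok = bool(np.isfinite(X_arr).all())
y_ok = bool(np.isfinite(y_arr).all())
print(x_ok, y_ok)
```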
this solution works perfectly for me! Thanks
– Gartmair
Nov 2 '17 at 20:46
answered Jul 2 '17 at 10:40
tekumara
With this version of python 3:
/opt/anaconda3/bin/python --version
Python 3.6.0 :: Anaconda 4.3.0 (64-bit)
Looking at the details of the error, I found the lines of code causing the failure:
/opt/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
56 and not np.isfinite(X).all()):
57 raise ValueError("Input contains NaN, infinity"
---> 58 " or a value too large for %r." % X.dtype)
59
60
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
From this, I was able to extract the correct way to test what was going on with my data, using the same test that fails in the traceback above: np.isfinite(X)
Then, with a quick and dirty loop, I was able to find that my data indeed contains nans:
print(p[:,0].shape)
index = 0
for i in p[:,0]:
    if not np.isfinite(i):
        print(index, i)
    index +=1
(367340,)
4454 nan
6940 nan
10868 nan
12753 nan
14855 nan
15678 nan
24954 nan
30251 nan
31108 nan
51455 nan
59055 nan
...
Now all I have to do is remove the values at these indexes.
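The same search can be done without a Python loop, and a boolean mask removes the offending rows in one step. A sketch assuming p is a 2-D float array as in this answer; the values below are made up:

```python
import numpy as np

# Made-up stand-in for p: rows 1 and 3 hold a non-finite value in column 0.
p = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, 30.0],
              [np.inf, 40.0]])

# Positions in column 0 where np.isfinite is False (NaN, +inf or -inf).
bad_rows = np.flatnonzero(~np.isfinite(p[:, 0]))
print(bad_rows)

# Keep only rows in which every column is finite.
row_is_finite = np.isfinite(p).all(axis=1)
p_clean = p[row_is_finite]
print(p_clean.shape)
```

If a target array accompanies p, the same row_is_finite mask should be applied to it so that rows stay aligned.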
answered Aug 10 '17 at 21:13
Raphvanns
I got the same error. It worked with df.fillna(-99999, inplace=True) before doing any replacement, substitution, etc.
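A sketch of the trade-off, with made-up numbers: a sentinel such as -99999 satisfies the finiteness check but becomes an extreme value that a model will treat as real, whereas filling with a per-column statistic (the median here, chosen only for illustration) keeps the column's range intact.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [20.0, np.nan, 40.0, 60.0]})

# How many cells are missing before anything is filled.
n_missing = int(df["age"].isna().sum())

# Option 1: sentinel fill, as in this answer. The mean is dragged far below zero.
sentinel_filled = df.fillna(-99999)

# Option 2: fill with the column median, which is computed ignoring NaN.
median_filled = df.fillna(df.median())

print(n_missing, sentinel_filled["age"].mean(), median_filled["age"].mean())
```

Either way, counting the missing cells first shows how much of the data is being replaced.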
2
This is a dirty fix. There is a reason why your array contains nan values; you should find it.
– Elias Strehle
Jun 25 '18 at 15:31
The data could contain nan, and this gives a way to replace it with values that he/she finds acceptable.
– user2867432
Sep 9 '18 at 21:37
answered Jun 8 '18 at 12:21
Cohen
In my case the problem was that many scikit-learn functions return numpy arrays, which carry no pandas index. So there was an index mismatch when I used those numpy arrays to build new DataFrames and then tried to mix them with the original data.
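A minimal sketch of that mismatch with made-up data; the doubled array merely stands in for the output of a scikit-learn transformer, and all names are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Original data with a non-default index, e.g. after filtering rows.
original = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=[10, 20, 30])

# Stand-in for an array returned by a scikit-learn transformer: no index attached.
transformed = original["x"].to_numpy() * 2

# A frame built from the bare array gets a fresh 0..n-1 index ...
wrong = pd.DataFrame({"x2": transformed})
# ... so aligning it with the original on the index leaves a NaN in every row.
mixed_wrong = pd.concat([original, wrong], axis=1)

# Reusing the original index keeps the rows aligned.
right = pd.DataFrame({"x2": transformed}, index=original.index)
mixed_right = pd.concat([original, right], axis=1)

print(mixed_wrong.isna().sum().sum(), mixed_right.isna().sum().sum())
```

Passing index=original.index when wrapping the array is usually enough; a row count that grows after the concat is a quick sign that the indexes did not line up.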
answered Jun 25 '18 at 9:24
luca
If you can't find the problem in X, check in y
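One way to check both at once is a small helper that counts non-finite entries per array. A sketch under the assumption that X and y are numeric; report_non_finite is a hypothetical name, not part of scikit-learn:

```python
import numpy as np


def report_non_finite(**arrays):
    """Return {name: count of NaN/inf entries} for each array passed by keyword."""
    counts = {}
    for name, values in arrays.items():
        as_float = np.asarray(values, dtype=np.float64)
        counts[name] = int((~np.isfinite(as_float)).sum())
    return counts


# Made-up data: X is clean, the target y hides one NaN.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([0.0, np.nan, 1.0])
counts = report_non_finite(X=X, y=y)
print(counts)
```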
answered Dec 31 '18 at 19:52
kztd