Python sklearn onehotencoder
I'm trying to encode categorical data for the feature at index 4 of my feature vectors, which are stored in a numpy array. The categories are either '4' or '6'. I can convert them to binary labels like this:
features_in_training_set = [[0 0 0 0 4], [0 0 0 0 4], [0 0 0 0 6],[0 0 0 0 4],[0 0 0 0 6]]
features_in_training_set[:,4] = LabelEncoder().fit_transform(features_in_training_set[:,4])
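For reference, a minimal runnable sketch of the step above. It assumes the data really is an integer numpy array (the literal as posted has no commas, so it is written out here with `np.array`); the imports are added for completeness and are not in the original snippet.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Assumed reconstruction of the posted data as an integer NumPy array.
features_in_training_set = np.array([
    [0, 0, 0, 0, 4],
    [0, 0, 0, 0, 4],
    [0, 0, 0, 0, 6],
    [0, 0, 0, 0, 4],
    [0, 0, 0, 0, 6],
])

# LabelEncoder sorts the distinct values, so 4 maps to 0 and 6 maps to 1.
features_in_training_set[:, 4] = LabelEncoder().fit_transform(
    features_in_training_set[:, 4]
)

print(features_in_training_set[:, 4])  # [0 0 1 0 1]
```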
But, of course, I need to change this so that the classifier doesn't think that '4' is greater than '6'. However, when I run the following:
onehotencoder = OneHotEncoder(categorical_features=[4], handle_unknown='ignore')
features_in_training_set = onehotencoder.fit_transform(features_in_training_set).toarray()
The error I'm receiving is:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
TypeError: Wrong type for parameter `n_values`. Expected 'auto', int or array of ints, got <class 'numpy.ndarray'>
I've checked if I have any missing values or any strings and I don't. All features are integers.
Thanks.
python scikit-learn sklearn-pandas
Where are commas?
features_in_training_set = [[0 0 0 0 4], [0 0 0 0 4], [0 0 0 0 6],[0 0 0 0 4],[0 0 0 0 6]]
– Konstantin
Dec 27 '18 at 22:58
@Konstantin it's a numpy array. Do numpy arrays have commas?
– user3755632
Dec 27 '18 at 23:01
It will be a numpy array after loading. Where is it converted to a numpy array? When you write the code like this, it's a list of lists, which should have commas.
– Vivek Kumar
Dec 28 '18 at 10:21
edited Dec 28 '18 at 11:28
Vivek Kumar
asked Dec 27 '18 at 21:50
user3755632
1 Answer
The current OneHotEncoder in scikit-learn (>= 0.20) can handle strings or other categorical features itself, so you do not need to use LabelEncoder first to encode categories as numbers (or to map arbitrary numbers to unique sorted numbers, as you did).
This error is a bug in OneHotEncoder: it has been evolving to handle the case above while still having to support older use cases such as the one in your question. Adding n_values='auto' to the code removes the error:
onehotencoder = OneHotEncoder(categorical_features=[4], n_values='auto',
handle_unknown='ignore')
If you remove the handle_unknown parameter from your code, it also works, but that should not be done.
See this issue here:
- https://github.com/scikit-learn/scikit-learn/issues/12881
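As an alternative to the workaround above, here is a hedged sketch that avoids `categorical_features` and `n_values` altogether (both were deprecated in 0.20 and later removed). It swaps in `ColumnTransformer` from `sklearn.compose`, which is not part of the original answer; the data array and variable names are assumptions for illustration, and `sparse_threshold=0` is used only to get a dense result without calling `.toarray()`.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Assumed data: same shape as in the question, categories 4 and 6 in column 4.
X = np.array([
    [0, 0, 0, 0, 4],
    [0, 0, 0, 0, 4],
    [0, 0, 0, 0, 6],
    [0, 0, 0, 0, 4],
    [0, 0, 0, 0, 6],
])

# One-hot encode column 4 only; pass the remaining columns through unchanged.
# No LabelEncoder step is needed: OneHotEncoder works on the raw values.
column_transformer = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), [4])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
encoded = column_transformer.fit_transform(X)

print(encoded.shape)  # (5, 6): two one-hot columns followed by four passthrough columns
```

The one-hot columns come first in the output here, followed by the untouched columns.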
Great, this gets rid of my error :) However, I noticed that the encoded columns are appended to the beginning of the feature vector, when I want to keep the order. Is this possible?
– user3755632
Dec 28 '18 at 12:44
@user3755632 No. It's mentioned in the documentation that the encoded columns will come first and the other columns will be appended to the right. If you want to take control, then look at FeatureUnion.
– Vivek Kumar
Dec 28 '18 at 13:15
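One possible way to keep the original column order, sketched here with `ColumnTransformer` rather than the `FeatureUnion` suggested in the comment: list a `"passthrough"` step for the leading columns before the encoder, since the output follows the order of the listed transformers. The data values and step names below are made up for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data with distinguishable values in the first four columns.
X = np.array([
    [10, 11, 12, 13, 4],
    [20, 21, 22, 23, 4],
    [30, 31, 32, 33, 6],
    [40, 41, 42, 43, 4],
    [50, 51, 52, 53, 6],
])

# Output columns follow the order of the transformers listed here:
# columns 0-3 unchanged first, then the one-hot columns for column 4.
ordered_transformer = ColumnTransformer(
    [
        ("keep", "passthrough", [0, 1, 2, 3]),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), [4]),
    ],
    sparse_threshold=0,  # always return a dense array
)
encoded = ordered_transformer.fit_transform(X)

print(encoded[0])  # first row: 10, 11, 12, 13 followed by the one-hot pair for category 4
```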
answered Dec 28 '18 at 11:28
Vivek Kumar