Python sklearn onehotencoder

I'm trying to one-hot encode a categorical feature stored in the column at index 4 (the fifth column) of my feature matrix, which is a numpy array. Its categories are either 4 or 6. I can map them to 0 and 1 like this:

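    import numpy as np
    from sklearn.preprocessing import LabelEncoder

    features_in_training_set = np.array([[0, 0, 0, 0, 4], [0, 0, 0, 0, 4], [0, 0, 0, 0, 6],
                                         [0, 0, 0, 0, 4], [0, 0, 0, 0, 6]])

    # Map the categories in column 4 to 0 and 1 (4 -> 0, 6 -> 1)
    features_in_training_set[:, 4] = LabelEncoder().fit_transform(features_in_training_set[:, 4])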

But I need to one-hot encode this column instead, so that the classifier doesn't treat the categories as ordered (for example, 6 as "greater than" 4). However, when I run the following:

    from sklearn.preprocessing import OneHotEncoder

    onehotencoder = OneHotEncoder(categorical_features=[4], handle_unknown='ignore')
    features_in_training_set = onehotencoder.fit_transform(features_in_training_set).toarray()

I get the following errors:

    TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

    TypeError: Wrong type for parameter `n_values`. Expected 'auto', int or array of ints, got <class 'numpy.ndarray'>

I've checked for missing values and strings, and there are none: all features are integers.

Thanks.

asked Dec 27 '18 at 21:50 by user3755632, edited Dec 28 '18 at 11:28 by Vivek Kumar

tags: python scikit-learn sklearn-pandas

Where are the commas? features_in_training_set = [[0 0 0 0 4], [0 0 0 0 4], [0 0 0 0 6],[0 0 0 0 4],[0 0 0 0 6]]
– Konstantin
Dec 27 '18 at 22:58

@Konstantin it's a numpy array. Do numpy arrays have commas?
– user3755632
Dec 27 '18 at 23:01

It will be a numpy array after loading. Where is it converted to a numpy array? Written like this, it's a list of lists, which needs commas.
– Vivek Kumar
Dec 28 '18 at 10:21

1 Answer

Since version 0.20, scikit-learn's OneHotEncoder can handle strings and other categorical values by itself, so you no longer need to run LabelEncoder first to map categories to integers (or, as you did, to map arbitrary integers to consecutive ones).
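A minimal sketch of that newer usage, assuming scikit-learn 0.20 or later: the encoder is fitted on column 4 alone and discovers the categories 4 and 6 itself, with no LabelEncoder step. Passing `categories='auto'` explicitly keeps 0.20.x from falling back to its legacy integer handling; on later versions it is already the default.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# The question's data, written as a real numpy array
X = np.array([[0, 0, 0, 0, 4],
              [0, 0, 0, 0, 4],
              [0, 0, 0, 0, 6],
              [0, 0, 0, 0, 4],
              [0, 0, 0, 0, 6]])

# Fit on column 4 only; X[:, [4]] keeps it 2-D, as the encoder expects
enc = OneHotEncoder(categories='auto', handle_unknown='ignore')
encoded = enc.fit_transform(X[:, [4]]).toarray()

print(enc.categories_)  # the categories found in column 4: 4 and 6
print(encoded)          # one column per category, no LabelEncoder needed
```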

This error is caused by a bug in OneHotEncoder: the class was being reworked to handle the case above while still supporting older use cases like yours, and that backward-compatible path fails here. Passing n_values='auto' explicitly avoids the error:

    onehotencoder = OneHotEncoder(categorical_features=[4], n_values='auto',
                                  handle_unknown='ignore')

Removing the handle_unknown parameter also makes the error go away, but you shouldn't drop it just to work around this bug.
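To see what handle_unknown='ignore' buys you, here is a small sketch assuming scikit-learn 0.20 or later: with 'ignore', a category that was not seen during fit (5 here, a hypothetical value) is encoded as an all-zero row, whereas the default, handle_unknown='error', raises a ValueError.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

train = np.array([[4], [4], [6], [4], [6]])  # column 4 from the question
unseen = np.array([[5]])                      # hypothetical category never seen in training

# With 'ignore', the unknown value gets no active column
lenient = OneHotEncoder(categories='auto', handle_unknown='ignore').fit(train)
ignored_row = lenient.transform(unseen).toarray()

# With the default handle_unknown='error', transform refuses the unknown value
strict = OneHotEncoder(categories='auto').fit(train)
try:
    strict.transform(unseen)
    raised = False
except ValueError:
    raised = True

print(ignored_row, raised)
```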

See this issue here:

https://github.com/scikit-learn/scikit-learn/issues/12881
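Note that `categorical_features` and `n_values` were deprecated in 0.20 and removed in later releases, so the fix above applies to 0.20.x only. A hedged sketch of an equivalent for 0.20 and later uses ColumnTransformer (a different class from the one in this answer) to one-hot encode column 4 and pass the other columns through unchanged; like `categorical_features`, it places the encoded columns first.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X = np.array([[0, 0, 0, 0, 4],
              [0, 0, 0, 0, 4],
              [0, 0, 0, 0, 6],
              [0, 0, 0, 0, 4],
              [0, 0, 0, 0, 6]])

ct = ColumnTransformer(
    [('onehot', OneHotEncoder(categories='auto', handle_unknown='ignore'), [4])],
    remainder='passthrough',  # keep columns 0-3 unchanged
    sparse_threshold=0,       # always return a dense array
)
features = ct.fit_transform(X)

# Encoded columns come first, followed by the passed-through columns 0-3
print(features)
```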

answered Dec 28 '18 at 11:28 by Vivek Kumar

Great, this gets rid of my error :) However, I noticed that the encoded columns are placed at the beginning of the feature vector, and I want to keep the original column order. Is that possible?
– user3755632
Dec 28 '18 at 12:44

@user3755632 No. The documentation states that the encoded columns are placed at the beginning and the remaining columns are appended to their right. If you want control over the order, look at FeatureUnion.
– Vivek Kumar
Dec 28 '18 at 13:15
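If the encoded columns must stay where column 4 was, one minimal sketch (plain numpy slicing rather than the FeatureUnion approach suggested above, assuming scikit-learn 0.20 or later; `one_hot_in_place` is a hypothetical helper) is to encode only that column and splice the result back between its neighbours:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

def one_hot_in_place(X, col, encoder=None):
    """Hypothetical helper: replace column `col` of X with its one-hot
    encoding, keeping the other columns in their original order.
    Pass an already fitted `encoder` to reuse it on new data."""
    if encoder is None:
        encoder = OneHotEncoder(categories='auto', handle_unknown='ignore')
        encoder.fit(X[:, [col]])
    encoded = encoder.transform(X[:, [col]]).toarray()
    # Left part, encoded block, right part (empty when col is the last column)
    return np.hstack([X[:, :col], encoded, X[:, col + 1:]]), encoder

X = np.array([[0, 0, 0, 0, 4],
              [0, 0, 0, 0, 4],
              [0, 0, 0, 0, 6],
              [0, 0, 0, 0, 4],
              [0, 0, 0, 0, 6]])

X_train, enc = one_hot_in_place(X, 4)
# Reuse the fitted encoder so new rows get the same column layout
X_new, _ = one_hot_in_place(np.array([[1, 2, 3, 4, 6]]), 4, encoder=enc)
print(X_train)
print(X_new)
```

Returning the fitted encoder lets you apply the same category-to-column mapping to a test set instead of refitting on it.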
