Python sklearn onehotencoder


I'm trying to encode the categorical feature at index 4 of my feature vectors, which are stored in a numpy array. The categories are either 4 or 6. I can convert them to 0/1 labels with this:

features_in_training_set = [[0 0 0 0 4], [0 0 0 0 4], [0 0 0 0 6], [0 0 0 0 4], [0 0 0 0 6]]

features_in_training_set[:,4] = LabelEncoder().fit_transform(features_in_training_set[:,4])

But, of course, I need to change this so that the classifier doesn't treat one category as numerically greater than the other. However, when I run the following:

onehotencoder = OneHotEncoder(categorical_features=[4], handle_unknown='ignore')

features_in_training_set = onehotencoder.fit_transform(features_in_training_set).toarray()

The errors I'm receiving are:

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

TypeError: Wrong type for parameter `n_values`. Expected 'auto', int or array of ints, got <class 'numpy.ndarray'>

I've checked for missing values and for strings, and there are none; all features are integers.

Thanks.


python scikit-learn sklearn-pandas

asked Dec 27 '18 at 21:50 by user3755632 · edited Dec 28 '18 at 11:28 by Vivek Kumar

  • Where are commas? features_in_training_set = [[0 0 0 0 4], [0 0 0 0 4], [0 0 0 0 6], [0 0 0 0 4], [0 0 0 0 6]]
    – Konstantin, Dec 27 '18 at 22:58

  • @Konstantin it's a numpy array. Do numpy arrays have commas?
    – user3755632, Dec 27 '18 at 23:01

  • It will be a numpy array after loading, but where is it converted to a numpy array? Written as in the question, it is a list of lists, which would need commas.
    – Vivek Kumar, Dec 28 '18 at 10:21
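
For reference, a minimal runnable version of the setup above (an illustration only: it assumes, per the comments, that the data has already been loaded as a NumPy integer array, and it adds the import and the np.array call that the question omits):

import numpy as np
from sklearn.preprocessing import LabelEncoder

# The printed form in the question drops the commas; as a loaded NumPy array
# the training features would look like this:
features_in_training_set = np.array([[0, 0, 0, 0, 4],
                                     [0, 0, 0, 0, 4],
                                     [0, 0, 0, 0, 6],
                                     [0, 0, 0, 0, 4],
                                     [0, 0, 0, 0, 6]])

# Re-encode the categorical column (index 4) as 0/1 labels, as in the question.
features_in_training_set[:, 4] = LabelEncoder().fit_transform(
    features_in_training_set[:, 4])

print(features_in_training_set[:, 4])   # [0 0 1 0 1]

With integer labels like this, a downstream classifier would treat the column as ordinal, which is exactly why the one-hot step is needed.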
















1 Answer
The current OneHotEncoder in scikit-learn (0.20 and later) can handle strings and other categorical features by itself, so there is no need to run LabelEncoder first to map categories to numbers (or, as you did, to remap distinct numbers to sorted integer labels).
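
For example, a minimal sketch of that direct usage (an illustration assuming scikit-learn 0.20 or later; categories='auto' is passed explicitly to opt into the new category-based behaviour):

import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Just the categorical column from the question, encoded directly:
col = np.array([[4], [4], [6], [4], [6]])
enc = OneHotEncoder(categories='auto', handle_unknown='ignore')
print(enc.fit_transform(col).toarray())
# [[1. 0.]
#  [1. 0.]
#  [0. 1.]
#  [1. 0.]
#  [0. 1.]]
print(enc.categories_)   # [array([4, 6])]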



This error comes from a bug in OneHotEncoder: the class has been evolving to handle the case above while, in the meantime, still supporting older use cases like the one in your question. Adding n_values='auto' to the call removes the error:

onehotencoder = OneHotEncoder(categorical_features=[4], n_values='auto',
                              handle_unknown='ignore')

Removing the handle_unknown parameter from your code also makes it work, but that is not recommended.

See this issue:

  • https://github.com/scikit-learn/scikit-learn/issues/12881






answered Dec 28 '18 at 11:28 by Vivek Kumar

  • Great, this gets rid of my error :) However, I noticed that the encoded columns are placed at the beginning of the feature matrix, whereas I want to keep the original column order. Is this possible?
    – user3755632, Dec 28 '18 at 12:44

  • @user3755632 No. It's mentioned in the documentation that the encoded columns come first and the remaining columns are appended to the right. If you want to take control of this, look at FeatureUnion.
    – Vivek Kumar, Dec 28 '18 at 13:15
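
As a supplement to the comment above: the comment points to FeatureUnion, but a rough sketch of the same idea using ColumnTransformer (also available from scikit-learn 0.20 on; treat this as an illustration, not the commenter's exact suggestion) could look like this. Note that the encoded columns still come out first, followed by the passed-through columns, so the result has to be reindexed afterwards if the original column order matters.

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X = np.array([[0, 0, 0, 0, 4],
              [0, 0, 0, 0, 4],
              [0, 0, 0, 0, 6],
              [0, 0, 0, 0, 4],
              [0, 0, 0, 0, 6]])

# One-hot encode only column 4 and pass the remaining columns through unchanged;
# sparse_threshold=0 forces a dense array so the result is easy to inspect.
ct = ColumnTransformer(
    transformers=[('cat', OneHotEncoder(categories='auto',
                                        handle_unknown='ignore'), [4])],
    remainder='passthrough',
    sparse_threshold=0)

X_encoded = ct.fit_transform(X)
print(X_encoded.shape)   # (5, 6): the two one-hot columns first, then columns 0-3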










