How to handle missing values in the dataset

I have a simple classification problem that I am trying to solve with a neural network in Keras. The dataset is numeric, of size 26000 × 17, but it contains a lot of missing (null) values. The data is quite sensitive, so I can neither drop all rows that contain nulls nor replace the nulls with the mean (average) or any other standard number. There is also a constraint against using KNN imputation to replace the missing entries.
What is the best way to handle such a dataset?
dataframe machine-learning data-science
asked Dec 27 at 13:39
Hassan
If they're giving you a null then they're not giving you data. If you need real data to do anything useful then you can't treat the nulls, period. You can't "classify" what you don't know and the nulls are things you don't know, period.
– Erwin Smout
Dec 27 at 14:04
What percentage of the data is missing? You could try upsampling, perhaps with a small amount of noise added to the duplicated samples. However, none of this is likely to be a fruitful exercise if there are a lot of missing values.
– Chris
Dec 27 at 15:33
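To answer the "what percentage is missing" question concretely, a minimal sketch along these lines could be run first. It assumes the data is already loaded as a float NumPy array with nulls encoded as NaN; `missing_report` and `X_demo` are hypothetical names, and the small array only stands in for the real 26000 × 17 dataset.

```python
import numpy as np

def missing_report(X):
    """Return missing-value percentages: per column, overall, and rows affected."""
    mask = np.isnan(X)                               # True where a value is missing
    per_column = 100.0 * mask.mean(axis=0)           # % missing in each column
    overall = 100.0 * mask.mean()                    # % missing over all cells
    rows_affected = 100.0 * mask.any(axis=1).mean()  # % of rows with at least one NaN
    return per_column, overall, rows_affected

# Tiny hypothetical array standing in for the real dataset.
X_demo = np.array([
    [1.0, np.nan, 3.0],
    [4.0, 5.0, np.nan],
    [7.0, 8.0, 9.0],
    [np.nan, 11.0, 12.0],
])

per_column, overall, rows_affected = missing_report(X_demo)
print("percent missing per column:", per_column)
print("percent missing overall:", overall)
print("percent of rows with a missing value:", rows_affected)
```

The rows-affected figure is worth checking separately: even a modest share of missing cells can touch most rows, which is what makes dropping rows costly.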
2 Answers
I don't know how critical your data is, but there is no truly good way to handle missing values. You will have to fill them in somehow, either with the mean (average) of the column or with some standard number (e.g. 0). KNN imputation is often considered the best method, so I don't know why there is a constraint against using it.
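As a rough illustration of the two fill strategies mentioned here (column mean, or a constant such as 0), a sketch in plain NumPy might look like the following. The function names and the toy array are hypothetical, nulls are assumed to be encoded as NaN, and the means are assumed to be computed on training data only so that nothing leaks from the test set.

```python
import numpy as np

def fit_column_means(X_train):
    """Column means computed while ignoring NaNs (fit on training data only)."""
    return np.nanmean(X_train, axis=0)

def fill_missing(X, fill_values):
    """Replace each NaN with the fill value for its column.

    fill_values may be a scalar (e.g. 0.0) or one value per column.
    Also returns the boolean mask of positions that were filled.
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    filled = np.where(mask, fill_values, X)  # fill_values broadcasts across rows
    return filled, mask

# Toy data standing in for the real dataset.
X_train = np.array([
    [1.0, np.nan],
    [3.0, 4.0],
    [np.nan, 8.0],
])

means = fit_column_means(X_train)
mean_filled, mask = fill_missing(X_train, means)  # strategy 1: column mean
zero_filled, _ = fill_missing(X_train, 0.0)       # strategy 2: constant 0
print(mean_filled)
print(zero_filled)
```

The returned mask records which entries were filled, so it can be kept alongside the data if the model should be able to tell real values from substituted ones.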
answered Dec 27 at 13:47
user8611018
The best way to replace missing values in any sort of numeric dataset is KNN imputation, which replaces each missing value based on the neighboring entries.
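To make "considering neighbor entries" concrete, here is a small from-scratch sketch of the idea in NumPy. It is an illustration under assumptions, not a reference implementation: `knn_impute` is a hypothetical helper, nulls are assumed to be NaN, and the distance used (mean squared difference over the columns both rows have observed) is one reasonable choice among several. scikit-learn provides a `KNNImputer` class built on a similar idea.

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill each NaN with the mean of that column over the k nearest rows.

    Distance between two rows is the mean squared difference over the
    columns observed in both; rows sharing no observed column are skipped.
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    filled = X.copy()
    for i in np.flatnonzero(mask.any(axis=1)):       # rows with something missing
        shared = ~mask[i] & ~mask                    # columns observed in both rows
        diff = np.where(shared, X - X[i], 0.0)
        counts = shared.sum(axis=1)
        with np.errstate(divide="ignore", invalid="ignore"):
            dist = (diff ** 2).sum(axis=1) / counts
        dist[counts == 0] = np.inf                   # no overlap: not a neighbour
        dist[i] = np.inf                             # a row is not its own neighbour
        for j in np.flatnonzero(mask[i]):            # each missing column in row i
            candidates = np.flatnonzero(~mask[:, j] & np.isfinite(dist))
            if candidates.size == 0:
                continue                             # nothing to borrow; leave NaN
            order = np.argsort(dist[candidates], kind="stable")[:k]
            filled[i, j] = X[candidates[order], j].mean()
    return filled

# Toy data: row 0 is missing its last value.
X = np.array([
    [1.0, 2.0, np.nan],
    [1.1, 2.1, 3.0],
    [0.8, 1.8, 5.0],
    [10.0, 10.0, 50.0],
])
filled = knn_impute(X, k=2)
print(filled)
```

In the toy data the two rows closest to row 0 hold 3.0 and 5.0 in the last column, so the gap is filled with their mean rather than being pulled toward the distant fourth row.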
answered Dec 27 at 13:56
Muhammad Aqeel
OP mentions explicitly that there is a constraint not to use k-nn imputation
– desertnaut
Dec 27 at 14:31