How to handle missing values in the dataset

I have a simple classification problem that I am trying to solve with a neural network in Keras. The dataset is numeric, of size 26000 * 17, but it contains a lot of missing (null) values. The data is quite sensitive, so I can neither drop all rows that contain null values nor replace the null values with the mean (average) or any other standard number. There is also a constraint against using KNN imputation to replace the missing entries.
What is the best way to handle such a dataset?










dataframe machine-learning data-science






asked Dec 27 at 13:39









Hassan

  • If they're giving you a null then they're not giving you data. If you need real data to do anything useful then you can't treat the nulls, period. You can't "classify" what you don't know and the nulls are things you don't know, period.
    – Erwin Smout
    Dec 27 at 14:04










  • What percentage of the data is missing? You could try upsampling, perhaps with a small amount of noise added to the duplicated samples. However, none of this is likely to be a fruitful exercise if there are a lot of missing values.
    – Chris
    Dec 27 at 15:33
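
The percentage Chris asks about can be measured directly with pandas. This is a minimal sketch, assuming the data is loaded into a DataFrame; the toy frame and its column names (`f1`, `f2`, `f3`) are made up for illustration and stand in for the real 26000 * 17 table.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "f1": [1.0, np.nan, 3.0, 4.0],
    "f2": [np.nan, np.nan, 0.5, 0.7],
    "f3": [10.0, 20.0, 30.0, 40.0],
})

# Percentage of missing entries in each column.
missing_per_column = df.isna().mean() * 100

# Percentage of rows that contain at least one missing entry.
rows_with_missing = df.isna().any(axis=1).mean() * 100

# Percentage of all cells in the table that are missing.
overall_missing = df.isna().to_numpy().mean() * 100

print(missing_per_column)
print(rows_with_missing, overall_missing)
```

Looking at the per-column and per-row figures separately helps show whether the gaps are concentrated in a few columns or spread across most rows.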


















2 Answers
I don't know how crucial your data is. There is no truly good way to handle missing values. You will have to fill them in, either with the column mean (average) or with some standard number (e.g. 0). KNN imputation is often considered the best method, but I don't know why there is a constraint against using it.
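
For reference, the mean fill and constant fill mentioned above might look like this in pandas. It is only a sketch on a made-up two-column frame (`f1`, `f2` are hypothetical names), not code from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing entries.
df = pd.DataFrame({
    "f1": [1.0, np.nan, 3.0],
    "f2": [np.nan, 4.0, 8.0],
})

# Replace each missing entry with the mean of its own column.
# DataFrame.mean() skips NaN values by default.
mean_filled = df.fillna(df.mean())

# Alternatively, replace every missing entry with a fixed number such as 0.
zero_filled = df.fillna(0)

print(mean_filled)
print(zero_filled)
```

When the data is split into training and test sets, the column means would normally be computed on the training set only and then reused for the test set.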







        answered Dec 27 at 13:47









        user8611018

The best way to replace missing values in any numeric dataset is KNN imputation, which fills each missing value using the values of the nearest neighboring entries.
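
As an illustration of what this answer describes, here is a minimal sketch using `KNNImputer` from scikit-learn (available in version 0.22 and later). The small array and the choice of `n_neighbors=2` are assumptions made for the example, and this approach is the one the question rules out.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical toy data; np.nan marks the missing entries.
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Each missing value is replaced by the average of that feature over the
# 2 nearest rows, with distance measured on the features both rows have.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)

print(X_filled)
```

Observed values are left unchanged; only the `np.nan` cells are filled.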







            answered Dec 27 at 13:56









            Muhammad Aqeel

            • OP mentions explicitly that there is a constraint not to use k-nn imputation
              – desertnaut
              Dec 27 at 14:31

















