What should be done to handle imbalanced classes in multi-class classification?




















I have a dataset of user tickets with no regular pattern. It has about 56 columns, and the data is text. My task is to train a model that predicts which category each ticket belongs to, and there are more than 100 categories. The counts vary widely: category A, for example, has 70,000 tickets, another has 50,0000, and some categories have only 1. Is this imbalanced data? If it is, how should I handle it for multi-class classification? So far I have been using SMOTE to deal with the imbalance, but the accuracy decreases. What should I do in this case?



I have already tried a DecisionTree classifier and am now working on logistic regression.

















      machine-learning data-analysis
















      asked Jan 4 at 12:00









      pratha1995
























          1 Answer














          1) Use F1-score rather than accuracy as the evaluation metric when the data is highly imbalanced.



          2) Use stratified sampling when doing the train/test split.



          3) Try a one-vs-rest classifier.



          4) Try algorithms such as XGBoost, LightGBM and CatBoost.
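
To make points 1) and 2) concrete, here is a minimal sketch in plain Python. The helpers `stratified_split` and `macro_f1` are hypothetical names written for this illustration, and the label counts are made up. In practice you would probably reach for scikit-learn's `train_test_split(..., stratify=y)` and `f1_score(..., average="macro")` instead. One assumption worth flagging: a category with a single ticket cannot be split, so this sketch keeps it in the training part.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split by the same fraction."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # A class with a single example cannot be split; keep it in training.
        n_test = int(round(len(indices) * test_fraction)) if len(indices) > 1 else 0
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up label distribution: one dominant class and two rare ones.
labels = ["A"] * 80 + ["B"] * 16 + ["C"] * 4
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)
y_test = [labels[i] for i in test_idx]

# A stand-in "model" that always predicts the majority class.
y_pred = ["A"] * len(y_test)
accuracy = sum(t == p for t, p in zip(y_test, y_pred)) / len(y_test)

print("test class counts:", Counter(y_test))
print("accuracy:", round(accuracy, 3))
print("macro F1:", round(macro_f1(y_test, y_pred), 3))
```

On this toy data the majority-class predictor reaches an accuracy of 0.8 but a macro F1 of only about 0.30, which is why a drop in accuracy after resampling does not by itself mean the model got worse on the rare categories.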
















                answered Jan 4 at 12:18









                Ravikiran































