Automating interpolation of missing values in pandas dataframe





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







0















I have a dataframe with airline booking data for the past year for a particular origin and destination. There are hundreds of similar data-sets in the system.



In each data-set, there are holes in data. In the current example, we have about 85 days of year for which we don't have booking data.



There are two columns here - departure_date and bookings.



The next step for me would be to include the missing dates in the date column, and set the corresponding values in bookings column to NaN.



I am looking for the best way to do this.



Please find a part of the dataFrame below:



Index       departure_date              bookings
0 2017-11-02 00:00:00 43
1 2017-11-03 00:00:00 27
2 2017-11-05 00:00:00 27 ********
3 2017-11-06 00:00:00 22
4 2017-11-07 00:00:00 39
.
.
164 2018-05-22 00:00:00 17
165 2018-05-23 00:00:00 41
166 2018-05-24 00:00:00 73
167 2018-07-02 00:00:00 4 *********
168 2018-07-03 00:00:00 31
.
.
277 2018-10-31 00:00:00 50
278 2018-11-01 00:00:00 60


We can see that the data-set is for a one year period (Nov 2, 2017 to Nov 1, 2018). But we have data for 279 days only. For example, we don't have any data between 2018-05-25 and 2018-07-01. I would have to include these dates in the departure_date column and set the corresponding booking values to NaN.



For the second step, I plan to do some interpolation using something like



dataFrame['bookings'].interpolate(method='time', inplace=True)


Please suggest if there are better alternatives in Python.










share|improve this question

























  • I doubt an interpolation would be accurate...

    – BlackBear
    Jan 4 at 10:39











  • That's true.. it's only for some testing purpose. For now, I need to know how to prepare the dataFrame by including missing dates and NaN values in bookings column. There seems to be many methods to estimate missing time series data.

    – rahul
    Jan 4 at 10:40




















0















I have a dataframe with airline booking data for the past year for a particular origin and destination. There are hundreds of similar data-sets in the system.



In each data-set, there are holes in data. In the current example, we have about 85 days of year for which we don't have booking data.



There are two columns here - departure_date and bookings.



The next step for me would be to include the missing dates in the date column, and set the corresponding values in bookings column to NaN.



I am looking for the best way to do this.



Please find a part of the dataFrame below:



Index       departure_date              bookings
0 2017-11-02 00:00:00 43
1 2017-11-03 00:00:00 27
2 2017-11-05 00:00:00 27 ********
3 2017-11-06 00:00:00 22
4 2017-11-07 00:00:00 39
.
.
164 2018-05-22 00:00:00 17
165 2018-05-23 00:00:00 41
166 2018-05-24 00:00:00 73
167 2018-07-02 00:00:00 4 *********
168 2018-07-03 00:00:00 31
.
.
277 2018-10-31 00:00:00 50
278 2018-11-01 00:00:00 60


We can see that the data-set is for a one year period (Nov 2, 2017 to Nov 1, 2018). But we have data for 279 days only. For example, we don't have any data between 2018-05-25 and 2018-07-01. I would have to include these dates in the departure_date column and set the corresponding booking values to NaN.



For the second step, I plan to do some interpolation using something like



dataFrame['bookings'].interpolate(method='time', inplace=True)


Please suggest if there are better alternatives in Python.










share|improve this question

























  • I doubt an interpolation would be accurate...

    – BlackBear
    Jan 4 at 10:39











  • That's true.. it's only for some testing purpose. For now, I need to know how to prepare the dataFrame by including missing dates and NaN values in bookings column. There seems to be many methods to estimate missing time series data.

    – rahul
    Jan 4 at 10:40
















0












0








0








I have a dataframe with airline booking data for the past year for a particular origin and destination. There are hundreds of similar data-sets in the system.



In each data-set, there are holes in data. In the current example, we have about 85 days of year for which we don't have booking data.



There are two columns here - departure_date and bookings.



The next step for me would be to include the missing dates in the date column, and set the corresponding values in bookings column to NaN.



I am looking for the best way to do this.



Please find a part of the dataFrame below:



Index       departure_date              bookings
0 2017-11-02 00:00:00 43
1 2017-11-03 00:00:00 27
2 2017-11-05 00:00:00 27 ********
3 2017-11-06 00:00:00 22
4 2017-11-07 00:00:00 39
.
.
164 2018-05-22 00:00:00 17
165 2018-05-23 00:00:00 41
166 2018-05-24 00:00:00 73
167 2018-07-02 00:00:00 4 *********
168 2018-07-03 00:00:00 31
.
.
277 2018-10-31 00:00:00 50
278 2018-11-01 00:00:00 60


We can see that the data-set is for a one year period (Nov 2, 2017 to Nov 1, 2018). But we have data for 279 days only. For example, we don't have any data between 2018-05-25 and 2018-07-01. I would have to include these dates in the departure_date column and set the corresponding booking values to NaN.



For the second step, I plan to do some interpolation using something like



dataFrame['bookings'].interpolate(method='time', inplace=True)


Please suggest if there are better alternatives in Python.










share|improve this question
















I have a dataframe with airline booking data for the past year for a particular origin and destination. There are hundreds of similar data-sets in the system.



In each data-set, there are holes in data. In the current example, we have about 85 days of year for which we don't have booking data.



There are two columns here - departure_date and bookings.



The next step for me would be to include the missing dates in the date column, and set the corresponding values in bookings column to NaN.



I am looking for the best way to do this.



Please find a part of the dataFrame below:



Index       departure_date              bookings
0 2017-11-02 00:00:00 43
1 2017-11-03 00:00:00 27
2 2017-11-05 00:00:00 27 ********
3 2017-11-06 00:00:00 22
4 2017-11-07 00:00:00 39
.
.
164 2018-05-22 00:00:00 17
165 2018-05-23 00:00:00 41
166 2018-05-24 00:00:00 73
167 2018-07-02 00:00:00 4 *********
168 2018-07-03 00:00:00 31
.
.
277 2018-10-31 00:00:00 50
278 2018-11-01 00:00:00 60


We can see that the data-set is for a one year period (Nov 2, 2017 to Nov 1, 2018). But we have data for 279 days only. For example, we don't have any data between 2018-05-25 and 2018-07-01. I would have to include these dates in the departure_date column and set the corresponding booking values to NaN.



For the second step, I plan to do some interpolation using something like



dataFrame['bookings'].interpolate(method='time', inplace=True)


Please suggest if there are better alternatives in Python.







python pandas time-series interpolation missing-data






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jan 4 at 10:44









Georgy

2,34241729




2,34241729










asked Jan 4 at 10:22









rahulrahul

767




767













  • I doubt an interpolation would be accurate...

    – BlackBear
    Jan 4 at 10:39











  • That's true.. it's only for some testing purpose. For now, I need to know how to prepare the dataFrame by including missing dates and NaN values in bookings column. There seems to be many methods to estimate missing time series data.

    – rahul
    Jan 4 at 10:40





















  • I doubt an interpolation would be accurate...

    – BlackBear
    Jan 4 at 10:39











  • That's true.. it's only for some testing purpose. For now, I need to know how to prepare the dataFrame by including missing dates and NaN values in bookings column. There seems to be many methods to estimate missing time series data.

    – rahul
    Jan 4 at 10:40



















I doubt an interpolation would be accurate...

– BlackBear
Jan 4 at 10:39





I doubt an interpolation would be accurate...

– BlackBear
Jan 4 at 10:39













That's true.. it's only for some testing purpose. For now, I need to know how to prepare the dataFrame by including missing dates and NaN values in bookings column. There seems to be many methods to estimate missing time series data.

– rahul
Jan 4 at 10:40







That's true.. it's only for some testing purpose. For now, I need to know how to prepare the dataFrame by including missing dates and NaN values in bookings column. There seems to be many methods to estimate missing time series data.

– rahul
Jan 4 at 10:40














1 Answer
1






active

oldest

votes


















1














This resample for each day. Then fill the gaps.



dataFrame['bookings'].resample('D').pad()


You can have more resampler idea on this page (so you can select the one that fit the best with your needs):
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html






share|improve this answer
























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54037000%2fautomating-interpolation-of-missing-values-in-pandas-dataframe%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    This resample for each day. Then fill the gaps.



    dataFrame['bookings'].resample('D').pad()


    You can have more resampler idea on this page (so you can select the one that fit the best with your needs):
    https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html






    share|improve this answer




























      1














      This resample for each day. Then fill the gaps.



      dataFrame['bookings'].resample('D').pad()


      You can have more resampler idea on this page (so you can select the one that fit the best with your needs):
      https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html






      share|improve this answer


























        1












        1








        1







        This resample for each day. Then fill the gaps.



        dataFrame['bookings'].resample('D').pad()


        You can have more resampler idea on this page (so you can select the one that fit the best with your needs):
        https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html






        share|improve this answer













        This resample for each day. Then fill the gaps.



        dataFrame['bookings'].resample('D').pad()


        You can have more resampler idea on this page (so you can select the one that fit the best with your needs):
        https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Jan 4 at 14:24









        Ludo SchmidtLudo Schmidt

        1438




        1438
































            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54037000%2fautomating-interpolation-of-missing-values-in-pandas-dataframe%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Monofisismo

            Angular Downloading a file using contenturl with Basic Authentication

            Olmecas