merge two dataframes based on closest datetime












2















I have two data sets, one containing air quality data and one containing weather data, each with a column named 'dt' for date and time. However these times do not match exactly. I would like to join these tables so that the air quality data is retained and the closest time on the weather data is matched and merged.



df_aq:



                  dt   Latitude  Longitude  ...   Speed_kmh  PM2.5  PM10
0 11/20/2018 12:16 33.213922 -97.151055 ... 0.35 16.0 86.1
1 11/20/2018 12:16 33.213928 -97.151007 ... 5.01 16.0 86.1
2 11/20/2018 12:16 33.213907 -97.150953 ... 5.27 16.0 86.1
3 11/20/2018 12:16 33.213872 -97.150883 ... 5.03 16.0 86.1
...
364 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3
365 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3
366 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3]


df_weather:



     USAF  WBAN               dt  DIR SPD ... PCP01  PCP06  PCP24  PCPXX
0 722589 3991 11/20/2018 0:53 360 6 ... 0 ***** ***** *****
1 722589 3991 11/20/2018 1:53 350 6 ... 0 ***** ***** *****
2 722589 3991 11/20/2018 2:53 310 3 ... 0 ***** ***** *****
3 722589 3991 11/20/2018 3:53 330 5 ... 0 ***** ***** *****
4 722589 3991 11/20/2018 4:53 310 6 ... 0 ***** ***** *****


df_aq ranges from 12:16-12:46, and df_weather has data every hour on the 53 minute mark. Therefore the closest times would be 11:53 and 12:53, so I would like those two times and the subsequent weather data to merge appropriately with all the data on df_aq



I've tried experimenting with iloc and Index.get_loc as that seems to be the best way, but I keep getting an error.



I've tried:



ctr = df_aq['dt'].count() - 1 
startTime = df_aq['dt'][0]
endTime = df_aq['dt'][ctr]
print df_weather.iloc[df_weather.index.get_loc(startTime,method='nearest') or df_weather.index.get_loc(endTime,method='nearest')]


but then I get an error:



TypeError: unsupported operand type(s) for -: 'long' and 'str'


I'm not sure what this error means



Is there a better way to do this than iloc? And if not, what am I doing wrong with this bit of code?



Thank you very much for any help you can offer.










share|improve this question


















  • 1





    check pd.merge_asof docs

    – anky_91
    Dec 31 '18 at 6:01











  • @Shannon please provide the link of your datasets.

    – Abdur Rehman
    Dec 31 '18 at 6:31
















2















I have two data sets, one containing air quality data and one containing weather data, each with a column named 'dt' for date and time. However these times do not match exactly. I would like to join these tables so that the air quality data is retained and the closest time on the weather data is matched and merged.



df_aq:



                  dt   Latitude  Longitude  ...   Speed_kmh  PM2.5  PM10
0 11/20/2018 12:16 33.213922 -97.151055 ... 0.35 16.0 86.1
1 11/20/2018 12:16 33.213928 -97.151007 ... 5.01 16.0 86.1
2 11/20/2018 12:16 33.213907 -97.150953 ... 5.27 16.0 86.1
3 11/20/2018 12:16 33.213872 -97.150883 ... 5.03 16.0 86.1
...
364 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3
365 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3
366 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3]


df_weather:



     USAF  WBAN               dt  DIR SPD ... PCP01  PCP06  PCP24  PCPXX
0 722589 3991 11/20/2018 0:53 360 6 ... 0 ***** ***** *****
1 722589 3991 11/20/2018 1:53 350 6 ... 0 ***** ***** *****
2 722589 3991 11/20/2018 2:53 310 3 ... 0 ***** ***** *****
3 722589 3991 11/20/2018 3:53 330 5 ... 0 ***** ***** *****
4 722589 3991 11/20/2018 4:53 310 6 ... 0 ***** ***** *****


df_aq ranges from 12:16-12:46, and df_weather has data every hour on the 53 minute mark. Therefore the closest times would be 11:53 and 12:53, so I would like those two times and the subsequent weather data to merge appropriately with all the data on df_aq



I've tried experimenting with iloc and Index.get_loc as that seems to be the best way, but I keep getting an error.



I've tried:



ctr = df_aq['dt'].count() - 1 
startTime = df_aq['dt'][0]
endTime = df_aq['dt'][ctr]
print df_weather.iloc[df_weather.index.get_loc(startTime,method='nearest') or df_weather.index.get_loc(endTime,method='nearest')]


but then I get an error:



TypeError: unsupported operand type(s) for -: 'long' and 'str'


I'm not sure what this error means



Is there a better way to do this than iloc? And if not, what am I doing wrong with this bit of code?



Thank you very much for any help you can offer.










share|improve this question


















  • 1





    check pd.merge_asof docs

    – anky_91
    Dec 31 '18 at 6:01











  • @Shannon please provide the link of your datasets.

    – Abdur Rehman
    Dec 31 '18 at 6:31














2












2








2


1






I have two data sets, one containing air quality data and one containing weather data, each with a column named 'dt' for date and time. However these times do not match exactly. I would like to join these tables so that the air quality data is retained and the closest time on the weather data is matched and merged.



df_aq:



                  dt   Latitude  Longitude  ...   Speed_kmh  PM2.5  PM10
0 11/20/2018 12:16 33.213922 -97.151055 ... 0.35 16.0 86.1
1 11/20/2018 12:16 33.213928 -97.151007 ... 5.01 16.0 86.1
2 11/20/2018 12:16 33.213907 -97.150953 ... 5.27 16.0 86.1
3 11/20/2018 12:16 33.213872 -97.150883 ... 5.03 16.0 86.1
...
364 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3
365 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3
366 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3]


df_weather:



     USAF  WBAN               dt  DIR SPD ... PCP01  PCP06  PCP24  PCPXX
0 722589 3991 11/20/2018 0:53 360 6 ... 0 ***** ***** *****
1 722589 3991 11/20/2018 1:53 350 6 ... 0 ***** ***** *****
2 722589 3991 11/20/2018 2:53 310 3 ... 0 ***** ***** *****
3 722589 3991 11/20/2018 3:53 330 5 ... 0 ***** ***** *****
4 722589 3991 11/20/2018 4:53 310 6 ... 0 ***** ***** *****


df_aq ranges from 12:16-12:46, and df_weather has data every hour on the 53 minute mark. Therefore the closest times would be 11:53 and 12:53, so I would like those two times and the subsequent weather data to merge appropriately with all the data on df_aq



I've tried experimenting with iloc and Index.get_loc as that seems to be the best way, but I keep getting an error.



I've tried:



ctr = df_aq['dt'].count() - 1 
startTime = df_aq['dt'][0]
endTime = df_aq['dt'][ctr]
print df_weather.iloc[df_weather.index.get_loc(startTime,method='nearest') or df_weather.index.get_loc(endTime,method='nearest')]


but then I get an error:



TypeError: unsupported operand type(s) for -: 'long' and 'str'


I'm not sure what this error means



Is there a better way to do this than iloc? And if not, what am I doing wrong with this bit of code?



Thank you very much for any help you can offer.










share|improve this question














I have two data sets, one containing air quality data and one containing weather data, each with a column named 'dt' for date and time. However these times do not match exactly. I would like to join these tables so that the air quality data is retained and the closest time on the weather data is matched and merged.



df_aq:



                  dt   Latitude  Longitude  ...   Speed_kmh  PM2.5  PM10
0 11/20/2018 12:16 33.213922 -97.151055 ... 0.35 16.0 86.1
1 11/20/2018 12:16 33.213928 -97.151007 ... 5.01 16.0 86.1
2 11/20/2018 12:16 33.213907 -97.150953 ... 5.27 16.0 86.1
3 11/20/2018 12:16 33.213872 -97.150883 ... 5.03 16.0 86.1
...
364 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3
365 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3
366 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3]


df_weather:



     USAF  WBAN               dt  DIR SPD ... PCP01  PCP06  PCP24  PCPXX
0 722589 3991 11/20/2018 0:53 360 6 ... 0 ***** ***** *****
1 722589 3991 11/20/2018 1:53 350 6 ... 0 ***** ***** *****
2 722589 3991 11/20/2018 2:53 310 3 ... 0 ***** ***** *****
3 722589 3991 11/20/2018 3:53 330 5 ... 0 ***** ***** *****
4 722589 3991 11/20/2018 4:53 310 6 ... 0 ***** ***** *****


df_aq ranges from 12:16-12:46, and df_weather has data every hour on the 53 minute mark. Therefore the closest times would be 11:53 and 12:53, so I would like those two times and the subsequent weather data to merge appropriately with all the data on df_aq



I've tried experimenting with iloc and Index.get_loc as that seems to be the best way, but I keep getting an error.



I've tried:



ctr = df_aq['dt'].count() - 1 
startTime = df_aq['dt'][0]
endTime = df_aq['dt'][ctr]
print df_weather.iloc[df_weather.index.get_loc(startTime,method='nearest') or df_weather.index.get_loc(endTime,method='nearest')]


but then I get an error:



TypeError: unsupported operand type(s) for -: 'long' and 'str'


I'm not sure what this error means



Is there a better way to do this than iloc? And if not, what am I doing wrong with this bit of code?



Thank you very much for any help you can offer.







python pandas datetime join merge






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Dec 31 '18 at 4:01









ShannonShannon

132




132








  • 1





    check pd.merge_asof docs

    – anky_91
    Dec 31 '18 at 6:01











  • @Shannon please provide the link of your datasets.

    – Abdur Rehman
    Dec 31 '18 at 6:31














  • 1





    check pd.merge_asof docs

    – anky_91
    Dec 31 '18 at 6:01











  • @Shannon please provide the link of your datasets.

    – Abdur Rehman
    Dec 31 '18 at 6:31








1




1





check pd.merge_asof docs

– anky_91
Dec 31 '18 at 6:01





check pd.merge_asof docs

– anky_91
Dec 31 '18 at 6:01













@Shannon please provide the link of your datasets.

– Abdur Rehman
Dec 31 '18 at 6:31





@Shannon please provide the link of your datasets.

– Abdur Rehman
Dec 31 '18 at 6:31












1 Answer
1






active

oldest

votes


















0














I'm taking liberty to have an example which i used during my learning :-) , hope that will help to achieve what you are looking.



As stated in the comment section you can try special function merge_asof() for merging Time-series DataFrames



DataFrame First:



>>> df1
time ticker price quantity
0 2016-05-25 13:30:00.023 MSFT 51.95 75
1 2016-05-25 13:30:00.038 MSFT 51.95 155
2 2016-05-25 13:30:00.048 GOOG 720.77 100
3 2016-05-25 13:30:00.048 GOOG 720.92 100
4 2016-05-25 13:30:00.048 AAPL 98.00 100


DataFrame Second:



>>> df2
time ticker bid ask
0 2016-05-25 13:30:00.023 GOOG 720.50 720.93
1 2016-05-25 13:30:00.023 MSFT 51.95 51.96
2 2016-05-25 13:30:00.030 MSFT 51.97 51.98
3 2016-05-25 13:30:00.041 MSFT 51.99 52.00
4 2016-05-25 13:30:00.048 GOOG 720.50 720.93
5 2016-05-25 13:30:00.049 AAPL 97.99 98.01
6 2016-05-25 13:30:00.072 GOOG 720.50 720.88
7 2016-05-25 13:30:00.075 MSFT 52.01 52.03


>>> new_df = pd.merge_asof(df1, df2, on='time', by='ticker')

>>> new_df
time ticker price quantity bid ask
0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96
1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93
3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN


Check the Documentation Doc merge_asof






share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53983467%2fmerge-two-dataframes-based-on-closest-datetime%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    I'm taking liberty to have an example which i used during my learning :-) , hope that will help to achieve what you are looking.



    As stated in the comment section you can try special function merge_asof() for merging Time-series DataFrames



    DataFrame First:



    >>> df1
    time ticker price quantity
    0 2016-05-25 13:30:00.023 MSFT 51.95 75
    1 2016-05-25 13:30:00.038 MSFT 51.95 155
    2 2016-05-25 13:30:00.048 GOOG 720.77 100
    3 2016-05-25 13:30:00.048 GOOG 720.92 100
    4 2016-05-25 13:30:00.048 AAPL 98.00 100


    DataFrame Second:



    >>> df2
    time ticker bid ask
    0 2016-05-25 13:30:00.023 GOOG 720.50 720.93
    1 2016-05-25 13:30:00.023 MSFT 51.95 51.96
    2 2016-05-25 13:30:00.030 MSFT 51.97 51.98
    3 2016-05-25 13:30:00.041 MSFT 51.99 52.00
    4 2016-05-25 13:30:00.048 GOOG 720.50 720.93
    5 2016-05-25 13:30:00.049 AAPL 97.99 98.01
    6 2016-05-25 13:30:00.072 GOOG 720.50 720.88
    7 2016-05-25 13:30:00.075 MSFT 52.01 52.03


    >>> new_df = pd.merge_asof(df1, df2, on='time', by='ticker')

    >>> new_df
    time ticker price quantity bid ask
    0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96
    1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
    2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93
    3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
    4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN


    Check the Documentation Doc merge_asof






    share|improve this answer




























      0














      I'm taking liberty to have an example which i used during my learning :-) , hope that will help to achieve what you are looking.



      As stated in the comment section you can try special function merge_asof() for merging Time-series DataFrames



      DataFrame First:



      >>> df1
      time ticker price quantity
      0 2016-05-25 13:30:00.023 MSFT 51.95 75
      1 2016-05-25 13:30:00.038 MSFT 51.95 155
      2 2016-05-25 13:30:00.048 GOOG 720.77 100
      3 2016-05-25 13:30:00.048 GOOG 720.92 100
      4 2016-05-25 13:30:00.048 AAPL 98.00 100


      DataFrame Second:



      >>> df2
      time ticker bid ask
      0 2016-05-25 13:30:00.023 GOOG 720.50 720.93
      1 2016-05-25 13:30:00.023 MSFT 51.95 51.96
      2 2016-05-25 13:30:00.030 MSFT 51.97 51.98
      3 2016-05-25 13:30:00.041 MSFT 51.99 52.00
      4 2016-05-25 13:30:00.048 GOOG 720.50 720.93
      5 2016-05-25 13:30:00.049 AAPL 97.99 98.01
      6 2016-05-25 13:30:00.072 GOOG 720.50 720.88
      7 2016-05-25 13:30:00.075 MSFT 52.01 52.03


      >>> new_df = pd.merge_asof(df1, df2, on='time', by='ticker')

      >>> new_df
      time ticker price quantity bid ask
      0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96
      1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
      2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93
      3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
      4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN


      Check the Documentation Doc merge_asof






      share|improve this answer


























        0












        0








        0







        I'm taking liberty to have an example which i used during my learning :-) , hope that will help to achieve what you are looking.



        As stated in the comment section you can try special function merge_asof() for merging Time-series DataFrames



        DataFrame First:



        >>> df1
        time ticker price quantity
        0 2016-05-25 13:30:00.023 MSFT 51.95 75
        1 2016-05-25 13:30:00.038 MSFT 51.95 155
        2 2016-05-25 13:30:00.048 GOOG 720.77 100
        3 2016-05-25 13:30:00.048 GOOG 720.92 100
        4 2016-05-25 13:30:00.048 AAPL 98.00 100


        DataFrame Second:



        >>> df2
        time ticker bid ask
        0 2016-05-25 13:30:00.023 GOOG 720.50 720.93
        1 2016-05-25 13:30:00.023 MSFT 51.95 51.96
        2 2016-05-25 13:30:00.030 MSFT 51.97 51.98
        3 2016-05-25 13:30:00.041 MSFT 51.99 52.00
        4 2016-05-25 13:30:00.048 GOOG 720.50 720.93
        5 2016-05-25 13:30:00.049 AAPL 97.99 98.01
        6 2016-05-25 13:30:00.072 GOOG 720.50 720.88
        7 2016-05-25 13:30:00.075 MSFT 52.01 52.03


        >>> new_df = pd.merge_asof(df1, df2, on='time', by='ticker')

        >>> new_df
        time ticker price quantity bid ask
        0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96
        1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
        2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93
        3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
        4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN


        Check the Documentation Doc merge_asof






        share|improve this answer













        I'm taking liberty to have an example which i used during my learning :-) , hope that will help to achieve what you are looking.



        As stated in the comment section you can try special function merge_asof() for merging Time-series DataFrames



        DataFrame First:



        >>> df1
        time ticker price quantity
        0 2016-05-25 13:30:00.023 MSFT 51.95 75
        1 2016-05-25 13:30:00.038 MSFT 51.95 155
        2 2016-05-25 13:30:00.048 GOOG 720.77 100
        3 2016-05-25 13:30:00.048 GOOG 720.92 100
        4 2016-05-25 13:30:00.048 AAPL 98.00 100


        DataFrame Second:



        >>> df2
        time ticker bid ask
        0 2016-05-25 13:30:00.023 GOOG 720.50 720.93
        1 2016-05-25 13:30:00.023 MSFT 51.95 51.96
        2 2016-05-25 13:30:00.030 MSFT 51.97 51.98
        3 2016-05-25 13:30:00.041 MSFT 51.99 52.00
        4 2016-05-25 13:30:00.048 GOOG 720.50 720.93
        5 2016-05-25 13:30:00.049 AAPL 97.99 98.01
        6 2016-05-25 13:30:00.072 GOOG 720.50 720.88
        7 2016-05-25 13:30:00.075 MSFT 52.01 52.03


        >>> new_df = pd.merge_asof(df1, df2, on='time', by='ticker')

        >>> new_df
        time ticker price quantity bid ask
        0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96
        1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
        2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93
        3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
        4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN


        Check the Documentation Doc merge_asof







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Dec 31 '18 at 9:28









        pygopygo

        3,0551619




        3,0551619






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53983467%2fmerge-two-dataframes-based-on-closest-datetime%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Monofisismo

            Angular Downloading a file using contenturl with Basic Authentication

            Olmecas