merge two dataframes based on closest datetime
I have two data sets, one containing air quality data and one containing weather data, each with a column named 'dt' for date and time. However these times do not match exactly. I would like to join these tables so that the air quality data is retained and the closest time on the weather data is matched and merged.
df_aq:
dt Latitude Longitude ... Speed_kmh PM2.5 PM10
0 11/20/2018 12:16 33.213922 -97.151055 ... 0.35 16.0 86.1
1 11/20/2018 12:16 33.213928 -97.151007 ... 5.01 16.0 86.1
2 11/20/2018 12:16 33.213907 -97.150953 ... 5.27 16.0 86.1
3 11/20/2018 12:16 33.213872 -97.150883 ... 5.03 16.0 86.1
...
364 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3
365 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3
366 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3]
df_weather:
USAF WBAN dt DIR SPD ... PCP01 PCP06 PCP24 PCPXX
0 722589 3991 11/20/2018 0:53 360 6 ... 0 ***** ***** *****
1 722589 3991 11/20/2018 1:53 350 6 ... 0 ***** ***** *****
2 722589 3991 11/20/2018 2:53 310 3 ... 0 ***** ***** *****
3 722589 3991 11/20/2018 3:53 330 5 ... 0 ***** ***** *****
4 722589 3991 11/20/2018 4:53 310 6 ... 0 ***** ***** *****
df_aq ranges from 12:16-12:46, and df_weather has data every hour on the 53 minute mark. Therefore the closest times would be 11:53 and 12:53, so I would like those two times and the subsequent weather data to merge appropriately with all the data on df_aq
I've tried experimenting with iloc and Index.get_loc as that seems to be the best way, but I keep getting an error.
I've tried:
ctr = df_aq['dt'].count() - 1
startTime = df_aq['dt'][0]
endTime = df_aq['dt'][ctr]
print df_weather.iloc[df_weather.index.get_loc(startTime,method='nearest') or df_weather.index.get_loc(endTime,method='nearest')]
but then I get an error:
TypeError: unsupported operand type(s) for -: 'long' and 'str'
I'm not sure what this error means
Is there a better way to do this than iloc? And if not, what am I doing wrong with this bit of code?
Thank you very much for any help you can offer.
python pandas datetime join merge
add a comment |
I have two data sets, one containing air quality data and one containing weather data, each with a column named 'dt' for date and time. However these times do not match exactly. I would like to join these tables so that the air quality data is retained and the closest time on the weather data is matched and merged.
df_aq:
dt Latitude Longitude ... Speed_kmh PM2.5 PM10
0 11/20/2018 12:16 33.213922 -97.151055 ... 0.35 16.0 86.1
1 11/20/2018 12:16 33.213928 -97.151007 ... 5.01 16.0 86.1
2 11/20/2018 12:16 33.213907 -97.150953 ... 5.27 16.0 86.1
3 11/20/2018 12:16 33.213872 -97.150883 ... 5.03 16.0 86.1
...
364 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3
365 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3
366 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3]
df_weather:
USAF WBAN dt DIR SPD ... PCP01 PCP06 PCP24 PCPXX
0 722589 3991 11/20/2018 0:53 360 6 ... 0 ***** ***** *****
1 722589 3991 11/20/2018 1:53 350 6 ... 0 ***** ***** *****
2 722589 3991 11/20/2018 2:53 310 3 ... 0 ***** ***** *****
3 722589 3991 11/20/2018 3:53 330 5 ... 0 ***** ***** *****
4 722589 3991 11/20/2018 4:53 310 6 ... 0 ***** ***** *****
df_aq ranges from 12:16-12:46, and df_weather has data every hour on the 53 minute mark. Therefore the closest times would be 11:53 and 12:53, so I would like those two times and the subsequent weather data to merge appropriately with all the data on df_aq
I've tried experimenting with iloc and Index.get_loc as that seems to be the best way, but I keep getting an error.
I've tried:
ctr = df_aq['dt'].count() - 1
startTime = df_aq['dt'][0]
endTime = df_aq['dt'][ctr]
print df_weather.iloc[df_weather.index.get_loc(startTime,method='nearest') or df_weather.index.get_loc(endTime,method='nearest')]
but then I get an error:
TypeError: unsupported operand type(s) for -: 'long' and 'str'
I'm not sure what this error means
Is there a better way to do this than iloc? And if not, what am I doing wrong with this bit of code?
Thank you very much for any help you can offer.
python pandas datetime join merge
1
checkpd.merge_asof
docs
– anky_91
Dec 31 '18 at 6:01
@Shannon please provide the link of your datasets.
– Abdur Rehman
Dec 31 '18 at 6:31
add a comment |
I have two data sets, one containing air quality data and one containing weather data, each with a column named 'dt' for date and time. However these times do not match exactly. I would like to join these tables so that the air quality data is retained and the closest time on the weather data is matched and merged.
df_aq:
dt Latitude Longitude ... Speed_kmh PM2.5 PM10
0 11/20/2018 12:16 33.213922 -97.151055 ... 0.35 16.0 86.1
1 11/20/2018 12:16 33.213928 -97.151007 ... 5.01 16.0 86.1
2 11/20/2018 12:16 33.213907 -97.150953 ... 5.27 16.0 86.1
3 11/20/2018 12:16 33.213872 -97.150883 ... 5.03 16.0 86.1
...
364 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3
365 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3
366 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3]
df_weather:
USAF WBAN dt DIR SPD ... PCP01 PCP06 PCP24 PCPXX
0 722589 3991 11/20/2018 0:53 360 6 ... 0 ***** ***** *****
1 722589 3991 11/20/2018 1:53 350 6 ... 0 ***** ***** *****
2 722589 3991 11/20/2018 2:53 310 3 ... 0 ***** ***** *****
3 722589 3991 11/20/2018 3:53 330 5 ... 0 ***** ***** *****
4 722589 3991 11/20/2018 4:53 310 6 ... 0 ***** ***** *****
df_aq ranges from 12:16-12:46, and df_weather has data every hour on the 53 minute mark. Therefore the closest times would be 11:53 and 12:53, so I would like those two times and the subsequent weather data to merge appropriately with all the data on df_aq
I've tried experimenting with iloc and Index.get_loc as that seems to be the best way, but I keep getting an error.
I've tried:
ctr = df_aq['dt'].count() - 1
startTime = df_aq['dt'][0]
endTime = df_aq['dt'][ctr]
print df_weather.iloc[df_weather.index.get_loc(startTime,method='nearest') or df_weather.index.get_loc(endTime,method='nearest')]
but then I get an error:
TypeError: unsupported operand type(s) for -: 'long' and 'str'
I'm not sure what this error means
Is there a better way to do this than iloc? And if not, what am I doing wrong with this bit of code?
Thank you very much for any help you can offer.
python pandas datetime join merge
I have two data sets, one containing air quality data and one containing weather data, each with a column named 'dt' for date and time. However these times do not match exactly. I would like to join these tables so that the air quality data is retained and the closest time on the weather data is matched and merged.
df_aq:
dt Latitude Longitude ... Speed_kmh PM2.5 PM10
0 11/20/2018 12:16 33.213922 -97.151055 ... 0.35 16.0 86.1
1 11/20/2018 12:16 33.213928 -97.151007 ... 5.01 16.0 86.1
2 11/20/2018 12:16 33.213907 -97.150953 ... 5.27 16.0 86.1
3 11/20/2018 12:16 33.213872 -97.150883 ... 5.03 16.0 86.1
...
364 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3
365 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3
366 11/20/2018 12:46 33.209462 -97.148623 ... 0.00 2.8 6.3]
df_weather:
USAF WBAN dt DIR SPD ... PCP01 PCP06 PCP24 PCPXX
0 722589 3991 11/20/2018 0:53 360 6 ... 0 ***** ***** *****
1 722589 3991 11/20/2018 1:53 350 6 ... 0 ***** ***** *****
2 722589 3991 11/20/2018 2:53 310 3 ... 0 ***** ***** *****
3 722589 3991 11/20/2018 3:53 330 5 ... 0 ***** ***** *****
4 722589 3991 11/20/2018 4:53 310 6 ... 0 ***** ***** *****
df_aq ranges from 12:16-12:46, and df_weather has data every hour on the 53 minute mark. Therefore the closest times would be 11:53 and 12:53, so I would like those two times and the subsequent weather data to merge appropriately with all the data on df_aq
I've tried experimenting with iloc and Index.get_loc as that seems to be the best way, but I keep getting an error.
I've tried:
ctr = df_aq['dt'].count() - 1
startTime = df_aq['dt'][0]
endTime = df_aq['dt'][ctr]
print df_weather.iloc[df_weather.index.get_loc(startTime,method='nearest') or df_weather.index.get_loc(endTime,method='nearest')]
but then I get an error:
TypeError: unsupported operand type(s) for -: 'long' and 'str'
I'm not sure what this error means
Is there a better way to do this than iloc? And if not, what am I doing wrong with this bit of code?
Thank you very much for any help you can offer.
python pandas datetime join merge
python pandas datetime join merge
asked Dec 31 '18 at 4:01
ShannonShannon
132
132
1
checkpd.merge_asof
docs
– anky_91
Dec 31 '18 at 6:01
@Shannon please provide the link of your datasets.
– Abdur Rehman
Dec 31 '18 at 6:31
add a comment |
1
checkpd.merge_asof
docs
– anky_91
Dec 31 '18 at 6:01
@Shannon please provide the link of your datasets.
– Abdur Rehman
Dec 31 '18 at 6:31
1
1
check
pd.merge_asof
docs– anky_91
Dec 31 '18 at 6:01
check
pd.merge_asof
docs– anky_91
Dec 31 '18 at 6:01
@Shannon please provide the link of your datasets.
– Abdur Rehman
Dec 31 '18 at 6:31
@Shannon please provide the link of your datasets.
– Abdur Rehman
Dec 31 '18 at 6:31
add a comment |
1 Answer
1
active
oldest
votes
I'm taking liberty to have an example which i used during my learning :-) , hope that will help to achieve what you are looking.
As stated in the comment section you can try special function merge_asof()
for merging Time-series DataFrames
DataFrame First:
>>> df1
time ticker price quantity
0 2016-05-25 13:30:00.023 MSFT 51.95 75
1 2016-05-25 13:30:00.038 MSFT 51.95 155
2 2016-05-25 13:30:00.048 GOOG 720.77 100
3 2016-05-25 13:30:00.048 GOOG 720.92 100
4 2016-05-25 13:30:00.048 AAPL 98.00 100
DataFrame Second:
>>> df2
time ticker bid ask
0 2016-05-25 13:30:00.023 GOOG 720.50 720.93
1 2016-05-25 13:30:00.023 MSFT 51.95 51.96
2 2016-05-25 13:30:00.030 MSFT 51.97 51.98
3 2016-05-25 13:30:00.041 MSFT 51.99 52.00
4 2016-05-25 13:30:00.048 GOOG 720.50 720.93
5 2016-05-25 13:30:00.049 AAPL 97.99 98.01
6 2016-05-25 13:30:00.072 GOOG 720.50 720.88
7 2016-05-25 13:30:00.075 MSFT 52.01 52.03
>>> new_df = pd.merge_asof(df1, df2, on='time', by='ticker')
>>> new_df
time ticker price quantity bid ask
0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96
1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93
3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
Check the Documentation Doc merge_asof
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53983467%2fmerge-two-dataframes-based-on-closest-datetime%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
I'm taking liberty to have an example which i used during my learning :-) , hope that will help to achieve what you are looking.
As stated in the comment section you can try special function merge_asof()
for merging Time-series DataFrames
DataFrame First:
>>> df1
time ticker price quantity
0 2016-05-25 13:30:00.023 MSFT 51.95 75
1 2016-05-25 13:30:00.038 MSFT 51.95 155
2 2016-05-25 13:30:00.048 GOOG 720.77 100
3 2016-05-25 13:30:00.048 GOOG 720.92 100
4 2016-05-25 13:30:00.048 AAPL 98.00 100
DataFrame Second:
>>> df2
time ticker bid ask
0 2016-05-25 13:30:00.023 GOOG 720.50 720.93
1 2016-05-25 13:30:00.023 MSFT 51.95 51.96
2 2016-05-25 13:30:00.030 MSFT 51.97 51.98
3 2016-05-25 13:30:00.041 MSFT 51.99 52.00
4 2016-05-25 13:30:00.048 GOOG 720.50 720.93
5 2016-05-25 13:30:00.049 AAPL 97.99 98.01
6 2016-05-25 13:30:00.072 GOOG 720.50 720.88
7 2016-05-25 13:30:00.075 MSFT 52.01 52.03
>>> new_df = pd.merge_asof(df1, df2, on='time', by='ticker')
>>> new_df
time ticker price quantity bid ask
0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96
1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93
3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
Check the Documentation Doc merge_asof
add a comment |
I'm taking liberty to have an example which i used during my learning :-) , hope that will help to achieve what you are looking.
As stated in the comment section you can try special function merge_asof()
for merging Time-series DataFrames
DataFrame First:
>>> df1
time ticker price quantity
0 2016-05-25 13:30:00.023 MSFT 51.95 75
1 2016-05-25 13:30:00.038 MSFT 51.95 155
2 2016-05-25 13:30:00.048 GOOG 720.77 100
3 2016-05-25 13:30:00.048 GOOG 720.92 100
4 2016-05-25 13:30:00.048 AAPL 98.00 100
DataFrame Second:
>>> df2
time ticker bid ask
0 2016-05-25 13:30:00.023 GOOG 720.50 720.93
1 2016-05-25 13:30:00.023 MSFT 51.95 51.96
2 2016-05-25 13:30:00.030 MSFT 51.97 51.98
3 2016-05-25 13:30:00.041 MSFT 51.99 52.00
4 2016-05-25 13:30:00.048 GOOG 720.50 720.93
5 2016-05-25 13:30:00.049 AAPL 97.99 98.01
6 2016-05-25 13:30:00.072 GOOG 720.50 720.88
7 2016-05-25 13:30:00.075 MSFT 52.01 52.03
>>> new_df = pd.merge_asof(df1, df2, on='time', by='ticker')
>>> new_df
time ticker price quantity bid ask
0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96
1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93
3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
Check the Documentation Doc merge_asof
add a comment |
I'm taking liberty to have an example which i used during my learning :-) , hope that will help to achieve what you are looking.
As stated in the comment section you can try special function merge_asof()
for merging Time-series DataFrames
DataFrame First:
>>> df1
time ticker price quantity
0 2016-05-25 13:30:00.023 MSFT 51.95 75
1 2016-05-25 13:30:00.038 MSFT 51.95 155
2 2016-05-25 13:30:00.048 GOOG 720.77 100
3 2016-05-25 13:30:00.048 GOOG 720.92 100
4 2016-05-25 13:30:00.048 AAPL 98.00 100
DataFrame Second:
>>> df2
time ticker bid ask
0 2016-05-25 13:30:00.023 GOOG 720.50 720.93
1 2016-05-25 13:30:00.023 MSFT 51.95 51.96
2 2016-05-25 13:30:00.030 MSFT 51.97 51.98
3 2016-05-25 13:30:00.041 MSFT 51.99 52.00
4 2016-05-25 13:30:00.048 GOOG 720.50 720.93
5 2016-05-25 13:30:00.049 AAPL 97.99 98.01
6 2016-05-25 13:30:00.072 GOOG 720.50 720.88
7 2016-05-25 13:30:00.075 MSFT 52.01 52.03
>>> new_df = pd.merge_asof(df1, df2, on='time', by='ticker')
>>> new_df
time ticker price quantity bid ask
0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96
1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93
3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
Check the Documentation Doc merge_asof
I'm taking liberty to have an example which i used during my learning :-) , hope that will help to achieve what you are looking.
As stated in the comment section you can try special function merge_asof()
for merging Time-series DataFrames
DataFrame First:
>>> df1
time ticker price quantity
0 2016-05-25 13:30:00.023 MSFT 51.95 75
1 2016-05-25 13:30:00.038 MSFT 51.95 155
2 2016-05-25 13:30:00.048 GOOG 720.77 100
3 2016-05-25 13:30:00.048 GOOG 720.92 100
4 2016-05-25 13:30:00.048 AAPL 98.00 100
DataFrame Second:
>>> df2
time ticker bid ask
0 2016-05-25 13:30:00.023 GOOG 720.50 720.93
1 2016-05-25 13:30:00.023 MSFT 51.95 51.96
2 2016-05-25 13:30:00.030 MSFT 51.97 51.98
3 2016-05-25 13:30:00.041 MSFT 51.99 52.00
4 2016-05-25 13:30:00.048 GOOG 720.50 720.93
5 2016-05-25 13:30:00.049 AAPL 97.99 98.01
6 2016-05-25 13:30:00.072 GOOG 720.50 720.88
7 2016-05-25 13:30:00.075 MSFT 52.01 52.03
>>> new_df = pd.merge_asof(df1, df2, on='time', by='ticker')
>>> new_df
time ticker price quantity bid ask
0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96
1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93
3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
Check the Documentation Doc merge_asof
answered Dec 31 '18 at 9:28
pygopygo
3,0551619
3,0551619
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53983467%2fmerge-two-dataframes-based-on-closest-datetime%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
check
pd.merge_asof
docs– anky_91
Dec 31 '18 at 6:01
@Shannon please provide the link of your datasets.
– Abdur Rehman
Dec 31 '18 at 6:31