Mixed object type columns and managing duplicates
I have merged four datasets and can see duplicated rows in the resulting data frame. However, when I ask pandas to show me the duplicated rows, it reports that there are none, so my code to remove duplicated rows has no effect. Any help would be appreciated.
Dataframe sample:
end_time_x start_time_x duration deviceuuid time_offset_x exercise_type max_speed calorie mean_speed distance ... time_offset create_time weekday month startsleep wakeup sleep_duration duration_mins powernaps weekend
0 2018-01-07 10:01:00-04:00 2018-01-07 07:21:00-04:00 831210 F/D7+hL5E5 UTC-0300 1001 1.750000 54.340 1.376099 905.360 ... UTC-0400 2018-01-07 10:15:59.770000-04:00 6 1 7 10 02:40:00 160.0 False True
1 2018-01-07 10:01:00-04:00 2018-01-07 07:21:00-04:00 831210 F/D7+hL5E5 UTC-0300 1001 1.750000 54.340 1.376099 905.360 ... UTC-0400 2018-01-07 05:12:34.278000-04:00 6 1 0 4 04:12:00 252.0 False True
2 2018-01-07 10:01:00-04:00 2018-01-07 07:21:00-04:00 831210 F/D7+hL5E5 UTC-0300 1001 1.750000 54.340 1.376099 905.360 ... UTC-0400 2018-01-08 07:45:13.936000-04:00 6 1 22 7 09:11:00 551.0 False True
3 2018-01-07 10:01:00-04:00 2018-01-07 07:21:00-04:00 831210 F/D7+hL5E5 UTC-0300 1001 1.750000 54.340 1.376099 905.360 ... UTC-0400 2018-01-07 10:15:59.770000-04:00 6 1 7 10 02:40:00 160.0 False True
I have tried the code below, yet it yields the same result as when I omit the drop_duplicates lines.
Code for checking duplicates:
df_merged.duplicated().sum()
df_merged.loc[df_merged.duplicated(),:]
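One way to investigate why the check above reports nothing is to compare two rows that look alike and list the columns in which they actually differ, since a truncated display (the "..." in the sample) can hide the differing column. The sketch below is only an illustration: the three-row frame is a hypothetical miniature of df_merged, not the real data.

```python
import pandas as pd

# Hypothetical miniature of df_merged: rows 0 and 2 agree in every column,
# rows 0 and 1 differ only in a column that a truncated display might hide.
df_merged = pd.DataFrame({
    "duration": [831210, 831210, 831210],
    "calorie": [54.34, 54.34, 54.34],
    "create_time": [
        "2018-01-07 10:15:59",
        "2018-01-07 05:12:34",
        "2018-01-07 10:15:59",
    ],
})

# keep=False flags every member of a duplicate group, not only the later copies.
all_dupes = df_merged[df_merged.duplicated(keep=False)]

# Columns in which two rows that look alike actually differ.
row_a, row_b = df_merged.iloc[0], df_merged.iloc[1]
differing = df_merged.columns[row_a.ne(row_b).to_numpy()].tolist()

# Duplicates judged on a chosen subset of columns only.
subset_dupes = df_merged.duplicated(subset=["duration", "calorie"]).sum()

print(all_dupes)
print(differing)
print(subset_dupes)
```

If the list of differing columns is not empty, the rows are not full duplicates as far as pandas is concerned, and passing subset= to duplicated or drop_duplicates may be closer to what is wanted.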
Code for merging the data frames, after first dropping duplicates in 2 of the 4 data frames:
df_exercise_cleaned=df_exercise.drop_duplicates()
df_HR_cleaned=df_HR.drop_duplicates()
df_merged=df_exercise_cleaned.merge(df_HR_cleaned,on='date',how='inner').merge(df_FC, on='date',how='inner').merge(df_sleep,on='date',how='inner')
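A possible explanation for the repeated exercise values in the sample is that 'date' is not unique in every frame, so an inner merge pairs each exercise row with every matching row of the other frame. The sketch below assumes two small hypothetical stand-ins for df_exercise and df_sleep (the column values are borrowed from the sample, the frames themselves are invented) and uses the validate argument of merge to detect a non-unique key.

```python
import pandas as pd

# Hypothetical stand-ins for two of the four frames; 'date' repeats in df_sleep.
df_exercise = pd.DataFrame({"date": ["2018-01-07"], "distance": [905.36]})
df_sleep = pd.DataFrame({
    "date": ["2018-01-07", "2018-01-07", "2018-01-07"],
    "duration_mins": [160.0, 252.0, 551.0],
})

# Each exercise row is paired with every sleep row sharing its date.
merged = df_exercise.merge(df_sleep, on="date", how="inner")

# How many rows each key contributes on the sleep side.
sleep_rows_per_date = df_sleep["date"].value_counts()

# validate raises pandas.errors.MergeError when a merge key is not unique.
try:
    df_exercise.merge(df_sleep, on="date", how="inner", validate="one_to_one")
    key_is_unique = True
except pd.errors.MergeError:
    key_is_unique = False

print(merged)
print(sleep_rows_per_date)
print(key_is_unique)
```

Rows produced this way repeat the exercise columns but differ in the sleep columns, so duplicated() on the whole frame would not flag them.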
Adding the dtypes after checking for mixed object columns and converting date to datetime:
python pandas merge duplicates
Check the datatypes; if there is a mismatch, this could happen: stackoverflow.com/questions/50686970/…
– anky_91
Dec 31 '18 at 12:17
Thank you - yes, my 'date' column has mixed object types. All comments and questions appear to ask for a way to check for the error; how can we get the resolution to address it?
– SFSN
Dec 31 '18 at 12:31
Can you assign a datetime to all components by df[['time1','time2','time3']] = df[['time1','time2','time3']].apply(pd.to_datetime,errors='coerce') and then dedup? Replace time1, time2 and time3 with the original column names.
– anky_91
Dec 31 '18 at 12:44
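The suggestion in this comment can be sketched as follows. The frame and the column name time1 are hypothetical, chosen so that one object column holds the same instant once as a string and once as a Timestamp, which is one way a "mixed object type" column can hide duplicates.

```python
import pandas as pd

# Hypothetical frame: 'time1' mixes a string and a Timestamp for the same instant.
df = pd.DataFrame({
    "time1": ["2018-01-07 10:01:00", pd.Timestamp("2018-01-07 10:01:00")],
    "calorie": [54.34, 54.34],
})

# The string and the Timestamp compare as different, so nothing is flagged.
before = df.duplicated().sum()

# Convert only the time column(s); with errors='coerce' unparseable entries become NaT.
time_cols = ["time1"]
df[time_cols] = df[time_cols].apply(pd.to_datetime, errors="coerce")

# After conversion both rows hold equal values and the second is flagged.
after = df.duplicated().sum()
deduped = df.drop_duplicates()

print(before, after, len(deduped))
```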
I applied pd.to_datetime to the date column with mixed object types (without errors='coerce'), but it still shows up as a mixed object type column: df_merged['date']=pd.to_datetime(df_merged['date'])
– SFSN
Dec 31 '18 at 18:55
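One possible reason a column fails to become a single datetime dtype is that its values carry different UTC offsets; the sample lists both UTC-0300 and UTC-0400 in its offset columns, although whether the 'date' column is affected is not shown. Under that assumption, passing utc=True to pd.to_datetime converts every value to UTC and yields one timezone-aware dtype. The two values below are hypothetical.

```python
import pandas as pd

# Hypothetical 'date' values carrying two different UTC offsets.
df_merged = pd.DataFrame({
    "date": ["2018-01-07 10:01:00-04:00", "2018-01-07 11:01:00-03:00"],
})

# utc=True normalises every value to UTC, giving a single datetime64 dtype.
df_merged["date"] = pd.to_datetime(df_merged["date"], utc=True)

is_datetime = pd.api.types.is_datetime64_any_dtype(df_merged["date"])

# Both strings denote 14:01 UTC, so the converted values compare equal.
same_instant = df_merged["date"].iloc[0] == df_merged["date"].iloc[1]

print(df_merged["date"].dtype, is_datetime, same_instant)
```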
Can't seem to move this to chat, but wouldn't that impact my int and float columns, which are needed for analysis and visualization?
– SFSN
Dec 31 '18 at 20:41
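On the concern about numeric columns: when the conversion is restricted to a list of time columns, as in the earlier comment, the other columns are not passed to pd.to_datetime at all. A small check, using a hypothetical frame whose column names are borrowed from the sample:

```python
import pandas as pd

# Hypothetical frame with one time column and two numeric columns.
df = pd.DataFrame({
    "start_time": ["2018-01-07 07:21:00", "not a time"],
    "duration": [831210, 831210],
    "mean_speed": [1.376099, 1.376099],
})

dtypes_before = df.dtypes.copy()

# Only the listed column is passed to pd.to_datetime.
time_cols = ["start_time"]
df[time_cols] = df[time_cols].apply(pd.to_datetime, errors="coerce")

# The int and float columns keep their original dtypes.
numeric_unchanged = (
    df["duration"].dtype == dtypes_before["duration"]
    and df["mean_speed"].dtype == dtypes_before["mean_speed"]
)

# errors='coerce' turns the unparseable entry into NaT instead of raising.
coerced_to_nat = df["start_time"].isna().sum()

print(numeric_unchanged, coerced_to_nat)
```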
asked Dec 31 '18 at 12:12 by SFSN
edited Dec 31 '18 at 20:48 by SFSN