How to group by a percentage range of each value in pandas (Python)
If I have a dataframe of the format:
date value
2018-10-31 23:45:00 0.031190
2018-11-01 00:00:00 0.031211
2018-11-01 00:15:00 0.031201
2018-11-01 00:30:00 0.031203
2018-11-01 00:45:00 0.031186
2018-11-01 01:00:00 0.031208
2018-11-01 01:15:00 0.031191
2018-11-01 01:30:00 0.031170
2018-11-01 01:45:00 0.031155
2018-11-01 02:00:00 0.031146
2018-11-01 02:15:00 0.031176
2018-11-01 02:30:00 0.031178
2018-11-01 02:45:00 0.031163
2018-11-01 03:00:00 0.031187
2018-11-01 03:15:00 0.031140
2018-11-01 03:30:00 0.031165
2018-11-01 03:45:00 0.031166
2018-11-01 04:00:00 0.031182
2018-11-01 04:15:00 0.031155
2018-11-01 04:30:00 0.031145
2018-11-01 04:45:00 0.031177
2018-11-01 05:00:00 0.031189
2018-11-01 05:15:00 0.031183
2018-11-01 05:30:00 0.031175
2018-11-01 05:45:00 0.031184
2018-11-01 06:00:00 0.031174
2018-11-01 06:15:00 0.031167
2018-11-01 06:30:00 0.031161
2018-11-01 06:45:00 0.031163
2018-11-01 07:00:00 0.031211
2018-11-01 07:15:00 0.031183
2018-11-01 07:30:00 0.031156
2018-11-01 07:45:00 0.031142
2018-11-01 08:00:00 0.031154
2018-11-01 08:15:00 0.031152
2018-11-01 08:30:00 0.031137
2018-11-01 08:45:00 0.031142
2018-11-01 09:00:00 0.031155
2018-11-01 09:15:00 0.031145
2018-11-01 09:30:00 0.031154
2018-11-01 09:45:00 0.031140
2018-11-01 10:00:00 0.031146
2018-11-01 10:15:00 0.031149
2018-11-01 10:30:00 0.031164
2018-11-01 10:45:00 0.031172
2018-11-01 11:00:00 0.031162
2018-11-01 11:15:00 0.031141
2018-11-01 11:30:00 0.031165
2018-11-01 11:45:00 0.031174
2018-11-01 12:00:00 0.031180
How do I segment the data into groups of a 5% difference in value?
For example, 0.031190 would be in a group of values between 0.0296305 and 0.0327495. If a value is within multiple groups that is fine - in fact it is expected. If a value is not anywhere near any other values, then it will just be by itself.
python pandas
1
Please include complete sample data with expected results. This data will lead to one group. Which is not a valid test in my opinion.
– Scott Boston
Jan 3 at 18:20
2
check with qcut.
– Wen-Ben
Jan 3 at 18:24
@user2330270, is the answer helpful?
– Zanshin
Jan 8 at 14:04
asked Jan 3 at 18:16 by user2330270
1 Answer
Based on the data you provided, something like this would work, assuming you need the values divided into 20 quantile bins, each holding about 5% of the rows:

    import pandas as pd

    # qcut splits 'value' into 20 equal-frequency (quantile) bins
    df['binned'] = pd.qcut(df['value'], 20)
    counts = df.groupby('binned')['value'].count()
    print(counts.head())

Output:

    binned
    (0.031127000000000002, 0.03114]    3
    (0.03114, 0.031142]                3
    (0.031142, 0.031145]               2
    (0.031145, 0.031148]               2
    (0.031148, 0.031154]               4
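If the goal is instead the overlapping ±5% window around each value, as the question describes (0.031190 maps to the range 0.0296305 to 0.0327495), a minimal sketch could look like the one below. The helper name `groups_within_pct` and the columns `group_members` and `group_size` are hypothetical, the sample frame is a small made-up illustration rather than the question's data, and the code assumes every value is positive.

```python
import numpy as np
import pandas as pd


def groups_within_pct(values: pd.Series, pct: float = 0.05) -> pd.Series:
    """For each value v, list the index labels of all values in [v*(1-pct), v*(1+pct)].

    Assumes positive values; for negative values the bounds would swap.
    """
    arr = values.to_numpy(dtype=float)
    lower = arr * (1 - pct)
    upper = arr * (1 + pct)
    # mask[i, j] is True when value j lies inside the window around value i
    mask = (arr[None, :] >= lower[:, None]) & (arr[None, :] <= upper[:, None])
    members = [values.index[row].tolist() for row in mask]
    return pd.Series(members, index=values.index, name="group_members")


# Illustrative data only: three close values and one far from the rest
df = pd.DataFrame({"value": [0.031190, 0.031211, 0.031137, 0.040000]})
df["group_members"] = groups_within_pct(df["value"])
df["group_size"] = df["group_members"].map(len)
print(df)
```

Each row gets its own group, so a value can appear in several groups, and a value with no neighbours ends up in a group containing only itself. The comparison matrix needs memory proportional to the square of the row count, so for large frames a sorted array with `np.searchsorted` may be a better fit.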
edited Jan 8 at 7:15
answered Jan 7 at 18:57 by Zanshin