set analysis: create pandas series with intersections as index and values as counts
I've been trying all day to make this work and it's starting to make me angry!
All I want to do is create the pandas Series that upsetplot needs as input, as detailed here:
https://pypi.org/project/upsetplot/
I don't understand how the generate_data function manipulates its sets to make a series. I assumed there would be a simple way to do this by calling set(), but I can't find one.
So I began manipulating my dataframes directly instead, but I suspect those attempts were misguided.
So I'm providing a simple dataframe below in the hope that some kind soul can enlighten me.
import pandas as pd
from matplotlib import pyplot as plt
from upsetplot import generate_data, plot

# First attempt: integer values (overwritten by the boolean version below)
df = pd.DataFrame({'john': [1, 2, 3, 5, 7, 8],
                   'jerry': [1, 2, 5, 7, 9, 2],
                   'josie': [2, 2, 3, 2, 5, 6],
                   'jean': [6, 5, 7, 6, 2, 4]})

# Revised version: boolean columns plus a 'food' label column
df = pd.DataFrame({'john': [True, False, True, False, True, False],
                   'jerry': [True, True, False, True, False, True],
                   'josie': [True, False, False, True, False, False],
                   'jean': [True, False, False, True, False, False],
                   'food': ['apple', 'carrot', 'choc', 'bread', 'ham', 'nut']})
The example from the package home page:
from upsetplot import generate_data
example = generate_data(aggregated=True)
example # doctest: +NORMALIZE_WHITESPACE
set0   set1   set2
False  False  False      56
              True      283
       True   False    1279
              True     5882
True   False  False      24
              True       90
       True   False     429
              True     1957
Name: value, dtype: int64
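For anyone reading along, the layout of that series can be reproduced by hand, which may make it easier to see what is being asked for. This is only a sketch that assumes pandas; it does not show how generate_data builds its result internally. The counts are copied from the output above.

```python
import pandas as pd

# All eight True/False combinations for three sets, in the order printed above.
index = pd.MultiIndex.from_product(
    [[False, True]] * 3, names=["set0", "set1", "set2"]
)

# Counts copied from the package example output.
example = pd.Series(
    [56, 283, 1279, 5882, 24, 90, 429, 1957], index=index, name="value"
)
print(example)
```

Each index entry is one combination of set memberships, and the value is the number of items that fall in exactly that combination.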
python pandas
Please mention your expected output.
– Abdur Rehman
Jan 4 at 6:29
Is df your input dataframe?
– Abdur Rehman
Jan 4 at 6:35
I'd expect a pandas Series object like the one shown on the PyPI page; I've included it above. Yes, df is the dataframe, but it's just an example to start with. I'm beyond caring how the df is set up (i.e. whether the values are strings, integers, booleans, etc.) because I'm just so perplexed.
– Jeff S.
Jan 4 at 6:40
So you want a dataframe like this, but with the last column replaced by your food column? If that's not right, then please state your expected output with respect to your input dataframe, as the output you describe is still vague and confusing.
– Abdur Rehman
Jan 4 at 6:46
Exactly. For the pandas Series in 'example', the sets of booleans are all part of the index and the counts are the values. Sorry, I see what you mean; I'll change the df.
– Jeff S.
Jan 4 at 6:54
asked Jan 4 at 6:24 by Jeff S. (edited Jan 4 at 7:00)
1 Answer
Count the rows for each combination of memberships with GroupBy.size, grouping by every column except food:
import pandas as pd

df = pd.DataFrame({'john': [True, False, True, False, True, False],
                   'jerry': [True, True, False, True, False, True],
                   'josie': [True, False, False, True, False, False],
                   'jean': [True, False, False, True, False, False],
                   'food': ['apple', 'carrot', 'choc', 'bread', 'ham', 'nut']})

# All columns except 'food' (Index.difference returns them sorted)
cols = df.columns.difference(['food']).tolist()
# Number of rows per observed combination of True/False values
s = df.groupby(cols).size()
print(s)
jean   jerry  john   josie
False  False  True   False    2
       True   False  False    2
True   True   False  True     1
              True   True     1
dtype: int64
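Since the question mentions set(), here is a minimal sketch of the same idea starting from plain Python sets rather than a ready-made boolean dataframe. The sets below are read off the question's dataframe (the foods marked True for each person). The names `sets`, `items` and `indicators` are illustrative assumptions and not part of the answer above.

```python
import pandas as pd

# Each name maps to the foods marked True for that person in the question's df.
sets = {
    "john": {"apple", "choc", "ham"},
    "jerry": {"apple", "carrot", "bread", "nut"},
    "josie": {"apple", "bread"},
    "jean": {"apple", "bread"},
}

# Every item that appears in at least one set, in a stable order.
items = sorted(set().union(*sets.values()))

# One boolean column per set: True where the item belongs to that set.
indicators = pd.DataFrame(
    {name: [item in members for item in items] for name, members in sets.items()},
    index=items,
)

# Group by all indicator columns and count the rows in each combination.
counts = indicators.groupby(list(indicators.columns)).size()
print(counts)
```

The result has the same shape as the package example: a boolean MultiIndex with one level per set, and the count of items as the value. Only combinations that actually occur are listed, and the level order follows the dictionary order instead of being sorted alphabetically.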
jezrael, you are my hero! The kids just got home from childcare super grumpy, and combined with this problem I was truly losing my mind. Many thanks.
– Jeff S.
Jan 4 at 7:10
answered Jan 4 at 6:37 by jezrael (edited Jan 4 at 7:02)