Set analysis: create a pandas Series with intersections as index and counts as values





I've been trying all day to make this work, and it's starting to make me angry!
All I want to do is create the pandas Series that upsetplot requires as input, as detailed here:

https://pypi.org/project/upsetplot/

I don't understand how the generate_data function manipulates its sets to make a Series. I assumed there would be a simple way to do this by calling set(), but I can't find one.

So I began manipulating my DataFrames directly instead, but I suspect those attempts were misguided.

So I'm providing a simple DataFrame below, in the hope that some kind soul can enlighten me.



import pandas as pd
from matplotlib import pyplot as plt
from upsetplot import generate_data, plot

# First attempt: raw values per person (overwritten by the boolean version below)
df = pd.DataFrame({'john': [1, 2, 3, 5, 7, 8],
                   'jerry': [1, 2, 5, 7, 9, 2],
                   'josie': [2, 2, 3, 2, 5, 6],
                   'jean': [6, 5, 7, 6, 2, 4]})

# Revised input: one boolean membership column per person, plus a label column
df = pd.DataFrame({'john': [True, False, True, False, True, False],
                   'jerry': [True, True, False, True, False, True],
                   'josie': [True, False, False, True, False, False],
                   'jean': [True, False, False, True, False, False],
                   'food': ['apple', 'carrot', 'choc', 'bread', 'ham', 'nut']})


The example from the package home page:

>>> from upsetplot import generate_data
>>> example = generate_data(aggregated=True)
>>> example  # doctest: +NORMALIZE_WHITESPACE
set0   set1   set2
False  False  False      56
              True      283
       True   False    1279
              True     5882
True   False  False      24
              True       90
       True   False     429
              True     1957
Name: value, dtype: int64
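
To make the target layout concrete, here is a minimal sketch that builds a Series of the same shape by hand. It only illustrates the structure (a boolean MultiIndex with counts as values) and is not a description of how generate_data works internally; the name `example_like` is made up for this illustration, and the counts are copied from the output above.

```python
import pandas as pd

# Every True/False combination of three sets, in the order pandas displays them.
index = pd.MultiIndex.from_product(
    [[False, True]] * 3, names=["set0", "set1", "set2"]
)

# Counts copied from the example output above, one per combination.
example_like = pd.Series(
    [56, 283, 1279, 5882, 24, 90, 429, 1957], index=index, name="value"
)
print(example_like)
```

Each index entry is one combination of set memberships, and the value is the number of elements that fall in exactly that combination.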









  • Please mention your expected output.

    – Abdur Rehman
    Jan 4 at 6:29











  • Is df your input dataframe?

    – Abdur Rehman
    Jan 4 at 6:35











  • I'd expect a pandas Series object like the one shown on the PyPI page; I've included it above. Yes, df is the DataFrame, but it's just an example to start with. I'm beyond caring how the df is set up (i.e. whether the values are strings, integers, booleans, etc.) because I'm just so perplexed.

    – Jeff S.
    Jan 4 at 6:40













  • So you want a DataFrame like this, but with the last column replaced by your food column? If that's not right, please state your expected output with respect to your input DataFrame, as the desired output is still very vague and confusing.

    – Abdur Rehman
    Jan 4 at 6:46











  • Exactly. For the pandas Series in 'example', the sets of booleans are all part of the index and the counts are the values. Sorry, I see what you mean; I'll change the df.

    – Jeff S.
    Jan 4 at 6:54




















python pandas






asked Jan 4 at 6:24, edited Jan 4 at 7:00
Jeff S.















1 Answer














Group by every column except food and count the rows in each group with GroupBy.size:

df = pd.DataFrame({'john': [True, False, True, False, True, False],
                   'jerry': [True, True, False, True, False, True],
                   'josie': [True, False, False, True, False, False],
                   'jean': [True, False, False, True, False, False],
                   'food': ['apple', 'carrot', 'choc', 'bread', 'ham', 'nut']})

# All membership columns, i.e. everything except the 'food' label (returned sorted)
cols = df.columns.difference(['food']).tolist()
# One entry per combination of memberships; the value is the number of rows in it
s = df.groupby(cols).size()
print(s)
jean   jerry  john   josie
False  False  True   False    2
       True   False  False    2
True   True   False  True     1
              True   True     1
dtype: int64
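
If the data starts out as plain Python sets rather than boolean columns, the same idea still applies. The sketch below is only one possible approach, using pandas alone: the `likes` dictionary and the names `items`, `membership` and `counts` are hypothetical, chosen to mirror the df above.

```python
import pandas as pd

# Hypothetical input: each person mapped to the set of foods they like.
likes = {
    "john": {"apple", "choc", "ham"},
    "jerry": {"apple", "carrot", "bread", "nut"},
    "josie": {"apple", "bread"},
    "jean": {"apple", "bread"},
}

# Every item that appears in at least one set, in a stable order.
items = sorted(set().union(*likes.values()))

# One boolean column per set: True where the item belongs to that set.
membership = pd.DataFrame(
    {name: [item in members for item in items] for name, members in likes.items()},
    index=items,
)

# Count the items that share each exact combination of memberships.
counts = membership.groupby(list(membership.columns)).size()
print(counts)
```

The result has the same shape as s above: a boolean MultiIndex with one level per set and the counts as values. The index levels follow the dictionary order here instead of being sorted alphabetically.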





  • jezrael, you are my hero! The kids just got home from childcare super grumpy, and combined with this problem I was truly losing my mind. Many thanks.

    – Jeff S.
    Jan 4 at 7:10












answered Jan 4 at 6:37, edited Jan 4 at 7:02
jezrael







