set analysis: create pandas series with intersections as index and values as counts

I've been trying all day to make this work, and it's starting to make me angry!
All I want to do is create the pandas Series that upsetplot needs as input, as detailed here:

https://pypi.org/project/upsetplot/

I don't understand how the generate_data function manipulates its sets to make a Series. I would have assumed there was a simple way to do this by calling set(), but I can't seem to find it.

So I began manipulating my DataFrames directly instead, but I suspect those attempts were misguided.

Thus I resort to providing a simple DataFrame below, and pray that some kind soul can enlighten me.

import pandas as pd
from matplotlib import pyplot as plt
from upsetplot import generate_data, plot

# First attempt with numeric values (overwritten by the boolean version below)
df = pd.DataFrame({'john': [1, 2, 3, 5, 7, 8],
                   'jerry': [1, 2, 5, 7, 9, 2],
                   'josie': [2, 2, 3, 2, 5, 6],
                   'jean': [6, 5, 7, 6, 2, 4]})

# Boolean membership per person, one row per food
df = pd.DataFrame({'john': [True, False, True, False, True, False],
                   'jerry': [True, True, False, True, False, True],
                   'josie': [True, False, False, True, False, False],
                   'jean': [True, False, False, True, False, False],
                   'food': ['apple', 'carrot', 'choc', 'bread', 'ham', 'nut']})

This is the example from the package home page:

from upsetplot import generate_data
example = generate_data(aggregated=True)
example  # doctest: +NORMALIZE_WHITESPACE

set0   set1   set2
False  False  False      56
              True      283
       True   False    1279
              True     5882
True   False  False      24
              True       90
       True   False     429
              True     1957
Name: value, dtype: int64
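
To make the target shape concrete, here is a minimal sketch (not taken from the upsetplot documentation) that builds a Series of the same form by hand: a boolean MultiIndex with one level per set, and one count per combination. The set names and counts below are made up purely for illustration.

```python
import pandas as pd

# Hypothetical membership combinations for two sets, "set0" and "set1".
# Each tuple says whether an item is in set0 and whether it is in set1.
index = pd.MultiIndex.from_tuples(
    [(False, True), (True, False), (True, True)],
    names=["set0", "set1"],
)

# Hypothetical counts: how many items fall into each combination.
counts = pd.Series([3, 2, 5], index=index, name="value")
print(counts)
```

The index levels carry the set memberships and the values carry the counts, which is the layout shown in the package example above.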

edited Jan 4 at 7:00

asked Jan 4 at 6:24

Jeff S.

377

Please mention your expected output.

– Abdur Rehman
Jan 4 at 6:29

Is df your input DataFrame?

– Abdur Rehman
Jan 4 at 6:35

I'd expect a pandas Series object like the one shown on the PyPI page; I've included it above. df is the DataFrame, yes, but it's just an example to start with. I'm beyond caring how the df is set up (i.e. whether the values are strings, integers, booleans, etc.) because I'm just so perplexed.

– Jeff S.
Jan 4 at 6:40

So you want a DataFrame like this, but with the last column replaced by your food column? If that's not right, please state your expected output with respect to your input DataFrame, as the expected output is still vague and confusing.

– Abdur Rehman
Jan 4 at 6:46

Exactly. For the pandas Series in 'example', the sets of booleans are all part of the index and the counts are the values. Sorry, I see what you mean; I'll change the df.

– Jeff S.
Jan 4 at 6:54

python pandas

1 Answer

Aggregate the counts with GroupBy.size, grouping by all columns except food:

df = pd.DataFrame({'john': [True, False, True, False, True, False],
                   'jerry': [True, True, False, True, False, True],
                   'josie': [True, False, False, True, False, False],
                   'jean': [True, False, False, True, False, False],
                   'food': ['apple', 'carrot', 'choc', 'bread', 'ham', 'nut']})

cols = df.columns.difference(['food']).tolist()
s = df.groupby(cols).size()
print(s)

jean   jerry  john   josie
False  False  True   False    2
       True   False  False    2
True   True   False  True     1
              True   True     1
dtype: int64
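
If the starting point is plain Python sets rather than a boolean DataFrame, the same groupby idea should still apply once the sets are turned into indicator columns. A minimal sketch, assuming a hypothetical `contents` dict that mirrors the example data (the names `contents`, `items` and `indicators` are illustrative and not part of pandas or upsetplot):

```python
import pandas as pd

# Hypothetical input: each name maps to the set of foods that person has.
contents = {
    "john": {"apple", "choc", "ham"},
    "jerry": {"apple", "carrot", "bread", "nut"},
    "josie": {"apple", "bread"},
    "jean": {"apple", "bread"},
}

# Every food seen in any set, sorted for a stable row order.
items = sorted(set().union(*contents.values()))

# One boolean column per set: True where the food belongs to that set.
indicators = pd.DataFrame(
    {name: [item in members for item in items]
     for name, members in contents.items()},
    index=items,
)

# Count how many foods share each combination of memberships.
counts = indicators.groupby(list(indicators.columns)).size()
print(counts)
```

Only combinations that actually occur appear in the result; combinations with no items are simply absent from the index.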

edited Jan 4 at 7:02

answered Jan 4 at 6:37

jezrael

358k26323403

jezrael, you are my hero! The kids just got home from childcare super grumpy, and combined with this problem I was truly losing my mind. Many thanks.

– Jeff S.
Jan 4 at 7:10
