How to create a pivot table from a pandas DataFrame based on a column's substring values and counts?

Dataset:

  Item_Identifier  Item_Weight Item_Fat_Content  Item_Visibility
0           FDA15         9.30          Low Fat         0.016047
1           DRC01         5.92          Regular         0.019278
2           FDN15        17.50          Low Fat         0.016760
3           FDX07        19.20          Regular         0.065953
4           NCD19         8.93          Low Fat         0.065953

               Item_Type  Item_MRP Outlet_Identifier
0                  Dairy  249.8092            OUT049
1            Soft Drinks   48.2692            OUT018
2                   Meat  141.6180            OUT049
3  Fruits and Vegetables  182.0950            OUT010
4              Household   53.8614            OUT013

   Outlet_Establishment_Year Outlet_Size Outlet_Location_Type
0                       1999      Medium               Tier 1
1                       2009      Medium               Tier 3
2                       1999      Medium               Tier 1
3                       1998      Medium               Tier 3
4                       1987        High               Tier 3

         Outlet_Type   Item_Type_new
0  Supermarket Type1      perishable
1  Supermarket Type2  non-perishable
2  Supermarket Type1      perishable
3      Grocery Store      perishable
4  Supermarket Type1  non-perishable

Pivot table:

Index: Item_Type; columns: the first two characters of Item_Identifier; values: counts.

Expected Output:

                          DR   FD   NC
  Baking Goods             0 1086    0
  Breads                   0  416    0
  Breakfast                0  186    0
  Canned                   0 1084    0
  Dairy                  229  907    0
  Frozen Foods             0 1426    0
  Fruits and Vegetables    0 2013    0
  Hard Drinks            362    0    0
  Health and Hygiene       0    0  858
  Household                0    0 1548
  Meat                     0  736    0
  Others                   0    0  280
  Seafood                  0   89    0
  Snack Foods              0 1989    0
  Soft Drinks            726    0    0
  Starchy Foods            0  269    0

asked Jan 3 at 3:56 by datascientist110
edited Jan 3 at 3:57 by Stephen Rauch

Your sample output is unclear. You should also show what you have tried, to give a clearer picture.
– ycx, Jan 3 at 4:08

python python-3.x pandas dataframe

1 Answer

Create a new column holding the first two characters of Item_Identifier, then build a pivot table on it.

Here is the code (assuming df is the DataFrame holding the dataset):

# take the first two characters of each identifier (DR, FD, NC)
df['Item_Identifier_substr'] = df['Item_Identifier'].str[:2]

# count rows per (Item_Type, prefix) pair; fill_value=0 shows 0 instead of NaN for missing pairs
pivot_df = df.pivot_table(index='Item_Type', columns='Item_Identifier_substr', values='Item_Identifier', aggfunc='count', fill_value=0)

pivot_df

If you find this helpful, please upvote the answer.
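As a possible alternative, the same kind of count table can be sketched with pd.crosstab, which counts co-occurrences directly and reports 0 for pairs that never occur. The five rows below are copied from the question's sample; the names prefix and counts are illustrative, and the full dataset would of course give the larger numbers shown in the expected output.

```python
import pandas as pd

# Five sample rows from the question (only the two relevant columns).
df = pd.DataFrame({
    "Item_Identifier": ["FDA15", "DRC01", "FDN15", "FDX07", "NCD19"],
    "Item_Type": ["Dairy", "Soft Drinks", "Meat",
                  "Fruits and Vegetables", "Household"],
})

# First two characters of each identifier (DR, FD, NC).
prefix = df["Item_Identifier"].str[:2]

# Rows: Item_Type, columns: identifier prefix, values: number of rows.
counts = pd.crosstab(df["Item_Type"], prefix)
counts.columns.name = None  # drop the axis label inherited from the Series name

print(counts)
```

Because crosstab takes the two Series directly, no helper column has to be added to df; whether that is preferable to pivot_table is mostly a matter of taste.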

answered Jan 3 at 5:11 by Yong Wang
edited Jan 3 at 23:35 by elPastor
