How to store the results of an iteration over rows of a Pandas DataFrame in a new column?

I am new to coding in Python. Currently, I am trying to analyse a dataframe containing multiple workflows. Each workflow has a different process steps for initiating and ending a workflow. In a simplified version, my data looks like the following:

   Workflow Initiate   End_1   End_2   End_3

0         1   Name_1      na  Name_1      na

1         2   Name_2      na      na      na

2         3   Name_3      na      na  Name_5

3         4   Name_4  Name_5      na      na

4         5       na      na      na  Name_5

For every workflow, I want to compare whether the name that ended the workflow is different as the name that initiated the workflow.

Iterating through the rows in the following way gives me the desired output:

for index, row in df.iterrows():

    if ((row['Initiate'] != 'na')

        and (row['Initiate'] == row['End_1']) |

            (row['Initiate'] == row['End_2']) |

            (row['Initiate'] == row['End_3'])

        ):

        print("Name end equals initiate")

    elif ((row['End_1'] == 'na') &

          (row['End_2'] == 'na') &

          (row['End_3'] == 'na')

         ):

        print("No name ended")

    else:

        print("Different name ended")



Name end equals initiate

No name ended

Different name ended

Different name ended

Different name ended

However, I want to add an extra column, say 'Analysis', in the dataframe that shows the above outcome behind every workflow.

For this I stuffed the code into a function:

def function_name(a, b, c, d):

    for index, row in df.iterrows():

        if ((a != 'na')

            and (a == b) |

                (a == c) |

                (a == d)

            ):

            return "Name end equals initiate"

        elif ((b == 'na') &

              (c == 'na') &

              (d == 'na')

             ):

            return "No name ended"

        else:

            return "Different name ended"



df['Analysis'] = function_name(row['Initiate'],

                               row['End_1'],

                               row['End_2'],

                               row['End_3'])



print(df)



   Workflow Initiate          ...            End_3              Analysis

0         1   Name_1          ...               na  Different name ended

1         2   Name_2          ...               na  Different name ended

2         3   Name_3          ...           Name_5  Different name ended

3         4   Name_4          ...               na  Different name ended

4         5       na          ...           Name_5  Different name ended

As you can see the output is different from the first analysis. I would like to add an extra column to my dataframe that gives me the same output as with the print statements.

asked Dec 31 '18 at 13:37

Rens

add a comment |

   Workflow Initiate   End_1   End_2   End_3

0         1   Name_1      na  Name_1      na

1         2   Name_2      na      na      na

2         3   Name_3      na      na  Name_5

3         4   Name_4  Name_5      na      na

4         5       na      na      na  Name_5

For every workflow, I want to compare whether the name that ended the workflow is different as the name that initiated the workflow.

Iterating through the rows in the following way gives me the desired output:

for index, row in df.iterrows():

    if ((row['Initiate'] != 'na')

        and (row['Initiate'] == row['End_1']) |

            (row['Initiate'] == row['End_2']) |

            (row['Initiate'] == row['End_3'])

        ):

        print("Name end equals initiate")

    elif ((row['End_1'] == 'na') &

          (row['End_2'] == 'na') &

          (row['End_3'] == 'na')

         ):

        print("No name ended")

    else:

        print("Different name ended")



Name end equals initiate

No name ended

Different name ended

Different name ended

Different name ended

However, I want to add an extra column, say 'Analysis', in the dataframe that shows the above outcome behind every workflow.

For this I stuffed the code into a function:

def function_name(a, b, c, d):

    for index, row in df.iterrows():

        if ((a != 'na')

            and (a == b) |

                (a == c) |

                (a == d)

            ):

            return "Name end equals initiate"

        elif ((b == 'na') &

              (c == 'na') &

              (d == 'na')

             ):

            return "No name ended"

        else:

            return "Different name ended"



df['Analysis'] = function_name(row['Initiate'],

                               row['End_1'],

                               row['End_2'],

                               row['End_3'])



print(df)



   Workflow Initiate          ...            End_3              Analysis

0         1   Name_1          ...               na  Different name ended

1         2   Name_2          ...               na  Different name ended

2         3   Name_3          ...           Name_5  Different name ended

3         4   Name_4          ...               na  Different name ended

4         5       na          ...           Name_5  Different name ended

As you can see the output is different from the first analysis. I would like to add an extra column to my dataframe that gives me the same output as with the print statements.

asked Dec 31 '18 at 13:37

Rens

add a comment |

   Workflow Initiate   End_1   End_2   End_3

0         1   Name_1      na  Name_1      na

1         2   Name_2      na      na      na

2         3   Name_3      na      na  Name_5

3         4   Name_4  Name_5      na      na

4         5       na      na      na  Name_5

For every workflow, I want to compare whether the name that ended the workflow is different as the name that initiated the workflow.

Iterating through the rows in the following way gives me the desired output:

for index, row in df.iterrows():

    if ((row['Initiate'] != 'na')

        and (row['Initiate'] == row['End_1']) |

            (row['Initiate'] == row['End_2']) |

            (row['Initiate'] == row['End_3'])

        ):

        print("Name end equals initiate")

    elif ((row['End_1'] == 'na') &

          (row['End_2'] == 'na') &

          (row['End_3'] == 'na')

         ):

        print("No name ended")

    else:

        print("Different name ended")



Name end equals initiate

No name ended

Different name ended

Different name ended

Different name ended

However, I want to add an extra column, say 'Analysis', in the dataframe that shows the above outcome behind every workflow.

For this I stuffed the code into a function:

def function_name(a, b, c, d):

    for index, row in df.iterrows():

        if ((a != 'na')

            and (a == b) |

                (a == c) |

                (a == d)

            ):

            return "Name end equals initiate"

        elif ((b == 'na') &

              (c == 'na') &

              (d == 'na')

             ):

            return "No name ended"

        else:

            return "Different name ended"



df['Analysis'] = function_name(row['Initiate'],

                               row['End_1'],

                               row['End_2'],

                               row['End_3'])



print(df)



   Workflow Initiate          ...            End_3              Analysis

0         1   Name_1          ...               na  Different name ended

1         2   Name_2          ...               na  Different name ended

2         3   Name_3          ...           Name_5  Different name ended

3         4   Name_4          ...               na  Different name ended

4         5       na          ...           Name_5  Different name ended

As you can see the output is different from the first analysis. I would like to add an extra column to my dataframe that gives me the same output as with the print statements.

asked Dec 31 '18 at 13:37

Rens

   Workflow Initiate   End_1   End_2   End_3

0         1   Name_1      na  Name_1      na

1         2   Name_2      na      na      na

2         3   Name_3      na      na  Name_5

3         4   Name_4  Name_5      na      na

4         5       na      na      na  Name_5

For every workflow, I want to compare whether the name that ended the workflow is different as the name that initiated the workflow.

Iterating through the rows in the following way gives me the desired output:

for index, row in df.iterrows():

    if ((row['Initiate'] != 'na')

        and (row['Initiate'] == row['End_1']) |

            (row['Initiate'] == row['End_2']) |

            (row['Initiate'] == row['End_3'])

        ):

        print("Name end equals initiate")

    elif ((row['End_1'] == 'na') &

          (row['End_2'] == 'na') &

          (row['End_3'] == 'na')

         ):

        print("No name ended")

    else:

        print("Different name ended")



Name end equals initiate

No name ended

Different name ended

Different name ended

Different name ended

However, I want to add an extra column, say 'Analysis', in the dataframe that shows the above outcome behind every workflow.

For this I stuffed the code into a function:

def function_name(a, b, c, d):

    for index, row in df.iterrows():

        if ((a != 'na')

            and (a == b) |

                (a == c) |

                (a == d)

            ):

            return "Name end equals initiate"

        elif ((b == 'na') &

              (c == 'na') &

              (d == 'na')

             ):

            return "No name ended"

        else:

            return "Different name ended"



df['Analysis'] = function_name(row['Initiate'],

                               row['End_1'],

                               row['End_2'],

                               row['End_3'])



print(df)



   Workflow Initiate          ...            End_3              Analysis

0         1   Name_1          ...               na  Different name ended

1         2   Name_2          ...               na  Different name ended

2         3   Name_3          ...           Name_5  Different name ended

3         4   Name_4          ...               na  Different name ended

4         5       na          ...           Name_5  Different name ended

As you can see the output is different from the first analysis. I would like to add an extra column to my dataframe that gives me the same output as with the print statements.

python pandas loops dataframe

asked Dec 31 '18 at 13:37

Rens

asked Dec 31 '18 at 13:37

Rens

asked Dec 31 '18 at 13:37

Rens

asked Dec 31 '18 at 13:37

Rens

asked Dec 31 '18 at 13:37

Rens

add a comment |

2 Answers
2

active

oldest

votes

You should avoid row-wise loops here. Your algorithm is vectorisable:

df = df.replace('na', np.nan)  # replace string 'na' with NaN for efficient processing

ends = df.filter(like='End')  # filter by columns with 'End'



match = ends.ffill(1).iloc[:, -1] == df['Initiate']  # find last Name in each End

nulls = ends.isnull().all(1)  # check which rows are all null



# apply vectorised conditional logic

df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],

                         'Different name ended')



print(df)



   Workflow Initiate   End_1   End_2   End_3                    Result

0         1   Name_1     NaN  Name_1     NaN  Name end equals initiate

1         2   Name_2     NaN     NaN     NaN             No name ended

2         3   Name_3     NaN     NaN  Name_5      Different name ended

3         4   Name_4  Name_5     NaN     NaN      Different name ended

4         5      NaN     NaN     NaN  Name_5      Different name ended

answered Dec 31 '18 at 13:44

jpp

100k2161111

Thanks for this answer, I managed to apply it to my case and seems to work.

– Rens
Feb 2 at 16:52

If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?

– Rens
Feb 2 at 16:59

@Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.

– jpp
Feb 2 at 17:01

add a comment |

Try to use np.select()

conditions = [

    (df['Initiate'] != 'na') & ((df['Initiate'] == df['End_1']) | (df['Initiate'] == df['End_2']) | (df['Initiate'] == df['End_3'])),

    (df['End_1'] == 'na') & (df['End_2'] == 'na') & (df['End_3'] == 'na')

] 



answers = ['Name end equals initiate','No name ended']   

df['Analysis'] = np.select(conditions, answers, default='Different name ended')

edited Dec 31 '18 at 14:02

pygo

3,0721619

answered Dec 31 '18 at 14:01

cors

33228

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53988109%2fhow-to-store-the-results-of-an-iteration-over-rows-of-a-pandas-dataframe-in-a-ne%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

You should avoid row-wise loops here. Your algorithm is vectorisable:

df = df.replace('na', np.nan)  # replace string 'na' with NaN for efficient processing

ends = df.filter(like='End')  # filter by columns with 'End'



match = ends.ffill(1).iloc[:, -1] == df['Initiate']  # find last Name in each End

nulls = ends.isnull().all(1)  # check which rows are all null



# apply vectorised conditional logic

df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],

                         'Different name ended')



print(df)



   Workflow Initiate   End_1   End_2   End_3                    Result

0         1   Name_1     NaN  Name_1     NaN  Name end equals initiate

1         2   Name_2     NaN     NaN     NaN             No name ended

2         3   Name_3     NaN     NaN  Name_5      Different name ended

3         4   Name_4  Name_5     NaN     NaN      Different name ended

4         5      NaN     NaN     NaN  Name_5      Different name ended

answered Dec 31 '18 at 13:44

jpp

100k2161111

Thanks for this answer, I managed to apply it to my case and seems to work.

– Rens
Feb 2 at 16:52

If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?

– Rens
Feb 2 at 16:59

@Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.

– jpp
Feb 2 at 17:01

add a comment |

You should avoid row-wise loops here. Your algorithm is vectorisable:

df = df.replace('na', np.nan)  # replace string 'na' with NaN for efficient processing

ends = df.filter(like='End')  # filter by columns with 'End'



match = ends.ffill(1).iloc[:, -1] == df['Initiate']  # find last Name in each End

nulls = ends.isnull().all(1)  # check which rows are all null



# apply vectorised conditional logic

df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],

                         'Different name ended')



print(df)



   Workflow Initiate   End_1   End_2   End_3                    Result

0         1   Name_1     NaN  Name_1     NaN  Name end equals initiate

1         2   Name_2     NaN     NaN     NaN             No name ended

2         3   Name_3     NaN     NaN  Name_5      Different name ended

3         4   Name_4  Name_5     NaN     NaN      Different name ended

4         5      NaN     NaN     NaN  Name_5      Different name ended

answered Dec 31 '18 at 13:44

jpp

100k2161111

Thanks for this answer, I managed to apply it to my case and seems to work.

– Rens
Feb 2 at 16:52

If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?

– Rens
Feb 2 at 16:59

@Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.

– jpp
Feb 2 at 17:01

add a comment |

You should avoid row-wise loops here. Your algorithm is vectorisable:

df = df.replace('na', np.nan)  # replace string 'na' with NaN for efficient processing

ends = df.filter(like='End')  # filter by columns with 'End'



match = ends.ffill(1).iloc[:, -1] == df['Initiate']  # find last Name in each End

nulls = ends.isnull().all(1)  # check which rows are all null



# apply vectorised conditional logic

df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],

                         'Different name ended')



print(df)



   Workflow Initiate   End_1   End_2   End_3                    Result

0         1   Name_1     NaN  Name_1     NaN  Name end equals initiate

1         2   Name_2     NaN     NaN     NaN             No name ended

2         3   Name_3     NaN     NaN  Name_5      Different name ended

3         4   Name_4  Name_5     NaN     NaN      Different name ended

4         5      NaN     NaN     NaN  Name_5      Different name ended

answered Dec 31 '18 at 13:44

jpp

100k2161111

You should avoid row-wise loops here. Your algorithm is vectorisable:

df = df.replace('na', np.nan)  # replace string 'na' with NaN for efficient processing

ends = df.filter(like='End')  # filter by columns with 'End'



match = ends.ffill(1).iloc[:, -1] == df['Initiate']  # find last Name in each End

nulls = ends.isnull().all(1)  # check which rows are all null



# apply vectorised conditional logic

df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],

                         'Different name ended')



print(df)



   Workflow Initiate   End_1   End_2   End_3                    Result

0         1   Name_1     NaN  Name_1     NaN  Name end equals initiate

1         2   Name_2     NaN     NaN     NaN             No name ended

2         3   Name_3     NaN     NaN  Name_5      Different name ended

3         4   Name_4  Name_5     NaN     NaN      Different name ended

4         5      NaN     NaN     NaN  Name_5      Different name ended

answered Dec 31 '18 at 13:44

jpp

100k2161111

answered Dec 31 '18 at 13:44

jpp

100k2161111

answered Dec 31 '18 at 13:44

jpp

100k2161111

answered Dec 31 '18 at 13:44

jpp

100k2161111

Thanks for this answer, I managed to apply it to my case and seems to work.

– Rens
Feb 2 at 16:52

If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?

– Rens
Feb 2 at 16:59

@Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.

– jpp
Feb 2 at 17:01

add a comment |

Thanks for this answer, I managed to apply it to my case and seems to work.

– Rens
Feb 2 at 16:52

If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?

– Rens
Feb 2 at 16:59

@Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.

– jpp
Feb 2 at 17:01

Thanks for this answer, I managed to apply it to my case and seems to work.

– Rens
Feb 2 at 16:52

If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?

– Rens
Feb 2 at 16:59

@Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.

– jpp
Feb 2 at 17:01

add a comment |

Try to use np.select()

conditions = [

    (df['Initiate'] != 'na') & ((df['Initiate'] == df['End_1']) | (df['Initiate'] == df['End_2']) | (df['Initiate'] == df['End_3'])),

    (df['End_1'] == 'na') & (df['End_2'] == 'na') & (df['End_3'] == 'na')

] 



answers = ['Name end equals initiate','No name ended']   

df['Analysis'] = np.select(conditions, answers, default='Different name ended')

edited Dec 31 '18 at 14:02

pygo

3,0721619

answered Dec 31 '18 at 14:01

cors

33228

add a comment |

Try to use np.select()

conditions = [

    (df['Initiate'] != 'na') & ((df['Initiate'] == df['End_1']) | (df['Initiate'] == df['End_2']) | (df['Initiate'] == df['End_3'])),

    (df['End_1'] == 'na') & (df['End_2'] == 'na') & (df['End_3'] == 'na')

] 



answers = ['Name end equals initiate','No name ended']   

df['Analysis'] = np.select(conditions, answers, default='Different name ended')

edited Dec 31 '18 at 14:02

pygo

3,0721619

answered Dec 31 '18 at 14:01

cors

33228

add a comment |

Try to use np.select()

conditions = [

    (df['Initiate'] != 'na') & ((df['Initiate'] == df['End_1']) | (df['Initiate'] == df['End_2']) | (df['Initiate'] == df['End_3'])),

    (df['End_1'] == 'na') & (df['End_2'] == 'na') & (df['End_3'] == 'na')

] 



answers = ['Name end equals initiate','No name ended']   

df['Analysis'] = np.select(conditions, answers, default='Different name ended')

edited Dec 31 '18 at 14:02

pygo

3,0721619

answered Dec 31 '18 at 14:01

cors

33228

Try to use np.select()

conditions = [

    (df['Initiate'] != 'na') & ((df['Initiate'] == df['End_1']) | (df['Initiate'] == df['End_2']) | (df['Initiate'] == df['End_3'])),

    (df['End_1'] == 'na') & (df['End_2'] == 'na') & (df['End_3'] == 'na')

] 



answers = ['Name end equals initiate','No name ended']   

df['Analysis'] = np.select(conditions, answers, default='Different name ended')

edited Dec 31 '18 at 14:02

pygo

3,0721619

answered Dec 31 '18 at 14:01

cors

33228

edited Dec 31 '18 at 14:02

pygo

3,0721619

edited Dec 31 '18 at 14:02

pygo

3,0721619

edited Dec 31 '18 at 14:02

pygo

3,0721619

answered Dec 31 '18 at 14:01

cors

33228

answered Dec 31 '18 at 14:01

cors

33228

answered Dec 31 '18 at 14:01

cors

33228

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

R9rIf623qlP2ZgqMkapjBVaBu,PP3Q hHWLv6i,96wFlZbYmB,Fa 5B94EjNo,ddIj,6RcuJlFNWb yl Ly,E,g5DN1iAzlu,L7c3oLU

搜尋此網誌

Bdtjtk