How to store the results of an iteration over rows of a Pandas DataFrame in a new column?












1















I am new to coding in Python. Currently, I am trying to analyse a dataframe containing multiple workflows. Each workflow has a different process steps for initiating and ending a workflow. In a simplified version, my data looks like the following:



   Workflow Initiate   End_1   End_2   End_3
0 1 Name_1 na Name_1 na
1 2 Name_2 na na na
2 3 Name_3 na na Name_5
3 4 Name_4 Name_5 na na
4 5 na na na Name_5


For every workflow, I want to compare whether the name that ended the workflow is different as the name that initiated the workflow.



Iterating through the rows in the following way gives me the desired output:



for index, row in df.iterrows():
if ((row['Initiate'] != 'na')
and (row['Initiate'] == row['End_1']) |
(row['Initiate'] == row['End_2']) |
(row['Initiate'] == row['End_3'])
):
print("Name end equals initiate")
elif ((row['End_1'] == 'na') &
(row['End_2'] == 'na') &
(row['End_3'] == 'na')
):
print("No name ended")
else:
print("Different name ended")

Name end equals initiate
No name ended
Different name ended
Different name ended
Different name ended


However, I want to add an extra column, say 'Analysis', in the dataframe that shows the above outcome behind every workflow.



For this I stuffed the code into a function:



def function_name(a, b, c, d):
for index, row in df.iterrows():
if ((a != 'na')
and (a == b) |
(a == c) |
(a == d)
):
return "Name end equals initiate"
elif ((b == 'na') &
(c == 'na') &
(d == 'na')
):
return "No name ended"
else:
return "Different name ended"

df['Analysis'] = function_name(row['Initiate'],
row['End_1'],
row['End_2'],
row['End_3'])

print(df)

Workflow Initiate ... End_3 Analysis
0 1 Name_1 ... na Different name ended
1 2 Name_2 ... na Different name ended
2 3 Name_3 ... Name_5 Different name ended
3 4 Name_4 ... na Different name ended
4 5 na ... Name_5 Different name ended


As you can see the output is different from the first analysis. I would like to add an extra column to my dataframe that gives me the same output as with the print statements.










share|improve this question



























    1















    I am new to coding in Python. Currently, I am trying to analyse a dataframe containing multiple workflows. Each workflow has a different process steps for initiating and ending a workflow. In a simplified version, my data looks like the following:



       Workflow Initiate   End_1   End_2   End_3
    0 1 Name_1 na Name_1 na
    1 2 Name_2 na na na
    2 3 Name_3 na na Name_5
    3 4 Name_4 Name_5 na na
    4 5 na na na Name_5


    For every workflow, I want to compare whether the name that ended the workflow is different as the name that initiated the workflow.



    Iterating through the rows in the following way gives me the desired output:



    for index, row in df.iterrows():
    if ((row['Initiate'] != 'na')
    and (row['Initiate'] == row['End_1']) |
    (row['Initiate'] == row['End_2']) |
    (row['Initiate'] == row['End_3'])
    ):
    print("Name end equals initiate")
    elif ((row['End_1'] == 'na') &
    (row['End_2'] == 'na') &
    (row['End_3'] == 'na')
    ):
    print("No name ended")
    else:
    print("Different name ended")

    Name end equals initiate
    No name ended
    Different name ended
    Different name ended
    Different name ended


    However, I want to add an extra column, say 'Analysis', in the dataframe that shows the above outcome behind every workflow.



    For this I stuffed the code into a function:



    def function_name(a, b, c, d):
    for index, row in df.iterrows():
    if ((a != 'na')
    and (a == b) |
    (a == c) |
    (a == d)
    ):
    return "Name end equals initiate"
    elif ((b == 'na') &
    (c == 'na') &
    (d == 'na')
    ):
    return "No name ended"
    else:
    return "Different name ended"

    df['Analysis'] = function_name(row['Initiate'],
    row['End_1'],
    row['End_2'],
    row['End_3'])

    print(df)

    Workflow Initiate ... End_3 Analysis
    0 1 Name_1 ... na Different name ended
    1 2 Name_2 ... na Different name ended
    2 3 Name_3 ... Name_5 Different name ended
    3 4 Name_4 ... na Different name ended
    4 5 na ... Name_5 Different name ended


    As you can see the output is different from the first analysis. I would like to add an extra column to my dataframe that gives me the same output as with the print statements.










    share|improve this question

























      1












      1








      1








      I am new to coding in Python. Currently, I am trying to analyse a dataframe containing multiple workflows. Each workflow has a different process steps for initiating and ending a workflow. In a simplified version, my data looks like the following:



         Workflow Initiate   End_1   End_2   End_3
      0 1 Name_1 na Name_1 na
      1 2 Name_2 na na na
      2 3 Name_3 na na Name_5
      3 4 Name_4 Name_5 na na
      4 5 na na na Name_5


      For every workflow, I want to compare whether the name that ended the workflow is different as the name that initiated the workflow.



      Iterating through the rows in the following way gives me the desired output:



      for index, row in df.iterrows():
      if ((row['Initiate'] != 'na')
      and (row['Initiate'] == row['End_1']) |
      (row['Initiate'] == row['End_2']) |
      (row['Initiate'] == row['End_3'])
      ):
      print("Name end equals initiate")
      elif ((row['End_1'] == 'na') &
      (row['End_2'] == 'na') &
      (row['End_3'] == 'na')
      ):
      print("No name ended")
      else:
      print("Different name ended")

      Name end equals initiate
      No name ended
      Different name ended
      Different name ended
      Different name ended


      However, I want to add an extra column, say 'Analysis', in the dataframe that shows the above outcome behind every workflow.



      For this I stuffed the code into a function:



      def function_name(a, b, c, d):
      for index, row in df.iterrows():
      if ((a != 'na')
      and (a == b) |
      (a == c) |
      (a == d)
      ):
      return "Name end equals initiate"
      elif ((b == 'na') &
      (c == 'na') &
      (d == 'na')
      ):
      return "No name ended"
      else:
      return "Different name ended"

      df['Analysis'] = function_name(row['Initiate'],
      row['End_1'],
      row['End_2'],
      row['End_3'])

      print(df)

      Workflow Initiate ... End_3 Analysis
      0 1 Name_1 ... na Different name ended
      1 2 Name_2 ... na Different name ended
      2 3 Name_3 ... Name_5 Different name ended
      3 4 Name_4 ... na Different name ended
      4 5 na ... Name_5 Different name ended


      As you can see the output is different from the first analysis. I would like to add an extra column to my dataframe that gives me the same output as with the print statements.










      share|improve this question














      I am new to coding in Python. Currently, I am trying to analyse a dataframe containing multiple workflows. Each workflow has a different process steps for initiating and ending a workflow. In a simplified version, my data looks like the following:



         Workflow Initiate   End_1   End_2   End_3
      0 1 Name_1 na Name_1 na
      1 2 Name_2 na na na
      2 3 Name_3 na na Name_5
      3 4 Name_4 Name_5 na na
      4 5 na na na Name_5


      For every workflow, I want to compare whether the name that ended the workflow is different as the name that initiated the workflow.



      Iterating through the rows in the following way gives me the desired output:



      for index, row in df.iterrows():
      if ((row['Initiate'] != 'na')
      and (row['Initiate'] == row['End_1']) |
      (row['Initiate'] == row['End_2']) |
      (row['Initiate'] == row['End_3'])
      ):
      print("Name end equals initiate")
      elif ((row['End_1'] == 'na') &
      (row['End_2'] == 'na') &
      (row['End_3'] == 'na')
      ):
      print("No name ended")
      else:
      print("Different name ended")

      Name end equals initiate
      No name ended
      Different name ended
      Different name ended
      Different name ended


      However, I want to add an extra column, say 'Analysis', in the dataframe that shows the above outcome behind every workflow.



      For this I stuffed the code into a function:



      def function_name(a, b, c, d):
      for index, row in df.iterrows():
      if ((a != 'na')
      and (a == b) |
      (a == c) |
      (a == d)
      ):
      return "Name end equals initiate"
      elif ((b == 'na') &
      (c == 'na') &
      (d == 'na')
      ):
      return "No name ended"
      else:
      return "Different name ended"

      df['Analysis'] = function_name(row['Initiate'],
      row['End_1'],
      row['End_2'],
      row['End_3'])

      print(df)

      Workflow Initiate ... End_3 Analysis
      0 1 Name_1 ... na Different name ended
      1 2 Name_2 ... na Different name ended
      2 3 Name_3 ... Name_5 Different name ended
      3 4 Name_4 ... na Different name ended
      4 5 na ... Name_5 Different name ended


      As you can see the output is different from the first analysis. I would like to add an extra column to my dataframe that gives me the same output as with the print statements.







      python pandas loops dataframe






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Dec 31 '18 at 13:37









      RensRens

      82




      82
























          2 Answers
          2






          active

          oldest

          votes


















          0














          You should avoid row-wise loops here. Your algorithm is vectorisable:



          df = df.replace('na', np.nan)  # replace string 'na' with NaN for efficient processing
          ends = df.filter(like='End') # filter by columns with 'End'

          match = ends.ffill(1).iloc[:, -1] == df['Initiate'] # find last Name in each End
          nulls = ends.isnull().all(1) # check which rows are all null

          # apply vectorised conditional logic
          df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],
          'Different name ended')

          print(df)

          Workflow Initiate End_1 End_2 End_3 Result
          0 1 Name_1 NaN Name_1 NaN Name end equals initiate
          1 2 Name_2 NaN NaN NaN No name ended
          2 3 Name_3 NaN NaN Name_5 Different name ended
          3 4 Name_4 Name_5 NaN NaN Different name ended
          4 5 NaN NaN NaN Name_5 Different name ended





          share|improve this answer
























          • Thanks for this answer, I managed to apply it to my case and seems to work.

            – Rens
            Feb 2 at 16:52











          • If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?

            – Rens
            Feb 2 at 16:59













          • @Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.

            – jpp
            Feb 2 at 17:01



















          0














          Try to use np.select()



          conditions = [
          (df['Initiate'] != 'na') & ((df['Initiate'] == df['End_1']) | (df['Initiate'] == df['End_2']) | (df['Initiate'] == df['End_3'])),
          (df['End_1'] == 'na') & (df['End_2'] == 'na') & (df['End_3'] == 'na')
          ]

          answers = ['Name end equals initiate','No name ended']
          df['Analysis'] = np.select(conditions, answers, default='Different name ended')





          share|improve this answer

























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53988109%2fhow-to-store-the-results-of-an-iteration-over-rows-of-a-pandas-dataframe-in-a-ne%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            0














            You should avoid row-wise loops here. Your algorithm is vectorisable:



            df = df.replace('na', np.nan)  # replace string 'na' with NaN for efficient processing
            ends = df.filter(like='End') # filter by columns with 'End'

            match = ends.ffill(1).iloc[:, -1] == df['Initiate'] # find last Name in each End
            nulls = ends.isnull().all(1) # check which rows are all null

            # apply vectorised conditional logic
            df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],
            'Different name ended')

            print(df)

            Workflow Initiate End_1 End_2 End_3 Result
            0 1 Name_1 NaN Name_1 NaN Name end equals initiate
            1 2 Name_2 NaN NaN NaN No name ended
            2 3 Name_3 NaN NaN Name_5 Different name ended
            3 4 Name_4 Name_5 NaN NaN Different name ended
            4 5 NaN NaN NaN Name_5 Different name ended





            share|improve this answer
























            • Thanks for this answer, I managed to apply it to my case and seems to work.

              – Rens
              Feb 2 at 16:52











            • If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?

              – Rens
              Feb 2 at 16:59













            • @Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.

              – jpp
              Feb 2 at 17:01
















            0














            You should avoid row-wise loops here. Your algorithm is vectorisable:



            df = df.replace('na', np.nan)  # replace string 'na' with NaN for efficient processing
            ends = df.filter(like='End') # filter by columns with 'End'

            match = ends.ffill(1).iloc[:, -1] == df['Initiate'] # find last Name in each End
            nulls = ends.isnull().all(1) # check which rows are all null

            # apply vectorised conditional logic
            df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],
            'Different name ended')

            print(df)

            Workflow Initiate End_1 End_2 End_3 Result
            0 1 Name_1 NaN Name_1 NaN Name end equals initiate
            1 2 Name_2 NaN NaN NaN No name ended
            2 3 Name_3 NaN NaN Name_5 Different name ended
            3 4 Name_4 Name_5 NaN NaN Different name ended
            4 5 NaN NaN NaN Name_5 Different name ended





            share|improve this answer
























            • Thanks for this answer, I managed to apply it to my case and seems to work.

              – Rens
              Feb 2 at 16:52











            • If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?

              – Rens
              Feb 2 at 16:59













            • @Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.

              – jpp
              Feb 2 at 17:01














            0












            0








            0







            You should avoid row-wise loops here. Your algorithm is vectorisable:



            df = df.replace('na', np.nan)  # replace string 'na' with NaN for efficient processing
            ends = df.filter(like='End') # filter by columns with 'End'

            match = ends.ffill(1).iloc[:, -1] == df['Initiate'] # find last Name in each End
            nulls = ends.isnull().all(1) # check which rows are all null

            # apply vectorised conditional logic
            df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],
            'Different name ended')

            print(df)

            Workflow Initiate End_1 End_2 End_3 Result
            0 1 Name_1 NaN Name_1 NaN Name end equals initiate
            1 2 Name_2 NaN NaN NaN No name ended
            2 3 Name_3 NaN NaN Name_5 Different name ended
            3 4 Name_4 Name_5 NaN NaN Different name ended
            4 5 NaN NaN NaN Name_5 Different name ended





            share|improve this answer













            You should avoid row-wise loops here. Your algorithm is vectorisable:



            df = df.replace('na', np.nan)  # replace string 'na' with NaN for efficient processing
            ends = df.filter(like='End') # filter by columns with 'End'

            match = ends.ffill(1).iloc[:, -1] == df['Initiate'] # find last Name in each End
            nulls = ends.isnull().all(1) # check which rows are all null

            # apply vectorised conditional logic
            df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],
            'Different name ended')

            print(df)

            Workflow Initiate End_1 End_2 End_3 Result
            0 1 Name_1 NaN Name_1 NaN Name end equals initiate
            1 2 Name_2 NaN NaN NaN No name ended
            2 3 Name_3 NaN NaN Name_5 Different name ended
            3 4 Name_4 Name_5 NaN NaN Different name ended
            4 5 NaN NaN NaN Name_5 Different name ended






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Dec 31 '18 at 13:44









            jppjpp

            100k2161111




            100k2161111













            • Thanks for this answer, I managed to apply it to my case and seems to work.

              – Rens
              Feb 2 at 16:52











            • If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?

              – Rens
              Feb 2 at 16:59













            • @Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.

              – jpp
              Feb 2 at 17:01



















            • Thanks for this answer, I managed to apply it to my case and seems to work.

              – Rens
              Feb 2 at 16:52











            • If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?

              – Rens
              Feb 2 at 16:59













            • @Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.

              – jpp
              Feb 2 at 17:01

















            Thanks for this answer, I managed to apply it to my case and seems to work.

            – Rens
            Feb 2 at 16:52





            Thanks for this answer, I managed to apply it to my case and seems to work.

            – Rens
            Feb 2 at 16:52













            If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?

            – Rens
            Feb 2 at 16:59







            If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?

            – Rens
            Feb 2 at 16:59















            @Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.

            – jpp
            Feb 2 at 17:01





            @Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.

            – jpp
            Feb 2 at 17:01













            0














            Try to use np.select()



            conditions = [
            (df['Initiate'] != 'na') & ((df['Initiate'] == df['End_1']) | (df['Initiate'] == df['End_2']) | (df['Initiate'] == df['End_3'])),
            (df['End_1'] == 'na') & (df['End_2'] == 'na') & (df['End_3'] == 'na')
            ]

            answers = ['Name end equals initiate','No name ended']
            df['Analysis'] = np.select(conditions, answers, default='Different name ended')





            share|improve this answer






























              0














              Try to use np.select()



              conditions = [
              (df['Initiate'] != 'na') & ((df['Initiate'] == df['End_1']) | (df['Initiate'] == df['End_2']) | (df['Initiate'] == df['End_3'])),
              (df['End_1'] == 'na') & (df['End_2'] == 'na') & (df['End_3'] == 'na')
              ]

              answers = ['Name end equals initiate','No name ended']
              df['Analysis'] = np.select(conditions, answers, default='Different name ended')





              share|improve this answer




























                0












                0








                0







                Try to use np.select()



                conditions = [
                (df['Initiate'] != 'na') & ((df['Initiate'] == df['End_1']) | (df['Initiate'] == df['End_2']) | (df['Initiate'] == df['End_3'])),
                (df['End_1'] == 'na') & (df['End_2'] == 'na') & (df['End_3'] == 'na')
                ]

                answers = ['Name end equals initiate','No name ended']
                df['Analysis'] = np.select(conditions, answers, default='Different name ended')





                share|improve this answer















                Try to use np.select()



                conditions = [
                (df['Initiate'] != 'na') & ((df['Initiate'] == df['End_1']) | (df['Initiate'] == df['End_2']) | (df['Initiate'] == df['End_3'])),
                (df['End_1'] == 'na') & (df['End_2'] == 'na') & (df['End_3'] == 'na')
                ]

                answers = ['Name end equals initiate','No name ended']
                df['Analysis'] = np.select(conditions, answers, default='Different name ended')






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Dec 31 '18 at 14:02









                pygo

                3,0721619




                3,0721619










                answered Dec 31 '18 at 14:01









                corscors

                33228




                33228






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53988109%2fhow-to-store-the-results-of-an-iteration-over-rows-of-a-pandas-dataframe-in-a-ne%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Monofisismo

                    Angular Downloading a file using contenturl with Basic Authentication

                    Olmecas