How to store the results of an iteration over rows of a Pandas DataFrame in a new column?
I am new to coding in Python. Currently, I am trying to analyse a dataframe containing multiple workflows. Each workflow has a different process steps for initiating and ending a workflow. In a simplified version, my data looks like the following:
Workflow Initiate End_1 End_2 End_3
0 1 Name_1 na Name_1 na
1 2 Name_2 na na na
2 3 Name_3 na na Name_5
3 4 Name_4 Name_5 na na
4 5 na na na Name_5
For every workflow, I want to compare whether the name that ended the workflow is different as the name that initiated the workflow.
Iterating through the rows in the following way gives me the desired output:
for index, row in df.iterrows():
if ((row['Initiate'] != 'na')
and (row['Initiate'] == row['End_1']) |
(row['Initiate'] == row['End_2']) |
(row['Initiate'] == row['End_3'])
):
print("Name end equals initiate")
elif ((row['End_1'] == 'na') &
(row['End_2'] == 'na') &
(row['End_3'] == 'na')
):
print("No name ended")
else:
print("Different name ended")
Name end equals initiate
No name ended
Different name ended
Different name ended
Different name ended
However, I want to add an extra column, say 'Analysis', in the dataframe that shows the above outcome behind every workflow.
For this I stuffed the code into a function:
def function_name(a, b, c, d):
for index, row in df.iterrows():
if ((a != 'na')
and (a == b) |
(a == c) |
(a == d)
):
return "Name end equals initiate"
elif ((b == 'na') &
(c == 'na') &
(d == 'na')
):
return "No name ended"
else:
return "Different name ended"
df['Analysis'] = function_name(row['Initiate'],
row['End_1'],
row['End_2'],
row['End_3'])
print(df)
Workflow Initiate ... End_3 Analysis
0 1 Name_1 ... na Different name ended
1 2 Name_2 ... na Different name ended
2 3 Name_3 ... Name_5 Different name ended
3 4 Name_4 ... na Different name ended
4 5 na ... Name_5 Different name ended
As you can see the output is different from the first analysis. I would like to add an extra column to my dataframe that gives me the same output as with the print statements.
python pandas loops dataframe
add a comment |
I am new to coding in Python. Currently, I am trying to analyse a dataframe containing multiple workflows. Each workflow has a different process steps for initiating and ending a workflow. In a simplified version, my data looks like the following:
Workflow Initiate End_1 End_2 End_3
0 1 Name_1 na Name_1 na
1 2 Name_2 na na na
2 3 Name_3 na na Name_5
3 4 Name_4 Name_5 na na
4 5 na na na Name_5
For every workflow, I want to compare whether the name that ended the workflow is different as the name that initiated the workflow.
Iterating through the rows in the following way gives me the desired output:
for index, row in df.iterrows():
if ((row['Initiate'] != 'na')
and (row['Initiate'] == row['End_1']) |
(row['Initiate'] == row['End_2']) |
(row['Initiate'] == row['End_3'])
):
print("Name end equals initiate")
elif ((row['End_1'] == 'na') &
(row['End_2'] == 'na') &
(row['End_3'] == 'na')
):
print("No name ended")
else:
print("Different name ended")
Name end equals initiate
No name ended
Different name ended
Different name ended
Different name ended
However, I want to add an extra column, say 'Analysis', in the dataframe that shows the above outcome behind every workflow.
For this I stuffed the code into a function:
def function_name(a, b, c, d):
for index, row in df.iterrows():
if ((a != 'na')
and (a == b) |
(a == c) |
(a == d)
):
return "Name end equals initiate"
elif ((b == 'na') &
(c == 'na') &
(d == 'na')
):
return "No name ended"
else:
return "Different name ended"
df['Analysis'] = function_name(row['Initiate'],
row['End_1'],
row['End_2'],
row['End_3'])
print(df)
Workflow Initiate ... End_3 Analysis
0 1 Name_1 ... na Different name ended
1 2 Name_2 ... na Different name ended
2 3 Name_3 ... Name_5 Different name ended
3 4 Name_4 ... na Different name ended
4 5 na ... Name_5 Different name ended
As you can see the output is different from the first analysis. I would like to add an extra column to my dataframe that gives me the same output as with the print statements.
python pandas loops dataframe
add a comment |
I am new to coding in Python. Currently, I am trying to analyse a dataframe containing multiple workflows. Each workflow has a different process steps for initiating and ending a workflow. In a simplified version, my data looks like the following:
Workflow Initiate End_1 End_2 End_3
0 1 Name_1 na Name_1 na
1 2 Name_2 na na na
2 3 Name_3 na na Name_5
3 4 Name_4 Name_5 na na
4 5 na na na Name_5
For every workflow, I want to compare whether the name that ended the workflow is different as the name that initiated the workflow.
Iterating through the rows in the following way gives me the desired output:
for index, row in df.iterrows():
if ((row['Initiate'] != 'na')
and (row['Initiate'] == row['End_1']) |
(row['Initiate'] == row['End_2']) |
(row['Initiate'] == row['End_3'])
):
print("Name end equals initiate")
elif ((row['End_1'] == 'na') &
(row['End_2'] == 'na') &
(row['End_3'] == 'na')
):
print("No name ended")
else:
print("Different name ended")
Name end equals initiate
No name ended
Different name ended
Different name ended
Different name ended
However, I want to add an extra column, say 'Analysis', in the dataframe that shows the above outcome behind every workflow.
For this I stuffed the code into a function:
def function_name(a, b, c, d):
for index, row in df.iterrows():
if ((a != 'na')
and (a == b) |
(a == c) |
(a == d)
):
return "Name end equals initiate"
elif ((b == 'na') &
(c == 'na') &
(d == 'na')
):
return "No name ended"
else:
return "Different name ended"
df['Analysis'] = function_name(row['Initiate'],
row['End_1'],
row['End_2'],
row['End_3'])
print(df)
Workflow Initiate ... End_3 Analysis
0 1 Name_1 ... na Different name ended
1 2 Name_2 ... na Different name ended
2 3 Name_3 ... Name_5 Different name ended
3 4 Name_4 ... na Different name ended
4 5 na ... Name_5 Different name ended
As you can see the output is different from the first analysis. I would like to add an extra column to my dataframe that gives me the same output as with the print statements.
python pandas loops dataframe
I am new to coding in Python. Currently, I am trying to analyse a dataframe containing multiple workflows. Each workflow has a different process steps for initiating and ending a workflow. In a simplified version, my data looks like the following:
Workflow Initiate End_1 End_2 End_3
0 1 Name_1 na Name_1 na
1 2 Name_2 na na na
2 3 Name_3 na na Name_5
3 4 Name_4 Name_5 na na
4 5 na na na Name_5
For every workflow, I want to compare whether the name that ended the workflow is different as the name that initiated the workflow.
Iterating through the rows in the following way gives me the desired output:
for index, row in df.iterrows():
if ((row['Initiate'] != 'na')
and (row['Initiate'] == row['End_1']) |
(row['Initiate'] == row['End_2']) |
(row['Initiate'] == row['End_3'])
):
print("Name end equals initiate")
elif ((row['End_1'] == 'na') &
(row['End_2'] == 'na') &
(row['End_3'] == 'na')
):
print("No name ended")
else:
print("Different name ended")
Name end equals initiate
No name ended
Different name ended
Different name ended
Different name ended
However, I want to add an extra column, say 'Analysis', in the dataframe that shows the above outcome behind every workflow.
For this I stuffed the code into a function:
def function_name(a, b, c, d):
for index, row in df.iterrows():
if ((a != 'na')
and (a == b) |
(a == c) |
(a == d)
):
return "Name end equals initiate"
elif ((b == 'na') &
(c == 'na') &
(d == 'na')
):
return "No name ended"
else:
return "Different name ended"
df['Analysis'] = function_name(row['Initiate'],
row['End_1'],
row['End_2'],
row['End_3'])
print(df)
Workflow Initiate ... End_3 Analysis
0 1 Name_1 ... na Different name ended
1 2 Name_2 ... na Different name ended
2 3 Name_3 ... Name_5 Different name ended
3 4 Name_4 ... na Different name ended
4 5 na ... Name_5 Different name ended
As you can see the output is different from the first analysis. I would like to add an extra column to my dataframe that gives me the same output as with the print statements.
python pandas loops dataframe
python pandas loops dataframe
asked Dec 31 '18 at 13:37
RensRens
82
82
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
You should avoid row-wise loops here. Your algorithm is vectorisable:
df = df.replace('na', np.nan) # replace string 'na' with NaN for efficient processing
ends = df.filter(like='End') # filter by columns with 'End'
match = ends.ffill(1).iloc[:, -1] == df['Initiate'] # find last Name in each End
nulls = ends.isnull().all(1) # check which rows are all null
# apply vectorised conditional logic
df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],
'Different name ended')
print(df)
Workflow Initiate End_1 End_2 End_3 Result
0 1 Name_1 NaN Name_1 NaN Name end equals initiate
1 2 Name_2 NaN NaN NaN No name ended
2 3 Name_3 NaN NaN Name_5 Different name ended
3 4 Name_4 Name_5 NaN NaN Different name ended
4 5 NaN NaN NaN Name_5 Different name ended
Thanks for this answer, I managed to apply it to my case and seems to work.
– Rens
Feb 2 at 16:52
If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?
– Rens
Feb 2 at 16:59
@Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.
– jpp
Feb 2 at 17:01
add a comment |
Try to use np.select()
conditions = [
(df['Initiate'] != 'na') & ((df['Initiate'] == df['End_1']) | (df['Initiate'] == df['End_2']) | (df['Initiate'] == df['End_3'])),
(df['End_1'] == 'na') & (df['End_2'] == 'na') & (df['End_3'] == 'na')
]
answers = ['Name end equals initiate','No name ended']
df['Analysis'] = np.select(conditions, answers, default='Different name ended')
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53988109%2fhow-to-store-the-results-of-an-iteration-over-rows-of-a-pandas-dataframe-in-a-ne%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
You should avoid row-wise loops here. Your algorithm is vectorisable:
df = df.replace('na', np.nan) # replace string 'na' with NaN for efficient processing
ends = df.filter(like='End') # filter by columns with 'End'
match = ends.ffill(1).iloc[:, -1] == df['Initiate'] # find last Name in each End
nulls = ends.isnull().all(1) # check which rows are all null
# apply vectorised conditional logic
df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],
'Different name ended')
print(df)
Workflow Initiate End_1 End_2 End_3 Result
0 1 Name_1 NaN Name_1 NaN Name end equals initiate
1 2 Name_2 NaN NaN NaN No name ended
2 3 Name_3 NaN NaN Name_5 Different name ended
3 4 Name_4 Name_5 NaN NaN Different name ended
4 5 NaN NaN NaN Name_5 Different name ended
Thanks for this answer, I managed to apply it to my case and seems to work.
– Rens
Feb 2 at 16:52
If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?
– Rens
Feb 2 at 16:59
@Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.
– jpp
Feb 2 at 17:01
add a comment |
You should avoid row-wise loops here. Your algorithm is vectorisable:
df = df.replace('na', np.nan) # replace string 'na' with NaN for efficient processing
ends = df.filter(like='End') # filter by columns with 'End'
match = ends.ffill(1).iloc[:, -1] == df['Initiate'] # find last Name in each End
nulls = ends.isnull().all(1) # check which rows are all null
# apply vectorised conditional logic
df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],
'Different name ended')
print(df)
Workflow Initiate End_1 End_2 End_3 Result
0 1 Name_1 NaN Name_1 NaN Name end equals initiate
1 2 Name_2 NaN NaN NaN No name ended
2 3 Name_3 NaN NaN Name_5 Different name ended
3 4 Name_4 Name_5 NaN NaN Different name ended
4 5 NaN NaN NaN Name_5 Different name ended
Thanks for this answer, I managed to apply it to my case and seems to work.
– Rens
Feb 2 at 16:52
If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?
– Rens
Feb 2 at 16:59
@Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.
– jpp
Feb 2 at 17:01
add a comment |
You should avoid row-wise loops here. Your algorithm is vectorisable:
df = df.replace('na', np.nan) # replace string 'na' with NaN for efficient processing
ends = df.filter(like='End') # filter by columns with 'End'
match = ends.ffill(1).iloc[:, -1] == df['Initiate'] # find last Name in each End
nulls = ends.isnull().all(1) # check which rows are all null
# apply vectorised conditional logic
df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],
'Different name ended')
print(df)
Workflow Initiate End_1 End_2 End_3 Result
0 1 Name_1 NaN Name_1 NaN Name end equals initiate
1 2 Name_2 NaN NaN NaN No name ended
2 3 Name_3 NaN NaN Name_5 Different name ended
3 4 Name_4 Name_5 NaN NaN Different name ended
4 5 NaN NaN NaN Name_5 Different name ended
You should avoid row-wise loops here. Your algorithm is vectorisable:
df = df.replace('na', np.nan) # replace string 'na' with NaN for efficient processing
ends = df.filter(like='End') # filter by columns with 'End'
match = ends.ffill(1).iloc[:, -1] == df['Initiate'] # find last Name in each End
nulls = ends.isnull().all(1) # check which rows are all null
# apply vectorised conditional logic
df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],
'Different name ended')
print(df)
Workflow Initiate End_1 End_2 End_3 Result
0 1 Name_1 NaN Name_1 NaN Name end equals initiate
1 2 Name_2 NaN NaN NaN No name ended
2 3 Name_3 NaN NaN Name_5 Different name ended
3 4 Name_4 Name_5 NaN NaN Different name ended
4 5 NaN NaN NaN Name_5 Different name ended
answered Dec 31 '18 at 13:44
jppjpp
100k2161111
100k2161111
Thanks for this answer, I managed to apply it to my case and seems to work.
– Rens
Feb 2 at 16:52
If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?
– Rens
Feb 2 at 16:59
@Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.
– jpp
Feb 2 at 17:01
add a comment |
Thanks for this answer, I managed to apply it to my case and seems to work.
– Rens
Feb 2 at 16:52
If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?
– Rens
Feb 2 at 16:59
@Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.
– jpp
Feb 2 at 17:01
Thanks for this answer, I managed to apply it to my case and seems to work.
– Rens
Feb 2 at 16:52
Thanks for this answer, I managed to apply it to my case and seems to work.
– Rens
Feb 2 at 16:52
If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?
– Rens
Feb 2 at 16:59
If I understand it correctly, you forward fill the rows with every name that is found in the ends. It could be the case that there are multiple different names in the ends. In that case, iloc only looks for matches between the initiate name and the last name found and not (one of) the names before. Is there a way to adjust this so that it looks for a match between the initiate name and all the different names in the ends?
– Rens
Feb 2 at 16:59
@Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.
– jpp
Feb 2 at 17:01
@Rens, There is a way, but it's not a trivial change to the logic. If you like, do ask a new question, possibly referencing this one, and either I or another user can have a look.
– jpp
Feb 2 at 17:01
add a comment |
Try to use np.select()
conditions = [
(df['Initiate'] != 'na') & ((df['Initiate'] == df['End_1']) | (df['Initiate'] == df['End_2']) | (df['Initiate'] == df['End_3'])),
(df['End_1'] == 'na') & (df['End_2'] == 'na') & (df['End_3'] == 'na')
]
answers = ['Name end equals initiate','No name ended']
df['Analysis'] = np.select(conditions, answers, default='Different name ended')
add a comment |
Try to use np.select()
conditions = [
(df['Initiate'] != 'na') & ((df['Initiate'] == df['End_1']) | (df['Initiate'] == df['End_2']) | (df['Initiate'] == df['End_3'])),
(df['End_1'] == 'na') & (df['End_2'] == 'na') & (df['End_3'] == 'na')
]
answers = ['Name end equals initiate','No name ended']
df['Analysis'] = np.select(conditions, answers, default='Different name ended')
add a comment |
Try to use np.select()
conditions = [
(df['Initiate'] != 'na') & ((df['Initiate'] == df['End_1']) | (df['Initiate'] == df['End_2']) | (df['Initiate'] == df['End_3'])),
(df['End_1'] == 'na') & (df['End_2'] == 'na') & (df['End_3'] == 'na')
]
answers = ['Name end equals initiate','No name ended']
df['Analysis'] = np.select(conditions, answers, default='Different name ended')
Try to use np.select()
conditions = [
(df['Initiate'] != 'na') & ((df['Initiate'] == df['End_1']) | (df['Initiate'] == df['End_2']) | (df['Initiate'] == df['End_3'])),
(df['End_1'] == 'na') & (df['End_2'] == 'na') & (df['End_3'] == 'na')
]
answers = ['Name end equals initiate','No name ended']
df['Analysis'] = np.select(conditions, answers, default='Different name ended')
edited Dec 31 '18 at 14:02
pygo
3,0721619
3,0721619
answered Dec 31 '18 at 14:01
corscors
33228
33228
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53988109%2fhow-to-store-the-results-of-an-iteration-over-rows-of-a-pandas-dataframe-in-a-ne%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown