Calculate difference based on two columns in R
I have a little bit of a tricky question. Here is my data:

    df <- structure(list(
      seconds = c(689, 689.25, 689.5, 689.75, 690, 690.25, 690.5, 690.75,
                  691, 691.25, 691.5, 691.75, 692, 692.25, 692.5),
      threat = c(NA, NA, NA, NA, NA, NA, 1L, 1L, 0L, 0L, 1L, NA, NA, 1L, 1L),
      bins = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L)),
      .Names = c("seconds", "threat", "bins"),
      class = "data.frame", row.names = c(NA, -15L))

       seconds threat bins
    1   689.00     NA    1
    2   689.25     NA    1
    3   689.50     NA    1
    4   689.75     NA    1
    5   690.00     NA    1
    6   690.25     NA    2
    7   690.50      1    2
    8   690.75      1    2
    9   691.00      0    2
    10  691.25      0    2
    11  691.50      1    3
    12  691.75     NA    3
    13  692.00     NA    3
    14  692.25      1    3
    15  692.50      1    3

Within each bin, I am trying to calculate the amount of time spent in each type of "threat" in the `threat` column. So I need to compute a time difference every time `threat` changes, within each bin. Here is an example of what I am hoping to achieve:

    bin threat seconds
      1     NA    1.25
      1      1    0.00
      1      0    0.00
      2     NA    0.25
      2      1    0.50
      2      0    0.50
      3     NA    0.50
      3      1    0.75
      3      0    0.00
r difference
asked Jan 4 at 0:12 by Mary Smirnova
1 Answer
Here's a tidyverse solution:

    library(dplyr)  # arrange, mutate, lead, group_by, summarize
    library(tidyr)  # complete

    df %>% arrange(seconds) %>%
      mutate(duration = lead(seconds) - seconds) %>%
      complete(bins, threat, fill = list(duration = 0)) %>%
      group_by(bins, threat) %>%
      summarize(seconds = sum(duration, na.rm = TRUE))
    # A tibble: 9 x 3
    # Groups:   bins [?]
    #    bins threat seconds
    #   <int>  <int>   <dbl>
    # 1     1      0    0
    # 2     1      1    0
    # 3     1     NA    1.25
    # 4     2      0    0.5
    # 5     2      1    0.5
    # 6     2     NA    0.25
    # 7     3      0    0
    # 8     3      1    0.5
    # 9     3     NA    0.5

You may drop `complete(bins, threat, fill = list(duration = 0))` if adding rows where `seconds` is 0 is not necessary.

So, first we `arrange` the data to be safe. Then, because the `threat` states are interleaved, we define a new variable `duration`: the time from each row to the next one. Next we add new rows with `duration == 0` for those (`bins`, `threat`) cases that are not yet present. Lastly we group by `bins` and `threat` and sum up the durations.
answered Jan 4 at 0:27 by Julius Vainora
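The four steps described in the answer (sort, compute durations, fill in missing combinations, sum) can also be sketched in base R, which makes the intermediate `duration` column easy to inspect. This is only an illustrative sketch, not the answerer's code: the names `threat_lab` and `totals` are hypothetical, `seq()` and `rep()` are used to rebuild the question's data, and `NA` is turned into an explicit `"NA"` label so it forms its own group.

```r
# Data from the question, rebuilt compactly
df <- data.frame(
  seconds = seq(689, 692.5, by = 0.25),
  threat  = c(NA, NA, NA, NA, NA, NA, 1L, 1L, 0L, 0L, 1L, NA, NA, 1L, 1L),
  bins    = rep(1:3, each = 5)
)

# Step 1: sort by time, to be safe
df <- df[order(df$seconds), ]

# Step 2: time from each row to the next one; the last row has no
# successor, so its duration is NA (same as lead(seconds) - seconds)
df$duration <- c(diff(df$seconds), NA)

# Steps 3 and 4: sum durations per (bins, threat). NA is relabelled as
# the string "NA" so that it is treated as a group of its own
# (hypothetical helper name: threat_lab).
threat_lab <- factor(
  ifelse(is.na(df$threat), "NA", as.character(df$threat)),
  levels = c("0", "1", "NA")
)
totals <- tapply(df$duration, list(bins = df$bins, threat = threat_lab),
                 sum, na.rm = TRUE)

# Combinations that never occur come back as NA from tapply; report 0
totals[is.na(totals)] <- 0
totals
```

As in the tidyverse output, `totals["3", "1"]` is 0.5 rather than the 0.75 the question expects, because the duration of the final row is unknown and is dropped by `na.rm = TRUE`.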
This has a different result than expected for `bins=3 / threat=1`: 0.5 vs. 0.75. I think there's an issue when the group is split in the original sequence. – thelatemail, Jan 4 at 3:53
Changing the second line to something like `mutate(duration = c(0.25, diff(seconds)))` seems to sort it out. – thelatemail, Jan 4 at 4:01
@thelatemail, right, there is an inconsistency since we don't know when the last "episode" of threat 1 in bin 3, which is the last row, ends. I didn't want to assume that every consecutive difference of `seconds` is 0.25, so I left it at that. Otherwise, instead of summing, we could simply count rows. – Julius Vainora, Jan 4 at 10:30
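The two alternatives raised in these comments can be compared in a short base-R sketch. Both rest on an assumption the answerer deliberately avoided: that observations are taken at a constant interval of 0.25 seconds. The names `step`, `threat_lab`, `by_diff` and `by_count` are hypothetical and used only for illustration.

```r
# Data from the question, rebuilt compactly
df <- data.frame(
  seconds = seq(689, 692.5, by = 0.25),
  threat  = c(NA, NA, NA, NA, NA, NA, 1L, 1L, 0L, 0L, 1L, NA, NA, 1L, 1L),
  bins    = rep(1:3, each = 5)
)

step <- 0.25  # ASSUMPTION: constant sampling interval

# Treat NA as a group of its own by giving it an explicit label
threat_lab <- factor(
  ifelse(is.na(df$threat), "NA", as.character(df$threat)),
  levels = c("0", "1", "NA")
)

# Variant 1 (thelatemail's suggestion): time since the previous row,
# with the first row assumed to cover one full step
df$duration <- c(step, diff(df$seconds))
by_diff <- tapply(df$duration, list(bins = df$bins, threat = threat_lab), sum)
by_diff[is.na(by_diff)] <- 0  # combinations that never occur

# Variant 2 (counting rows): number of rows per group times the step
by_count <- table(bins = df$bins, threat = threat_lab) * step

by_diff
by_count
```

On this dataset the two variants agree, and both give 0.75 for bin 3 with threat 1, matching the output the question asks for. With irregularly spaced timestamps they would differ, and neither would be a safe substitute for the `lead()`-based durations.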
I am realizing that any combination where the seconds are 0 is not showing up at all. Is there a way to make sure they are included? I kept in the `complete(bins, threat, fill = list(duration = 0))` and it still didn't keep them in. Do you know what might be happening? – Mary Smirnova, Jan 8 at 17:54
@MarySmirnova, I'm not sure I understand. Do you mean that with a different dataset you don't get rows such as 1 and 2 in my output? – Julius Vainora, Jan 8 at 18:01