Calculate difference based on two columns in R


I have a little bit of a tricky question. Here is my data:

df <- structure(list(
  seconds = c(689, 689.25, 689.5, 689.75, 690, 690.25, 690.5, 690.75, 691, 691.25, 691.5, 691.75, 692, 692.25, 692.5),
  threat = c(NA, NA, NA, NA, NA, NA, 1L, 1L, 0L, 0L, 1L, NA, NA, 1L, 1L),
  bins = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L)),
  .Names = c("seconds", "threat", "bins"), class = "data.frame", row.names = c(NA, -15L))



   seconds threat bins
1   689.00     NA    1
2   689.25     NA    1
3   689.50     NA    1
4   689.75     NA    1
5   690.00     NA    1
6   690.25     NA    2
7   690.50      1    2
8   690.75      1    2
9   691.00      0    2
10  691.25      0    2
11  691.50      1    3
12  691.75     NA    3
13  692.00     NA    3
14  692.25      1    3
15  692.50      1    3

Within each bin, I am trying to calculate the amount of time spent in each "threat" state (NA, 1, or 0) in the threat column. So I need the time difference every time the value of threat changes, totalled per state within each bin. Here is an example of what I am hoping to achieve:

  bin threat seconds
   1     NA    1.25
   1      1    0.00
   1      0    0.00
   2     NA    0.25
   2      1    0.50
   2      0    0.50
   3     NA    0.50
   3      1    0.75
   3      0    0.00

asked Jan 4 at 0:12

Mary Smirnova

633


r difference


1 Answer

Here's a tidyverse solution:

library(dplyr)
library(tidyr)

df %>% arrange(seconds) %>%                              # sort by time to be safe
  mutate(duration = lead(seconds) - seconds) %>%         # time until the next row
  complete(bins, threat, fill = list(duration = 0)) %>%  # add missing (bins, threat) pairs
  group_by(bins, threat) %>%
  summarize(seconds = sum(duration, na.rm = TRUE))       # total time per pair

# A tibble: 9 x 3
# Groups:   bins [?]
#    bins threat seconds
#   <int>  <int>   <dbl>
# 1     1      0    0   
# 2     1      1    0   
# 3     1     NA    1.25
# 4     2      0    0.5 
# 5     2      1    0.5 
# 6     2     NA    0.25
# 7     3      0    0   
# 8     3      1    0.5 
# 9     3     NA    0.5

You may drop complete(bins, threat, fill = list(duration = 0)) if you do not need rows where seconds is 0.
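To see what the complete() step contributes on its own, here is a minimal sketch on a small hand-made table. The data frame observed and its values are hypothetical and chosen only for illustration; the sketch assumes that complete() treats NA as one of the values of threat, which is consistent with the 9-row output above.

```r
library(tidyr)

# Hypothetical durations for only three of the six possible (bins, threat) pairs.
observed <- data.frame(
  bins     = c(1L, 2L, 2L),
  threat   = c(NA, 1L, 0L),
  duration = c(1.25, 0.5, 0.5)
)

# Add a row for every (bins, threat) pair that is missing and set its duration to 0.
completed <- complete(observed, bins, threat, fill = list(duration = 0))

print(completed)
```

Without this step, pairs that never occur in the data would simply be absent from the summary rather than listed with 0 seconds.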

So, first we arrange the data by seconds to be safe. Then, since a row's threat value is taken to last until the next row's timestamp, we define a new variable duration as the gap to the following row. Next we add new rows with duration == 0 for those (bins, threat) combinations that are not yet present. Lastly we group by bins and threat and sum up the durations.
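The intermediate duration column can be inspected directly. This is a minimal sketch that rebuilds the question's data with seq() and rep() for brevity (an equivalent construction, assumed here); the name with_duration is introduced only for illustration.

```r
library(dplyr)

# Same values as the data in the question, built more compactly.
df <- data.frame(
  seconds = seq(689, 692.5, by = 0.25),
  threat  = c(NA, NA, NA, NA, NA, NA, 1L, 1L, 0L, 0L, 1L, NA, NA, 1L, 1L),
  bins    = rep(1:3, each = 5)
)

# duration = time from this row to the next one; the last row has no successor.
with_duration <- df %>%
  arrange(seconds) %>%
  mutate(duration = lead(seconds) - seconds)

print(tail(with_duration, 3))
```

The last row gets duration NA because lead() has nothing to look ahead to, and sum(..., na.rm = TRUE) then drops it. That is why bins 3 / threat 1 totals 0.5 in this approach.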

answered Jan 4 at 0:27

Julius Vainora

38.5k76886

This has a different result than expected for bins=3 / threat=1 - 0.5 vs. 0.75. I think there's an issue when the group is split in the original sequence.

– thelatemail
Jan 4 at 3:53

Changing the second line to something like mutate(duration = c(0.25, diff(seconds))) seems to sort it out.

– thelatemail
Jan 4 at 4:01
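The change suggested in this comment can be sketched as follows. It assumes that the first row also stands for a 0.25-second interval, which is what the hard-coded 0.25 encodes; the name by_diff is introduced only for illustration.

```r
library(dplyr)

# Same values as the data in the question, built more compactly.
df <- data.frame(
  seconds = seq(689, 692.5, by = 0.25),
  threat  = c(NA, NA, NA, NA, NA, NA, 1L, 1L, 0L, 0L, 1L, NA, NA, 1L, 1L),
  bins    = rep(1:3, each = 5)
)

# duration = time since the previous row; the first row is assumed to be 0.25.
by_diff <- df %>%
  arrange(seconds) %>%
  mutate(duration = c(0.25, diff(seconds))) %>%
  group_by(bins, threat) %>%
  summarize(seconds = sum(duration)) %>%
  ungroup()

print(by_diff)
```

With this variant every row contributes a duration, so bins 3 / threat 1 sums to 0.75, at the cost of assuming a length for the first interval.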

@thelatemail, right, there is an inconsistency since we don't know when the last "episode" of threat 1 bin 3 ends, which is the last row. I didn't want to assume that every consecutive difference of seconds is 0.25, so left it at that. Otherwise instead of summing we could be simply counting rows.

– Julius Vainora
Jan 4 at 10:30
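The "counting rows" alternative mentioned here could look like the sketch below. It rests on the assumption, which the data may or may not satisfy in general, that every row represents a fixed step of 0.25 seconds; step, counts and durations are names introduced for illustration.

```r
# Same values as the data in the question, built more compactly.
df <- data.frame(
  seconds = seq(689, 692.5, by = 0.25),
  threat  = c(NA, NA, NA, NA, NA, NA, 1L, 1L, 0L, 0L, 1L, NA, NA, 1L, 1L),
  bins    = rep(1:3, each = 5)
)

step <- 0.25  # assumed fixed sampling interval

# Cross-tabulate rows by bin and threat, keeping NA as its own category.
counts <- table(bins = df$bins, threat = df$threat, useNA = "ifany")

# Rows times step length gives the time per (bins, threat) pair, zeros included.
durations <- as.data.frame(counts * step, responseName = "seconds")

print(durations)
```

Because table() cross-tabulates all levels, pairs that never occur appear with 0 seconds without a separate completion step.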

I am realizing that any combination where seconds is 0 is not showing up at all. Is there a way to make sure these rows are included? I kept the complete(bins, threat, fill = list(duration = 0)) step and they still were not included. Do you know what might be happening?

– Mary Smirnova
Jan 8 at 17:54

@MarySmirnova, I'm not sure I understand. Do you mean that with a different dataset you don't get rows such as 1 and 2 in my output?

– Julius Vainora
Jan 8 at 18:01
