Calculate difference based on two columns in R





I have a little bit of a tricky question. Here is my data:



df <- structure(list(
  seconds = c(689, 689.25, 689.5, 689.75, 690, 690.25, 690.5, 690.75,
              691, 691.25, 691.5, 691.75, 692, 692.25, 692.5),
  threat = c(NA, NA, NA, NA, NA, NA, 1L, 1L, 0L, 0L, 1L, NA, NA, 1L, 1L),
  bins = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L)),
  .Names = c("seconds", "threat", "bins"),
  class = "data.frame", row.names = c(NA, -15L))

   seconds threat bins
1   689.00     NA    1
2   689.25     NA    1
3   689.50     NA    1
4   689.75     NA    1
5   690.00     NA    1
6   690.25     NA    2
7   690.50      1    2
8   690.75      1    2
9   691.00      0    2
10  691.25      0    2
11  691.50      1    3
12  691.75     NA    3
13  692.00     NA    3
14  692.25      1    3
15  692.50      1    3


Within each bin, I am trying to calculate how much time is spent in each "threat" state (the values in the threat column). So I need to compute the time difference every time the value of threat changes, separately within each bin. Here is an example of what I am hoping to achieve:



bin threat seconds
  1     NA    1.25
  1      1    0.00
  1      0    0.00
  2     NA    0.25
  2      1    0.50
  2      0    0.50
  3     NA    0.50
  3      1    0.75
  3      0    0.00









r difference

asked Jan 4 at 0:12
Mary Smirnova
























          1 Answer
Here's a tidyverse solution:

library(dplyr)
library(tidyr)

df %>%
  arrange(seconds) %>%                                   # make sure rows are in time order
  mutate(duration = lead(seconds) - seconds) %>%         # time until the next observation
  complete(bins, threat, fill = list(duration = 0)) %>%  # add missing (bins, threat) pairs
  group_by(bins, threat) %>%
  summarize(seconds = sum(duration, na.rm = TRUE))       # total time per pair
# A tibble: 9 x 3
# Groups:   bins [?]
#    bins threat seconds
#   <int>  <int>   <dbl>
# 1     1      0    0
# 2     1      1    0
# 3     1     NA    1.25
# 4     2      0    0.5
# 5     2      1    0.5
# 6     2     NA    0.25
# 7     3      0    0
# 8     3      1    0.5
# 9     3     NA    0.5


You can drop complete(bins, threat, fill = list(duration = 0)) if you do not need rows for the (bins, threat) combinations whose seconds total is 0.

So, first we arrange the data by seconds to be safe. Then, since the time spent in a state depends on when the next observation occurs, we define a new variable duration as the gap between each row and the next one. Next we add new rows with duration == 0 for those (bins, threat) combinations that are not yet present. Lastly we group by bins and threat and sum up the durations.
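
To make the intermediate values visible, the same four steps can be mirrored in base R. This is only a sketch of the idea, not the answer's own code: the use of addNA() and tapply(), and the helper name threat_f, are choices made here for illustration, and the data frame is rebuilt from the values in the question.

```r
# Data from the question (samples 0.25 seconds apart, five rows per bin).
df <- data.frame(
  seconds = seq(689, 692.5, by = 0.25),
  threat  = c(NA, NA, NA, NA, NA, NA, 1L, 1L, 0L, 0L, 1L, NA, NA, 1L, 1L),
  bins    = rep(1:3, each = 5)
)

# 1. Sort by time, to be safe.
df <- df[order(df$seconds), ]

# 2. Time until the next observation (the role of lead(seconds) - seconds).
#    The last row has no successor, so its duration is NA.
df$duration <- c(diff(df$seconds), NA)

# 3. Sum durations per (bins, threat). addNA() turns NA into a regular level,
#    and listing both 0 and 1 as levels keeps combinations that never occur.
threat_f <- addNA(factor(df$threat, levels = c(0, 1)))
totals <- tapply(df$duration, list(bins = df$bins, threat = threat_f),
                 sum, na.rm = TRUE)

# 4. Combinations with no rows come back as NA; report them as 0 seconds.
totals[is.na(totals)] <- 0

print(totals)
```

The result is a bins-by-threat matrix rather than a long table; the cell for bins = 3 and threat = 1 is 0.5, the same value as in the tibble above.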







answered Jan 4 at 0:27
Julius Vainora













          • This has a different result than expected for bins=3 / threat=1 - 0.5 vs. 0.75. I think there's an issue when the group is split in the original sequence.

            – thelatemail
            Jan 4 at 3:53











          • Changing the second line to something like mutate(duration = c(0.25, diff(seconds))) seems to sort it out.

            – thelatemail
            Jan 4 at 4:01











          • @thelatemail, right, there is an inconsistency since we don't know when the last "episode" of threat 1 bin 3 ends, which is the last row. I didn't want to assume that every consecutive difference of seconds is 0.25, so left it at that. Otherwise instead of summing we could be simply counting rows.

            – Julius Vainora
            Jan 4 at 10:30













          • I am realizing that anything where the seconds are 0, it is not showing up at all. Is there a way to make sure they are included? I kept in the complete(bins, threat, fill = list(duration = 0)) and it still didn't keep them in. Do you know what might be happening?

            – Mary Smirnova
            Jan 8 at 17:54













          • @MarySmirnova, I'm not sure I understand. Do you mean that with a different dataset you don't get rows such as 1 and 2 in my output?

            – Julius Vainora
            Jan 8 at 18:01
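
The alternatives raised in the comments can be compared side by side. The sketch below uses base R only, rebuilds the data from the question, and assumes (as the comments do) that samples are 0.25 seconds apart; the variable names forward, backward, in_group and by_count are made up for this illustration.

```r
# Data from the question (samples 0.25 seconds apart, five rows per bin).
df <- data.frame(
  seconds = seq(689, 692.5, by = 0.25),
  threat  = c(NA, NA, NA, NA, NA, NA, 1L, 1L, 0L, 0L, 1L, NA, NA, 1L, 1L),
  bins    = rep(1:3, each = 5)
)

# Forward difference, as in the answer: the last row has no successor.
forward <- c(diff(df$seconds), NA)

# Backward difference, as suggested in the comments: the first row has no
# predecessor, so 0.25 is assumed for it.
backward <- c(0.25, diff(df$seconds))

# Rows with bins == 3 and threat == 1; %in% treats NA as "no match".
in_group <- df$bins == 3 & df$threat %in% 1

forward_total  <- sum(forward[in_group], na.rm = TRUE)
backward_total <- sum(backward[in_group])

# Counting rows instead of summing: valid only if every sample really
# lasts 0.25 seconds. addNA() keeps NA as its own threat level.
counts <- table(bins = df$bins,
                threat = addNA(factor(df$threat, levels = c(0, 1))))
by_count <- counts * 0.25

print(c(forward = forward_total, backward = backward_total))
print(by_count)
```

For bins = 3 and threat = 1 the forward difference gives 0.5 and the backward difference gives 0.75. The row-counting table reproduces every value in the expected output from the question, but only because the samples here are evenly spaced.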


















