Selecting groups in which one or more rows meet certain criteria



























I am cleaning up data in R using the tidyverse package. I would like to select all groups in which one or more rows meet a certain criterion.



I have data that looks like the following:



library(tidyverse)
dat <- tibble(
  group = rep(c("A", "B", "C"), 3),
  key   = c(1, 1, 0, 0, 0, 0, 1, 0, 0),
  value = rnorm(n = 9, mean = 3, sd = 1)  # random, so values differ between runs
)

# A tibble: 9 x 3
# Groups:   group [3]
  group   key value
  <chr> <dbl> <dbl>
1 A         1  3.97
2 B         1  2.05
3 C         0  3.28
4 A         0  4.22
5 B         0  2.67
6 C         0  5.02
7 A         1  2.60
8 B         0  3.99
9 C         0  4.42


In this example, I would like to select the groups in which at least one key equals 1. Only groups A and B contain rows whose key is 1, so my expected result is:



# A tibble: 6 x 3
# Groups:   group [2]
  group   key value
  <chr> <dbl> <dbl>
1 A         1  3.97
2 B         1  2.05
4 A         0  4.22
5 B         0  2.67
7 A         1  2.60
8 B         0  3.99









      r dplyr














edited Dec 29 '18 at 6:32 by Ronak Shah

asked Dec 29 '18 at 6:15 by user8460166
























3 Answers


















A relatively simple solution is as follows:





library(dplyr)

set.seed(12345)

dat <- tibble(
  group = rep(c("A", "B", "C"), 3),
  key   = c(1, 1, 0, 0, 0, 0, 1, 0, 0),
  value = rnorm(n = 9, mean = 3, sd = 1)
)

# keep every row of a group that has at least one key == 1
dat %>%
  group_by(group) %>%
  filter(sum(key == 1) > 0)

#> # A tibble: 6 x 3
#> # Groups:   group [2]
#>   group   key value
#>   <chr> <dbl> <dbl>
#> 1 A         1  3.59
#> 2 B         1  3.71
#> 3 A         0  2.55
#> 4 B         0  3.61
#> 5 A         1  3.63
#> 6 B         0  2.72


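Once the data are grouped, filter() evaluates its condition separately within each group: any function applied to a column sees only that group's values. Here sum(key == 1) > 0 yields a single TRUE or FALSE per group, which is recycled to all of that group's rows, so each group is either kept entirely or dropped entirely.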
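To make the per-group evaluation concrete, the following base R sketch imitates what the grouped filter() call does. It illustrates the idea only and is not how dplyr is implemented internally. It uses the fixed values from the data.frame posted in the last answer so the result is reproducible; the names per_group, keep and kept are illustrative.

```r
# Reproducible copy of the example data (fixed values instead of rnorm())
dat <- data.frame(
  group = rep(c("A", "B", "C"), 3),
  key   = c(1, 1, 0, 0, 0, 0, 1, 0, 0),
  value = c(3.97, 2.05, 3.28, 4.22, 2.67, 5.02, 2.60, 3.99, 4.42),
  stringsAsFactors = FALSE
)

# Step 1: evaluate the condition once per group, on that group's keys only
per_group <- sapply(split(dat$key, dat$group), function(k) sum(k == 1) > 0)
per_group  # A = TRUE, B = TRUE, C = FALSE

# Step 2: recycle each group's single TRUE/FALSE to every row of that group
keep <- per_group[dat$group]

# Step 3: keep the flagged rows (all rows of A and B, none of C)
kept <- dat[keep, ]
kept
```

In dplyr the same condition is often written as any(key == 1), which states the intent directly and gives the same result in a grouped filter().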















edited Dec 29 '18 at 6:24

answered Dec 29 '18 at 6:21 by g_t_m













• Thank you so much, @g_t_m!! Super quick and that's exactly what I was looking for. I've never thought of using sum there. Thanks a lot. :)

                – user8460166
                Dec 29 '18 at 6:24
































A base R option using ave() would be:



# ave() gives every row TRUE when its group has at least one key == 1
dat[with(dat, ave(key == 1, group, FUN = any)), ]

#   group   key value
#   <chr> <dbl> <dbl>
# 1 A        1. 0.875
# 2 B        1. 2.61
# 3 A        0. 3.30
# 4 B        0. 1.40
# 5 A        1. 4.52
# 6 B        0. 3.34
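ave() is what makes this a one-liner: it splits its first argument by the grouping variable, applies FUN to each piece, and writes each result back to that group's positions, recycling a single value across the whole group. The minimal sketch below shows the intermediate per-row vector on plain vectors (key and group are copied from the question; flag and flag_max are illustrative names).

```r
key   <- c(1, 1, 0, 0, 0, 0, 1, 0, 0)
group <- rep(c("A", "B", "C"), 3)

# Per-row flag: TRUE on every row whose group contains at least one key == 1
flag <- ave(key == 1, group, FUN = any)
flag  # TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE

# The FUN = max variant used in another answer: the group maximum of a 0/1 key,
# converted to logical with !! (0 -> FALSE, 1 -> TRUE)
flag_max <- !!ave(key, group, FUN = max)
flag_max
```

Either vector can be used directly as the row index, as in dat[flag, ].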
















answered Dec 29 '18 at 6:27 by Ronak Shah























                          Here are some options.



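1) Using data.table

library(data.table)
# as.data.table() works on a copy, so dat stays a data.frame for the base R
# options below (setDT(dat) would convert dat itself by reference)
dt <- as.data.table(dat)
# .I holds the row numbers of the current group; keep them only when the group
# contains at least one key == 1, then subset dt by the collected row numbers
dt[dt[, .I[sum(key == 1) > 0], by = group]$V1]
#    group key value
# 1:     A   1  3.97
# 2:     A   0  4.22
# 3:     A   1  2.60
# 4:     B   1  2.05
# 5:     B   0  2.67
# 6:     B   0  3.99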




2) Using base R



                          a) in a compact way with ave



                          dat[!!with(dat, ave(key, group, FUN = max)), ]


                          b) using table



                          subset(dat, group %in% names(which(!!table(dat[1:2])[,2])))


                          c) using rowsum



                          subset(dat, group %in% names(which((rowsum(key, group) > 0) [, 1])))




                          3) Using tidyverse



library(tidyverse)
dat %>%
  group_by(group) %>%
  filter(sum(key) > 0)


                          data



                          dat <- structure(list(group = c("A", "B", "C", "A", "B", "C", "A", "B", 
                          "C"), key = c(1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L), value = c(3.97,
                          2.05, 3.28, 4.22, 2.67, 5.02, 2.6, 3.99, 4.42)), class = "data.frame",
                          row.names = c("1",
                          "2", "3", "4", "5", "6", "7", "8", "9"))

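For options b) and c), it may help to look at the intermediate objects. This sketch rebuilds them with explicit dat$ columns on the data.frame above and assumes, as those options do, that key only takes the values 0 and 1. The names tab, rs, groups_tab, groups_rs and res are illustrative.

```r
dat <- data.frame(
  group = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  key   = c(1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L),
  value = c(3.97, 2.05, 3.28, 4.22, 2.67, 5.02, 2.60, 3.99, 4.42),
  stringsAsFactors = FALSE
)

# b) Contingency table: one row per group, one column per key value ("0", "1")
tab <- table(dat$group, dat$key)
tab[, "1"]  # rows with key == 1 per group: A = 2, B = 1, C = 0
groups_tab <- names(which(tab[, "1"] > 0))

# c) rowsum() adds up key within each group; with a 0/1 key,
#    a sum > 0 means the group has at least one key == 1
rs <- rowsum(dat$key, dat$group)
rs[, 1]     # A = 2, B = 1, C = 0
groups_rs <- rownames(rs)[rs[, 1] > 0]

# Either vector of group names then drives the row selection
res <- subset(dat, group %in% groups_tab)
res
```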













edited Dec 29 '18 at 9:35

answered Dec 29 '18 at 9:11 by akrun





























