For each row return the column name of the largest value












64















I have a roster of employees, and I need to know at what department they are in most often. It is trivial to tabulate employee ID against department name, but it is trickier to return the department name, rather than the number of roster counts, from the frequency table. A simple example below (column names = departments, row names = employee ids).



DF <- matrix(sample(1:9,9),ncol=3,nrow=3)
DF <- as.data.frame.matrix(DF)
> DF
V1 V2 V3
1 2 7 9
2 8 3 6
3 1 5 4


Now how do I get



> DF2
RE
1 V3
2 V1
3 V2









share|improve this question

























  • how big is your actual data?

    – Arun
    Jul 18 '13 at 23:51






  • 1





    @Arun > dim(test) [1] 26746 18

    – dmvianna
    Jul 18 '13 at 23:57






  • 4





    An interesting generalization would be the largest n values' column names per row

    – Hack-R
    Sep 7 '16 at 0:07
















64















I have a roster of employees, and I need to know at what department they are in most often. It is trivial to tabulate employee ID against department name, but it is trickier to return the department name, rather than the number of roster counts, from the frequency table. A simple example below (column names = departments, row names = employee ids).



DF <- matrix(sample(1:9,9),ncol=3,nrow=3)
DF <- as.data.frame.matrix(DF)
> DF
V1 V2 V3
1 2 7 9
2 8 3 6
3 1 5 4


Now how do I get



> DF2
RE
1 V3
2 V1
3 V2









share|improve this question

























  • how big is your actual data?

    – Arun
    Jul 18 '13 at 23:51






  • 1





    @Arun > dim(test) [1] 26746 18

    – dmvianna
    Jul 18 '13 at 23:57






  • 4





    An interesting generalization would be the largest n values' column names per row

    – Hack-R
    Sep 7 '16 at 0:07














64












64








64


21






I have a roster of employees, and I need to know at what department they are in most often. It is trivial to tabulate employee ID against department name, but it is trickier to return the department name, rather than the number of roster counts, from the frequency table. A simple example below (column names = departments, row names = employee ids).



DF <- matrix(sample(1:9,9),ncol=3,nrow=3)
DF <- as.data.frame.matrix(DF)
> DF
V1 V2 V3
1 2 7 9
2 8 3 6
3 1 5 4


Now how do I get



> DF2
RE
1 V3
2 V1
3 V2









share|improve this question
















I have a roster of employees, and I need to know at what department they are in most often. It is trivial to tabulate employee ID against department name, but it is trickier to return the department name, rather than the number of roster counts, from the frequency table. A simple example below (column names = departments, row names = employee ids).



DF <- matrix(sample(1:9,9),ncol=3,nrow=3)
DF <- as.data.frame.matrix(DF)
> DF
V1 V2 V3
1 2 7 9
2 8 3 6
3 1 5 4


Now how do I get



> DF2
RE
1 V3
2 V1
3 V2






r






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jul 14 '17 at 13:17









zx8754

29.8k76399




29.8k76399










asked Jul 18 '13 at 23:45









dmviannadmvianna

6,704125789




6,704125789













  • how big is your actual data?

    – Arun
    Jul 18 '13 at 23:51






  • 1





    @Arun > dim(test) [1] 26746 18

    – dmvianna
    Jul 18 '13 at 23:57






  • 4





    An interesting generalization would be the largest n values' column names per row

    – Hack-R
    Sep 7 '16 at 0:07



















  • how big is your actual data?

    – Arun
    Jul 18 '13 at 23:51






  • 1





    @Arun > dim(test) [1] 26746 18

    – dmvianna
    Jul 18 '13 at 23:57






  • 4





    An interesting generalization would be the largest n values' column names per row

    – Hack-R
    Sep 7 '16 at 0:07

















how big is your actual data?

– Arun
Jul 18 '13 at 23:51





how big is your actual data?

– Arun
Jul 18 '13 at 23:51




1




1





@Arun > dim(test) [1] 26746 18

– dmvianna
Jul 18 '13 at 23:57





@Arun > dim(test) [1] 26746 18

– dmvianna
Jul 18 '13 at 23:57




4




4





An interesting generalization would be the largest n values' column names per row

– Hack-R
Sep 7 '16 at 0:07





An interesting generalization would be the largest n values' column names per row

– Hack-R
Sep 7 '16 at 0:07












6 Answers
6






active

oldest

votes


















65














One option using your data (for future reference, use set.seed() to make examples using sample reproducible):



DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))

colnames(DF)[apply(DF,1,which.max)]
[1] "V3" "V1" "V2"


A faster solution than using apply might be max.col:



colnames(DF)[max.col(DF,ties.method="first")]
#[1] "V3" "V1" "V2"


...where ties.method can be any of "random" "first" or "last"



This of course causes issues if you happen to have two columns which are equal to the maximum. I'm not sure what you want to do in that instance as you will have more than one result for some rows. E.g.:



DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(7,6,4))
apply(DF,1,function(x) which(x==max(x)))

[[1]]
V2 V3
2 3

[[2]]
V1
1

[[3]]
V2
2





share|improve this answer


























  • If I have two equal columns I usually just pick the first. These are border cases which do not upset my statistical analysis.

    – dmvianna
    Jul 18 '13 at 23:59











  • @dmvianna - using which.max will be fine then.

    – thelatemail
    Jul 19 '13 at 0:03











  • I'm assuming the order is preserved, so I can create a new column with this vector that will align correctly to the employees IDs. Is that correct?

    – dmvianna
    Jul 19 '13 at 0:05











  • apply converts the data.frame to matrix internally. You may not see a performance difference on these dimensions though.

    – Arun
    Jul 19 '13 at 0:07






  • 1





    @PankajKaundal - assuming distinct values, how about this colnames(DF)[max.col(replace(DF, cbind(seq_len(nrow(DF)), max.col(DF,ties.method="first")), -Inf), "first")]

    – thelatemail
    Apr 19 '16 at 22:14



















12














If you're interested in a data.table solution, here's one. It's a bit tricky since you prefer to get the id for the first maximum. It's much easier if you'd rather want the last maximum. Nevertheless, it's not that complicated and it's fast!



Here I've generated data of your dimensions (26746 * 18).



Data



set.seed(45)
DF <- data.frame(matrix(sample(10, 26746*18, TRUE), ncol=18))



data.table answer:



require(data.table)
DT <- data.table(value=unlist(DF, use.names=FALSE),
colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
setkey(DT, colid, value)
t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]


Benchmarking:



# data.table solution
system.time({
DT <- data.table(value=unlist(DF, use.names=FALSE),
colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
setkey(DT, colid, value)
t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]
})
# user system elapsed
# 0.174 0.029 0.227

# apply solution from @thelatemail
system.time(t2 <- colnames(DF)[apply(DF,1,which.max)])
# user system elapsed
# 2.322 0.036 2.602

identical(t1, t2)
# [1] TRUE


It's about 11 times faster on data of these dimensions, and data.table scales pretty well too.





Edit: if any of the max ids is okay, then:



DT <- data.table(value=unlist(DF, use.names=FALSE), 
colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
setkey(DT, colid, value)
t1 <- DT[J(unique(colid)), rowid, mult="last"]





share|improve this answer


























  • I actually dont' care if it is the first or last maximum. I'm going for simplicity first, but I'm sure a data.table solution will come handy in the future, thanks!

    – dmvianna
    Jul 19 '13 at 0:48



















5














Based on the above suggestions, the following data.table solution worked very fast for me:



library(data.table)

set.seed(45)
DT <- data.table(matrix(sample(10, 10^7, TRUE), ncol=10))

system.time(
DT[, col_max := colnames(.SD)[max.col(.SD, ties.method = "first")]]
)
#> user system elapsed
#> 0.15 0.06 0.21
DT
#> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 col_max
#> 1: 7 4 1 2 3 7 6 6 6 1 V1
#> 2: 4 6 9 10 6 2 7 7 1 3 V4
#> 3: 3 4 9 8 9 9 8 8 6 7 V3
#> 4: 4 8 8 9 7 5 9 2 7 1 V4
#> 5: 4 3 9 10 2 7 9 6 6 9 V4
#> ---
#> 999996: 4 6 10 5 4 7 3 8 2 8 V3
#> 999997: 8 7 6 6 3 10 2 3 10 1 V6
#> 999998: 2 3 2 7 4 7 5 2 7 3 V4
#> 999999: 8 10 3 2 3 4 5 1 1 4 V2
#> 1000000: 10 4 2 6 6 2 8 4 7 4 V1


And also comes with the advantage that can always specify what columns .SD should consider by mentioning them in .SDcols:



DT[, MAX2 := colnames(.SD)[max.col(.SD, ties.method="first")], .SDcols = c("V9", "V10")]




In case we need the column name of the smallest value, as suggested by @lwshang, one just needs to use -.SD:



DT[, col_min := colnames(.SD)[max.col(-.SD, ties.method = "first")]]





share|improve this answer


























  • I had a similar requirement but want to get the column name having the minimum value for each row.....we don't seem to have min.col in R.....would you know what would be the equivalent solution?

    – user1412
    Feb 26 '17 at 18:38











  • Hi @user1412. Thanks for your interesting question. I don't have any idea right now other than using the which.min in something that would look like: DT[, MIN := colnames(.SD)[apply(.SD,1,which.min)]] or DT[, MIN2 := colnames(.SD)[which.min(.SD)], by = 1:nrow(DT)] on the dummy data above. This doesn't consider ties and returns only the first minimum. Maybe consider asking a separate question. I would be curious as well what other answers you would get.

    – Valentin
    Feb 28 '17 at 10:09






  • 1





    A trick to get minimum column is sending the negative of the data.frame into max.col, like: colnames(.SD)[max.col(-.SD, ties.method="first")].

    – lwshang
    Mar 9 '18 at 14:08





















3














One solution could be to reshape the date from wide to long putting all the departments in one column and counts in another, group by the employer id (in this case, the row number), and then filter to the department(s) with the max value. There are a couple of options for handling ties with this approach too.



library(tidyverse)

# sample data frame with a tie
df <- data_frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,5))

# If you aren't worried about ties:
df %>%
rownames_to_column('id') %>% # creates an ID number
gather(dept, cnt, V1:V3) %>%
group_by(id) %>%
slice(which.max(cnt))

# A tibble: 3 x 3
# Groups: id [3]
id dept cnt
<chr> <chr> <dbl>
1 1 V3 9.
2 2 V1 8.
3 3 V2 5.


# If you're worried about keeping ties:
df %>%
rownames_to_column('id') %>%
gather(dept, cnt, V1:V3) %>%
group_by(id) %>%
filter(cnt == max(cnt)) %>% # top_n(cnt, n = 1) also works
arrange(id)

# A tibble: 4 x 3
# Groups: id [3]
id dept cnt
<chr> <chr> <dbl>
1 1 V3 9.
2 2 V1 8.
3 3 V2 5.
4 3 V3 5.


# If you're worried about ties, but only want a certain department, you could use rank() and choose 'first' or 'last'
df %>%
rownames_to_column('id') %>%
gather(dept, cnt, V1:V3) %>%
group_by(id) %>%
mutate(dept_rank = rank(-cnt, ties.method = "first")) %>% # or 'last'
filter(dept_rank == 1) %>%
select(-dept_rank)

# A tibble: 3 x 3
# Groups: id [3]
id dept cnt
<chr> <chr> <dbl>
1 2 V1 8.
2 3 V2 5.
3 1 V3 9.

# if you wanted to keep the original wide data frame
df %>%
rownames_to_column('id') %>%
left_join(
df %>%
rownames_to_column('id') %>%
gather(max_dept, max_cnt, V1:V3) %>%
group_by(id) %>%
slice(which.max(max_cnt)),
by = 'id'
)

# A tibble: 3 x 6
id V1 V2 V3 max_dept max_cnt
<chr> <dbl> <dbl> <dbl> <chr> <dbl>
1 1 2. 7. 9. V3 9.
2 2 8. 3. 6. V1 8.
3 3 1. 5. 5. V2 5.





share|improve this answer

































    1














    A simple for loop can also be handy:



    > df<-data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
    > df
    V1 V2 V3
    1 2 7 9
    2 8 3 6
    3 1 5 4
    > df2<-data.frame()
    > for (i in 1:nrow(df)){
    + df2[i,1]<-colnames(df[which.max(df[i,])])
    + }
    > df2
    V1
    1 V3
    2 V1
    3 V2





    share|improve this answer

































      1














      A dplyr solution:



      Idea:




      • add rowids as a column

      • reshape to long format

      • filter for max in each group


      Code:



      DF = data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
      DF %>%
      rownames_to_column() %>%
      gather(column, value, -rowname) %>%
      group_by(rowname) %>%
      filter(rank(-value) == 1)


      Result:



      # A tibble: 3 x 3
      # Groups: rowname [3]
      rowname column value
      <chr> <chr> <dbl>
      1 2 V1 8
      2 3 V2 5
      3 1 V3 9


      This approach can be easily extended to get the top n columns.
      Example for n=2:



      DF %>% 
      rownames_to_column() %>%
      gather(column, value, -rowname) %>%
      group_by(rowname) %>%
      mutate(rk = rank(-value)) %>%
      filter(rk <= 2) %>%
      arrange(rowname, rk)


      Result:



      # A tibble: 6 x 4
      # Groups: rowname [3]
      rowname column value rk
      <chr> <chr> <dbl> <dbl>
      1 1 V3 9 1
      2 1 V2 7 2
      3 2 V1 8 1
      4 2 V3 6 2
      5 3 V2 5 1
      6 3 V3 4 2





      share|improve this answer
























      • Could you comment on the difference between this approach and sbha's answer above? They look about the same to me.

        – Gregor
        Nov 6 '18 at 14:46










      protected by zx8754 Jul 14 '17 at 13:16



      Thank you for your interest in this question.
      Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).



      Would you like to answer one of these unanswered questions instead?














      6 Answers
      6






      active

      oldest

      votes








      6 Answers
      6






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      65














      One option using your data (for future reference, use set.seed() to make examples using sample reproducible):



      DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))

      colnames(DF)[apply(DF,1,which.max)]
      [1] "V3" "V1" "V2"


      A faster solution than using apply might be max.col:



      colnames(DF)[max.col(DF,ties.method="first")]
      #[1] "V3" "V1" "V2"


      ...where ties.method can be any of "random" "first" or "last"



      This of course causes issues if you happen to have two columns which are equal to the maximum. I'm not sure what you want to do in that instance as you will have more than one result for some rows. E.g.:



      DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(7,6,4))
      apply(DF,1,function(x) which(x==max(x)))

      [[1]]
      V2 V3
      2 3

      [[2]]
      V1
      1

      [[3]]
      V2
      2





      share|improve this answer


























      • If I have two equal columns I usually just pick the first. These are border cases which do not upset my statistical analysis.

        – dmvianna
        Jul 18 '13 at 23:59











      • @dmvianna - using which.max will be fine then.

        – thelatemail
        Jul 19 '13 at 0:03











      • I'm assuming the order is preserved, so I can create a new column with this vector that will align correctly to the employees IDs. Is that correct?

        – dmvianna
        Jul 19 '13 at 0:05











      • apply converts the data.frame to matrix internally. You may not see a performance difference on these dimensions though.

        – Arun
        Jul 19 '13 at 0:07






      • 1





        @PankajKaundal - assuming distinct values, how about this colnames(DF)[max.col(replace(DF, cbind(seq_len(nrow(DF)), max.col(DF,ties.method="first")), -Inf), "first")]

        – thelatemail
        Apr 19 '16 at 22:14
















      65














      One option using your data (for future reference, use set.seed() to make examples using sample reproducible):



      DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))

      colnames(DF)[apply(DF,1,which.max)]
      [1] "V3" "V1" "V2"


      A faster solution than using apply might be max.col:



      colnames(DF)[max.col(DF,ties.method="first")]
      #[1] "V3" "V1" "V2"


      ...where ties.method can be any of "random" "first" or "last"



      This of course causes issues if you happen to have two columns which are equal to the maximum. I'm not sure what you want to do in that instance as you will have more than one result for some rows. E.g.:



      DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(7,6,4))
      apply(DF,1,function(x) which(x==max(x)))

      [[1]]
      V2 V3
      2 3

      [[2]]
      V1
      1

      [[3]]
      V2
      2





      share|improve this answer


























      • If I have two equal columns I usually just pick the first. These are border cases which do not upset my statistical analysis.

        – dmvianna
        Jul 18 '13 at 23:59











      • @dmvianna - using which.max will be fine then.

        – thelatemail
        Jul 19 '13 at 0:03











      • I'm assuming the order is preserved, so I can create a new column with this vector that will align correctly to the employees IDs. Is that correct?

        – dmvianna
        Jul 19 '13 at 0:05











      • apply converts the data.frame to matrix internally. You may not see a performance difference on these dimensions though.

        – Arun
        Jul 19 '13 at 0:07






      • 1





        @PankajKaundal - assuming distinct values, how about this colnames(DF)[max.col(replace(DF, cbind(seq_len(nrow(DF)), max.col(DF,ties.method="first")), -Inf), "first")]

        – thelatemail
        Apr 19 '16 at 22:14














      65












      65








      65







      One option using your data (for future reference, use set.seed() to make examples using sample reproducible):



      DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))

      colnames(DF)[apply(DF,1,which.max)]
      [1] "V3" "V1" "V2"


      A faster solution than using apply might be max.col:



      colnames(DF)[max.col(DF,ties.method="first")]
      #[1] "V3" "V1" "V2"


      ...where ties.method can be any of "random" "first" or "last"



      This of course causes issues if you happen to have two columns which are equal to the maximum. I'm not sure what you want to do in that instance as you will have more than one result for some rows. E.g.:



      DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(7,6,4))
      apply(DF,1,function(x) which(x==max(x)))

      [[1]]
      V2 V3
      2 3

      [[2]]
      V1
      1

      [[3]]
      V2
      2





      share|improve this answer















      One option using your data (for future reference, use set.seed() to make examples using sample reproducible):



      DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))

      colnames(DF)[apply(DF,1,which.max)]
      [1] "V3" "V1" "V2"


      A faster solution than using apply might be max.col:



      colnames(DF)[max.col(DF,ties.method="first")]
      #[1] "V3" "V1" "V2"


      ...where ties.method can be any of "random" "first" or "last"



      This of course causes issues if you happen to have two columns which are equal to the maximum. I'm not sure what you want to do in that instance as you will have more than one result for some rows. E.g.:



      DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(7,6,4))
      apply(DF,1,function(x) which(x==max(x)))

      [[1]]
      V2 V3
      2 3

      [[2]]
      V1
      1

      [[3]]
      V2
      2






      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited Sep 30 '14 at 22:49

























      answered Jul 18 '13 at 23:49









      thelatemailthelatemail

      67.4k882150




      67.4k882150













      • If I have two equal columns I usually just pick the first. These are border cases which do not upset my statistical analysis.

        – dmvianna
        Jul 18 '13 at 23:59











      • @dmvianna - using which.max will be fine then.

        – thelatemail
        Jul 19 '13 at 0:03











      • I'm assuming the order is preserved, so I can create a new column with this vector that will align correctly to the employees IDs. Is that correct?

        – dmvianna
        Jul 19 '13 at 0:05











      • apply converts the data.frame to matrix internally. You may not see a performance difference on these dimensions though.

        – Arun
        Jul 19 '13 at 0:07






      • 1





        @PankajKaundal - assuming distinct values, how about this colnames(DF)[max.col(replace(DF, cbind(seq_len(nrow(DF)), max.col(DF,ties.method="first")), -Inf), "first")]

        – thelatemail
        Apr 19 '16 at 22:14



















      • If I have two equal columns I usually just pick the first. These are border cases which do not upset my statistical analysis.

        – dmvianna
        Jul 18 '13 at 23:59











      • @dmvianna - using which.max will be fine then.

        – thelatemail
        Jul 19 '13 at 0:03











      • I'm assuming the order is preserved, so I can create a new column with this vector that will align correctly to the employees IDs. Is that correct?

        – dmvianna
        Jul 19 '13 at 0:05











      • apply converts the data.frame to matrix internally. You may not see a performance difference on these dimensions though.

        – Arun
        Jul 19 '13 at 0:07






      • 1





        @PankajKaundal - assuming distinct values, how about this colnames(DF)[max.col(replace(DF, cbind(seq_len(nrow(DF)), max.col(DF,ties.method="first")), -Inf), "first")]

        – thelatemail
        Apr 19 '16 at 22:14

















      If I have two equal columns I usually just pick the first. These are border cases which do not upset my statistical analysis.

      – dmvianna
      Jul 18 '13 at 23:59





      If I have two equal columns I usually just pick the first. These are border cases which do not upset my statistical analysis.

      – dmvianna
      Jul 18 '13 at 23:59













      @dmvianna - using which.max will be fine then.

      – thelatemail
      Jul 19 '13 at 0:03





      @dmvianna - using which.max will be fine then.

      – thelatemail
      Jul 19 '13 at 0:03













      I'm assuming the order is preserved, so I can create a new column with this vector that will align correctly to the employees IDs. Is that correct?

      – dmvianna
      Jul 19 '13 at 0:05





      I'm assuming the order is preserved, so I can create a new column with this vector that will align correctly to the employees IDs. Is that correct?

      – dmvianna
      Jul 19 '13 at 0:05













      apply converts the data.frame to matrix internally. You may not see a performance difference on these dimensions though.

      – Arun
      Jul 19 '13 at 0:07





      apply converts the data.frame to matrix internally. You may not see a performance difference on these dimensions though.

      – Arun
      Jul 19 '13 at 0:07




      1




      1





      @PankajKaundal - assuming distinct values, how about this colnames(DF)[max.col(replace(DF, cbind(seq_len(nrow(DF)), max.col(DF,ties.method="first")), -Inf), "first")]

      – thelatemail
      Apr 19 '16 at 22:14





      @PankajKaundal - assuming distinct values, how about this colnames(DF)[max.col(replace(DF, cbind(seq_len(nrow(DF)), max.col(DF,ties.method="first")), -Inf), "first")]

      – thelatemail
      Apr 19 '16 at 22:14













      12














      If you're interested in a data.table solution, here's one. It's a bit tricky since you prefer to get the id for the first maximum. It's much easier if you'd rather want the last maximum. Nevertheless, it's not that complicated and it's fast!



      Here I've generated data of your dimensions (26746 * 18).



      Data



      set.seed(45)
      DF <- data.frame(matrix(sample(10, 26746*18, TRUE), ncol=18))



      data.table answer:



      require(data.table)
      DT <- data.table(value=unlist(DF, use.names=FALSE),
      colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
      setkey(DT, colid, value)
      t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]


      Benchmarking:



      # data.table solution
      system.time({
      DT <- data.table(value=unlist(DF, use.names=FALSE),
      colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
      setkey(DT, colid, value)
      t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]
      })
      # user system elapsed
      # 0.174 0.029 0.227

      # apply solution from @thelatemail
      system.time(t2 <- colnames(DF)[apply(DF,1,which.max)])
      # user system elapsed
      # 2.322 0.036 2.602

      identical(t1, t2)
      # [1] TRUE


      It's about 11 times faster on data of these dimensions, and data.table scales pretty well too.





      Edit: if any of the max ids is okay, then:



      DT <- data.table(value=unlist(DF, use.names=FALSE), 
      colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
      setkey(DT, colid, value)
      t1 <- DT[J(unique(colid)), rowid, mult="last"]





      share|improve this answer


























      • I actually dont' care if it is the first or last maximum. I'm going for simplicity first, but I'm sure a data.table solution will come handy in the future, thanks!

        – dmvianna
        Jul 19 '13 at 0:48
















      12














      If you're interested in a data.table solution, here's one. It's a bit tricky since you prefer to get the id for the first maximum. It's much easier if you'd rather want the last maximum. Nevertheless, it's not that complicated and it's fast!



      Here I've generated data of your dimensions (26746 * 18).



      Data



      set.seed(45)
      DF <- data.frame(matrix(sample(10, 26746*18, TRUE), ncol=18))



      data.table answer:



      require(data.table)
      DT <- data.table(value=unlist(DF, use.names=FALSE),
      colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
      setkey(DT, colid, value)
      t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]


      Benchmarking:



      # data.table solution
      system.time({
      DT <- data.table(value=unlist(DF, use.names=FALSE),
      colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
      setkey(DT, colid, value)
      t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]
      })
      # user system elapsed
      # 0.174 0.029 0.227

      # apply solution from @thelatemail
      system.time(t2 <- colnames(DF)[apply(DF,1,which.max)])
      # user system elapsed
      # 2.322 0.036 2.602

      identical(t1, t2)
      # [1] TRUE


      It's about 11 times faster on data of these dimensions, and data.table scales pretty well too.





      Edit: if any of the max ids is okay, then:



      DT <- data.table(value=unlist(DF, use.names=FALSE), 
      colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
      setkey(DT, colid, value)
      t1 <- DT[J(unique(colid)), rowid, mult="last"]





      share|improve this answer


























      • I actually dont' care if it is the first or last maximum. I'm going for simplicity first, but I'm sure a data.table solution will come handy in the future, thanks!

        – dmvianna
        Jul 19 '13 at 0:48














      12












      12








      12







      If you're interested in a data.table solution, here's one. It's a bit tricky since you prefer to get the id for the first maximum. It's much easier if you'd rather want the last maximum. Nevertheless, it's not that complicated and it's fast!



      Here I've generated data of your dimensions (26746 * 18).



      Data



      set.seed(45)
      DF <- data.frame(matrix(sample(10, 26746*18, TRUE), ncol=18))



      data.table answer:



      require(data.table)
      DT <- data.table(value=unlist(DF, use.names=FALSE),
      colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
      setkey(DT, colid, value)
      t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]


      Benchmarking:



      # data.table solution
      system.time({
      DT <- data.table(value=unlist(DF, use.names=FALSE),
      colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
      setkey(DT, colid, value)
      t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]
      })
      # user system elapsed
      # 0.174 0.029 0.227

      # apply solution from @thelatemail
      system.time(t2 <- colnames(DF)[apply(DF,1,which.max)])
      # user system elapsed
      # 2.322 0.036 2.602

      identical(t1, t2)
      # [1] TRUE


      It's about 11 times faster on data of these dimensions, and data.table scales pretty well too.





      Edit: if any of the max ids is okay, then:



      DT <- data.table(value=unlist(DF, use.names=FALSE), 
      colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
      setkey(DT, colid, value)
      t1 <- DT[J(unique(colid)), rowid, mult="last"]





      share|improve this answer















      If you're interested in a data.table solution, here's one. It's a bit tricky since you prefer to get the id for the first maximum. It's much easier if you'd rather want the last maximum. Nevertheless, it's not that complicated and it's fast!



      Here I've generated data of your dimensions (26746 * 18).



      Data



      set.seed(45)
      DF <- data.frame(matrix(sample(10, 26746*18, TRUE), ncol=18))



      data.table answer:



      require(data.table)
      DT <- data.table(value=unlist(DF, use.names=FALSE),
      colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
      setkey(DT, colid, value)
      t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]


      Benchmarking:



      # data.table solution
      system.time({
      DT <- data.table(value=unlist(DF, use.names=FALSE),
      colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
      setkey(DT, colid, value)
      t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]
      })
      # user system elapsed
      # 0.174 0.029 0.227

      # apply solution from @thelatemail
      system.time(t2 <- colnames(DF)[apply(DF,1,which.max)])
      # user system elapsed
      # 2.322 0.036 2.602

      identical(t1, t2)
      # [1] TRUE


      It's about 11 times faster on data of these dimensions, and data.table scales pretty well too.





      Edit: if any of the max ids is okay, then:



      DT <- data.table(value=unlist(DF, use.names=FALSE), 
      colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
      setkey(DT, colid, value)
      t1 <- DT[J(unique(colid)), rowid, mult="last"]






      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited Jul 19 '13 at 1:13

























      answered Jul 19 '13 at 0:30









      ArunArun

      93k11217316




      93k11217316













      • I actually dont' care if it is the first or last maximum. I'm going for simplicity first, but I'm sure a data.table solution will come handy in the future, thanks!

        – dmvianna
        Jul 19 '13 at 0:48



















      • I actually dont' care if it is the first or last maximum. I'm going for simplicity first, but I'm sure a data.table solution will come handy in the future, thanks!

        – dmvianna
        Jul 19 '13 at 0:48

















      I actually dont' care if it is the first or last maximum. I'm going for simplicity first, but I'm sure a data.table solution will come handy in the future, thanks!

      – dmvianna
      Jul 19 '13 at 0:48





      I actually dont' care if it is the first or last maximum. I'm going for simplicity first, but I'm sure a data.table solution will come handy in the future, thanks!

      – dmvianna
      Jul 19 '13 at 0:48











      5














      Based on the above suggestions, the following data.table solution worked very fast for me:



      library(data.table)

      set.seed(45)
      DT <- data.table(matrix(sample(10, 10^7, TRUE), ncol=10))

      system.time(
      DT[, col_max := colnames(.SD)[max.col(.SD, ties.method = "first")]]
      )
      #> user system elapsed
      #> 0.15 0.06 0.21
      DT
      #> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 col_max
      #> 1: 7 4 1 2 3 7 6 6 6 1 V1
      #> 2: 4 6 9 10 6 2 7 7 1 3 V4
      #> 3: 3 4 9 8 9 9 8 8 6 7 V3
      #> 4: 4 8 8 9 7 5 9 2 7 1 V4
      #> 5: 4 3 9 10 2 7 9 6 6 9 V4
      #> ---
      #> 999996: 4 6 10 5 4 7 3 8 2 8 V3
      #> 999997: 8 7 6 6 3 10 2 3 10 1 V6
      #> 999998: 2 3 2 7 4 7 5 2 7 3 V4
      #> 999999: 8 10 3 2 3 4 5 1 1 4 V2
      #> 1000000: 10 4 2 6 6 2 8 4 7 4 V1


      And also comes with the advantage that can always specify what columns .SD should consider by mentioning them in .SDcols:



      DT[, MAX2 := colnames(.SD)[max.col(.SD, ties.method="first")], .SDcols = c("V9", "V10")]




      In case we need the column name of the smallest value, as suggested by @lwshang, one just needs to use -.SD:



      DT[, col_min := colnames(.SD)[max.col(-.SD, ties.method = "first")]]





      share|improve this answer


























      • I had a similar requirement but want to get the column name having the minimum value for each row.....we don't seem to have min.col in R.....would you know what would be the equivalent solution?

        – user1412
        Feb 26 '17 at 18:38











      • Hi @user1412. Thanks for your interesting question. I don't have any idea right now other than using the which.min in something that would look like: DT[, MIN := colnames(.SD)[apply(.SD,1,which.min)]] or DT[, MIN2 := colnames(.SD)[which.min(.SD)], by = 1:nrow(DT)] on the dummy data above. This doesn't consider ties and returns only the first minimum. Maybe consider asking a separate question. I would be curious as well what other answers you would get.

        – Valentin
        Feb 28 '17 at 10:09






      • 1





        A trick to get minimum column is sending the negative of the data.frame into max.col, like: colnames(.SD)[max.col(-.SD, ties.method="first")].

        – lwshang
        Mar 9 '18 at 14:08


















      5














      Based on the above suggestions, the following data.table solution worked very fast for me:



      library(data.table)

      set.seed(45)
      DT <- data.table(matrix(sample(10, 10^7, TRUE), ncol=10))

      system.time(
      DT[, col_max := colnames(.SD)[max.col(.SD, ties.method = "first")]]
      )
      #> user system elapsed
      #> 0.15 0.06 0.21
      DT
      #> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 col_max
      #> 1: 7 4 1 2 3 7 6 6 6 1 V1
      #> 2: 4 6 9 10 6 2 7 7 1 3 V4
      #> 3: 3 4 9 8 9 9 8 8 6 7 V3
      #> 4: 4 8 8 9 7 5 9 2 7 1 V4
      #> 5: 4 3 9 10 2 7 9 6 6 9 V4
      #> ---
      #> 999996: 4 6 10 5 4 7 3 8 2 8 V3
      #> 999997: 8 7 6 6 3 10 2 3 10 1 V6
      #> 999998: 2 3 2 7 4 7 5 2 7 3 V4
      #> 999999: 8 10 3 2 3 4 5 1 1 4 V2
      #> 1000000: 10 4 2 6 6 2 8 4 7 4 V1


      And also comes with the advantage that can always specify what columns .SD should consider by mentioning them in .SDcols:



      DT[, MAX2 := colnames(.SD)[max.col(.SD, ties.method="first")], .SDcols = c("V9", "V10")]




      In case we need the column name of the smallest value, as suggested by @lwshang, one just needs to use -.SD:



      DT[, col_min := colnames(.SD)[max.col(-.SD, ties.method = "first")]]





      share|improve this answer


























      • I had a similar requirement but want to get the column name having the minimum value for each row.....we don't seem to have min.col in R.....would you know what would be the equivalent solution?

        – user1412
        Feb 26 '17 at 18:38











      • Hi @user1412. Thanks for your interesting question. I don't have any idea right now other than using the which.min in something that would look like: DT[, MIN := colnames(.SD)[apply(.SD,1,which.min)]] or DT[, MIN2 := colnames(.SD)[which.min(.SD)], by = 1:nrow(DT)] on the dummy data above. This doesn't consider ties and returns only the first minimum. Maybe consider asking a separate question. I would be curious as well what other answers you would get.

        – Valentin
        Feb 28 '17 at 10:09






      • 1





        A trick to get minimum column is sending the negative of the data.frame into max.col, like: colnames(.SD)[max.col(-.SD, ties.method="first")].

        – lwshang
        Mar 9 '18 at 14:08
















      5












      5








      5







      Based on the above suggestions, the following data.table solution worked very fast for me:



      library(data.table)

      set.seed(45)
      DT <- data.table(matrix(sample(10, 10^7, TRUE), ncol=10))

      system.time(
      DT[, col_max := colnames(.SD)[max.col(.SD, ties.method = "first")]]
      )
      #> user system elapsed
      #> 0.15 0.06 0.21
      DT
      #> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 col_max
      #> 1: 7 4 1 2 3 7 6 6 6 1 V1
      #> 2: 4 6 9 10 6 2 7 7 1 3 V4
      #> 3: 3 4 9 8 9 9 8 8 6 7 V3
      #> 4: 4 8 8 9 7 5 9 2 7 1 V4
      #> 5: 4 3 9 10 2 7 9 6 6 9 V4
      #> ---
      #> 999996: 4 6 10 5 4 7 3 8 2 8 V3
      #> 999997: 8 7 6 6 3 10 2 3 10 1 V6
      #> 999998: 2 3 2 7 4 7 5 2 7 3 V4
      #> 999999: 8 10 3 2 3 4 5 1 1 4 V2
      #> 1000000: 10 4 2 6 6 2 8 4 7 4 V1


      And also comes with the advantage that can always specify what columns .SD should consider by mentioning them in .SDcols:



      DT[, MAX2 := colnames(.SD)[max.col(.SD, ties.method="first")], .SDcols = c("V9", "V10")]




      In case we need the column name of the smallest value, as suggested by @lwshang, one just needs to use -.SD:



      DT[, col_min := colnames(.SD)[max.col(-.SD, ties.method = "first")]]





      share|improve this answer















      Based on the above suggestions, the following data.table solution worked very fast for me:



      library(data.table)

      set.seed(45)
      DT <- data.table(matrix(sample(10, 10^7, TRUE), ncol=10))

      system.time(
      DT[, col_max := colnames(.SD)[max.col(.SD, ties.method = "first")]]
      )
      #> user system elapsed
      #> 0.15 0.06 0.21
      DT
      #> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 col_max
      #> 1: 7 4 1 2 3 7 6 6 6 1 V1
      #> 2: 4 6 9 10 6 2 7 7 1 3 V4
      #> 3: 3 4 9 8 9 9 8 8 6 7 V3
      #> 4: 4 8 8 9 7 5 9 2 7 1 V4
      #> 5: 4 3 9 10 2 7 9 6 6 9 V4
      #> ---
      #> 999996: 4 6 10 5 4 7 3 8 2 8 V3
      #> 999997: 8 7 6 6 3 10 2 3 10 1 V6
      #> 999998: 2 3 2 7 4 7 5 2 7 3 V4
      #> 999999: 8 10 3 2 3 4 5 1 1 4 V2
      #> 1000000: 10 4 2 6 6 2 8 4 7 4 V1


      And also comes with the advantage that can always specify what columns .SD should consider by mentioning them in .SDcols:



      DT[, MAX2 := colnames(.SD)[max.col(.SD, ties.method="first")], .SDcols = c("V9", "V10")]




      In case we need the column name of the smallest value, as suggested by @lwshang, one just needs to use -.SD:



      DT[, col_min := colnames(.SD)[max.col(-.SD, ties.method = "first")]]






      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited Dec 31 '18 at 11:58

























      answered Oct 7 '16 at 18:04









      ValentinValentin

      1,9651230




      1,9651230













      • I had a similar requirement but want to get the column name having the minimum value for each row.....we don't seem to have min.col in R.....would you know what would be the equivalent solution?

        – user1412
        Feb 26 '17 at 18:38











      • Hi @user1412. Thanks for your interesting question. I don't have any idea right now other than using the which.min in something that would look like: DT[, MIN := colnames(.SD)[apply(.SD,1,which.min)]] or DT[, MIN2 := colnames(.SD)[which.min(.SD)], by = 1:nrow(DT)] on the dummy data above. This doesn't consider ties and returns only the first minimum. Maybe consider asking a separate question. I would be curious as well what other answers you would get.

        – Valentin
        Feb 28 '17 at 10:09






      • 1





        A trick to get minimum column is sending the negative of the data.frame into max.col, like: colnames(.SD)[max.col(-.SD, ties.method="first")].

        – lwshang
        Mar 9 '18 at 14:08





















      • I had a similar requirement but want to get the column name having the minimum value for each row.....we don't seem to have min.col in R.....would you know what would be the equivalent solution?

        – user1412
        Feb 26 '17 at 18:38











      • Hi @user1412. Thanks for your interesting question. I don't have any idea right now other than using the which.min in something that would look like: DT[, MIN := colnames(.SD)[apply(.SD,1,which.min)]] or DT[, MIN2 := colnames(.SD)[which.min(.SD)], by = 1:nrow(DT)] on the dummy data above. This doesn't consider ties and returns only the first minimum. Maybe consider asking a separate question. I would be curious as well what other answers you would get.

        – Valentin
        Feb 28 '17 at 10:09






      • 1





        A trick to get minimum column is sending the negative of the data.frame into max.col, like: colnames(.SD)[max.col(-.SD, ties.method="first")].

        – lwshang
        Mar 9 '18 at 14:08



















      I had a similar requirement but want to get the column name having the minimum value for each row.....we don't seem to have min.col in R.....would you know what would be the equivalent solution?

      – user1412
      Feb 26 '17 at 18:38





      I had a similar requirement but want to get the column name having the minimum value for each row.....we don't seem to have min.col in R.....would you know what would be the equivalent solution?

      – user1412
      Feb 26 '17 at 18:38













      Hi @user1412. Thanks for your interesting question. I don't have any idea right now other than using the which.min in something that would look like: DT[, MIN := colnames(.SD)[apply(.SD,1,which.min)]] or DT[, MIN2 := colnames(.SD)[which.min(.SD)], by = 1:nrow(DT)] on the dummy data above. This doesn't consider ties and returns only the first minimum. Maybe consider asking a separate question. I would be curious as well what other answers you would get.

      – Valentin
      Feb 28 '17 at 10:09





      Hi @user1412. Thanks for your interesting question. I don't have any idea right now other than using the which.min in something that would look like: DT[, MIN := colnames(.SD)[apply(.SD,1,which.min)]] or DT[, MIN2 := colnames(.SD)[which.min(.SD)], by = 1:nrow(DT)] on the dummy data above. This doesn't consider ties and returns only the first minimum. Maybe consider asking a separate question. I would be curious as well what other answers you would get.

      – Valentin
      Feb 28 '17 at 10:09




      1




      1





      A trick to get minimum column is sending the negative of the data.frame into max.col, like: colnames(.SD)[max.col(-.SD, ties.method="first")].

      – lwshang
      Mar 9 '18 at 14:08







      A trick to get minimum column is sending the negative of the data.frame into max.col, like: colnames(.SD)[max.col(-.SD, ties.method="first")].

      – lwshang
      Mar 9 '18 at 14:08













      3














      One solution could be to reshape the date from wide to long putting all the departments in one column and counts in another, group by the employer id (in this case, the row number), and then filter to the department(s) with the max value. There are a couple of options for handling ties with this approach too.



      library(tidyverse)

      # sample data frame with a tie
      df <- data_frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,5))

      # If you aren't worried about ties:
      df %>%
      rownames_to_column('id') %>% # creates an ID number
      gather(dept, cnt, V1:V3) %>%
      group_by(id) %>%
      slice(which.max(cnt))

      # A tibble: 3 x 3
      # Groups: id [3]
      id dept cnt
      <chr> <chr> <dbl>
      1 1 V3 9.
      2 2 V1 8.
      3 3 V2 5.


      # If you're worried about keeping ties:
      df %>%
      rownames_to_column('id') %>%
      gather(dept, cnt, V1:V3) %>%
      group_by(id) %>%
      filter(cnt == max(cnt)) %>% # top_n(cnt, n = 1) also works
      arrange(id)

      # A tibble: 4 x 3
      # Groups: id [3]
      id dept cnt
      <chr> <chr> <dbl>
      1 1 V3 9.
      2 2 V1 8.
      3 3 V2 5.
      4 3 V3 5.


      # If you're worried about ties, but only want a certain department, you could use rank() and choose 'first' or 'last'
      df %>%
      rownames_to_column('id') %>%
      gather(dept, cnt, V1:V3) %>%
      group_by(id) %>%
      mutate(dept_rank = rank(-cnt, ties.method = "first")) %>% # or 'last'
      filter(dept_rank == 1) %>%
      select(-dept_rank)

      # A tibble: 3 x 3
      # Groups: id [3]
      id dept cnt
      <chr> <chr> <dbl>
      1 2 V1 8.
      2 3 V2 5.
      3 1 V3 9.

      # if you wanted to keep the original wide data frame
      df %>%
      rownames_to_column('id') %>%
      left_join(
      df %>%
      rownames_to_column('id') %>%
      gather(max_dept, max_cnt, V1:V3) %>%
      group_by(id) %>%
      slice(which.max(max_cnt)),
      by = 'id'
      )

      # A tibble: 3 x 6
      id V1 V2 V3 max_dept max_cnt
      <chr> <dbl> <dbl> <dbl> <chr> <dbl>
      1 1 2. 7. 9. V3 9.
      2 2 8. 3. 6. V1 8.
      3 3 1. 5. 5. V2 5.





      share|improve this answer






























        3














        One solution could be to reshape the date from wide to long putting all the departments in one column and counts in another, group by the employer id (in this case, the row number), and then filter to the department(s) with the max value. There are a couple of options for handling ties with this approach too.



        library(tidyverse)

        # sample data frame with a tie
        df <- data_frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,5))

        # If you aren't worried about ties:
        df %>%
        rownames_to_column('id') %>% # creates an ID number
        gather(dept, cnt, V1:V3) %>%
        group_by(id) %>%
        slice(which.max(cnt))

        # A tibble: 3 x 3
        # Groups: id [3]
        id dept cnt
        <chr> <chr> <dbl>
        1 1 V3 9.
        2 2 V1 8.
        3 3 V2 5.


        # If you're worried about keeping ties:
        df %>%
        rownames_to_column('id') %>%
        gather(dept, cnt, V1:V3) %>%
        group_by(id) %>%
        filter(cnt == max(cnt)) %>% # top_n(cnt, n = 1) also works
        arrange(id)

        # A tibble: 4 x 3
        # Groups: id [3]
        id dept cnt
        <chr> <chr> <dbl>
        1 1 V3 9.
        2 2 V1 8.
        3 3 V2 5.
        4 3 V3 5.


        # If you're worried about ties, but only want a certain department, you could use rank() and choose 'first' or 'last'
        df %>%
        rownames_to_column('id') %>%
        gather(dept, cnt, V1:V3) %>%
        group_by(id) %>%
        mutate(dept_rank = rank(-cnt, ties.method = "first")) %>% # or 'last'
        filter(dept_rank == 1) %>%
        select(-dept_rank)

        # A tibble: 3 x 3
        # Groups: id [3]
        id dept cnt
        <chr> <chr> <dbl>
        1 2 V1 8.
        2 3 V2 5.
        3 1 V3 9.

        # if you wanted to keep the original wide data frame
        df %>%
        rownames_to_column('id') %>%
        left_join(
        df %>%
        rownames_to_column('id') %>%
        gather(max_dept, max_cnt, V1:V3) %>%
        group_by(id) %>%
        slice(which.max(max_cnt)),
        by = 'id'
        )

        # A tibble: 3 x 6
        id V1 V2 V3 max_dept max_cnt
        <chr> <dbl> <dbl> <dbl> <chr> <dbl>
        1 1 2. 7. 9. V3 9.
        2 2 8. 3. 6. V1 8.
        3 3 1. 5. 5. V2 5.





        share|improve this answer




























          3












          3








          3







          One solution could be to reshape the date from wide to long putting all the departments in one column and counts in another, group by the employer id (in this case, the row number), and then filter to the department(s) with the max value. There are a couple of options for handling ties with this approach too.



          library(tidyverse)

          # sample data frame with a tie
          df <- data_frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,5))

          # If you aren't worried about ties:
          df %>%
          rownames_to_column('id') %>% # creates an ID number
          gather(dept, cnt, V1:V3) %>%
          group_by(id) %>%
          slice(which.max(cnt))

          # A tibble: 3 x 3
          # Groups: id [3]
          id dept cnt
          <chr> <chr> <dbl>
          1 1 V3 9.
          2 2 V1 8.
          3 3 V2 5.


          # If you're worried about keeping ties:
          df %>%
          rownames_to_column('id') %>%
          gather(dept, cnt, V1:V3) %>%
          group_by(id) %>%
          filter(cnt == max(cnt)) %>% # top_n(cnt, n = 1) also works
          arrange(id)

          # A tibble: 4 x 3
          # Groups: id [3]
          id dept cnt
          <chr> <chr> <dbl>
          1 1 V3 9.
          2 2 V1 8.
          3 3 V2 5.
          4 3 V3 5.


          # If you're worried about ties, but only want a certain department, you could use rank() and choose 'first' or 'last'
          df %>%
          rownames_to_column('id') %>%
          gather(dept, cnt, V1:V3) %>%
          group_by(id) %>%
          mutate(dept_rank = rank(-cnt, ties.method = "first")) %>% # or 'last'
          filter(dept_rank == 1) %>%
          select(-dept_rank)

          # A tibble: 3 x 3
          # Groups: id [3]
          id dept cnt
          <chr> <chr> <dbl>
          1 2 V1 8.
          2 3 V2 5.
          3 1 V3 9.

          # if you wanted to keep the original wide data frame
          df %>%
          rownames_to_column('id') %>%
          left_join(
          df %>%
          rownames_to_column('id') %>%
          gather(max_dept, max_cnt, V1:V3) %>%
          group_by(id) %>%
          slice(which.max(max_cnt)),
          by = 'id'
          )

          # A tibble: 3 x 6
          id V1 V2 V3 max_dept max_cnt
          <chr> <dbl> <dbl> <dbl> <chr> <dbl>
          1 1 2. 7. 9. V3 9.
          2 2 8. 3. 6. V1 8.
          3 3 1. 5. 5. V2 5.





          share|improve this answer















          One solution could be to reshape the date from wide to long putting all the departments in one column and counts in another, group by the employer id (in this case, the row number), and then filter to the department(s) with the max value. There are a couple of options for handling ties with this approach too.



          library(tidyverse)

          # sample data frame with a tie
          df <- data_frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,5))

          # If you aren't worried about ties:
          df %>%
          rownames_to_column('id') %>% # creates an ID number
          gather(dept, cnt, V1:V3) %>%
          group_by(id) %>%
          slice(which.max(cnt))

          # A tibble: 3 x 3
          # Groups: id [3]
          id dept cnt
          <chr> <chr> <dbl>
          1 1 V3 9.
          2 2 V1 8.
          3 3 V2 5.


          # If you're worried about keeping ties:
          df %>%
          rownames_to_column('id') %>%
          gather(dept, cnt, V1:V3) %>%
          group_by(id) %>%
          filter(cnt == max(cnt)) %>% # top_n(cnt, n = 1) also works
          arrange(id)

          # A tibble: 4 x 3
          # Groups: id [3]
          id dept cnt
          <chr> <chr> <dbl>
          1 1 V3 9.
          2 2 V1 8.
          3 3 V2 5.
          4 3 V3 5.


          # If you're worried about ties, but only want a certain department, you could use rank() and choose 'first' or 'last'
          df %>%
          rownames_to_column('id') %>%
          gather(dept, cnt, V1:V3) %>%
          group_by(id) %>%
          mutate(dept_rank = rank(-cnt, ties.method = "first")) %>% # or 'last'
          filter(dept_rank == 1) %>%
          select(-dept_rank)

          # A tibble: 3 x 3
          # Groups: id [3]
          id dept cnt
          <chr> <chr> <dbl>
          1 2 V1 8.
          2 3 V2 5.
          3 1 V3 9.

          # if you wanted to keep the original wide data frame
          df %>%
          rownames_to_column('id') %>%
          left_join(
          df %>%
          rownames_to_column('id') %>%
          gather(max_dept, max_cnt, V1:V3) %>%
          group_by(id) %>%
          slice(which.max(max_cnt)),
          by = 'id'
          )

          # A tibble: 3 x 6
          id V1 V2 V3 max_dept max_cnt
          <chr> <dbl> <dbl> <dbl> <chr> <dbl>
          1 1 2. 7. 9. V3 9.
          2 2 8. 3. 6. V1 8.
          3 3 1. 5. 5. V2 5.






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Mar 31 '18 at 12:21

























          answered Mar 31 '18 at 11:23









          sbhasbha

          2,27222124




          2,27222124























              1














              A simple for loop can also be handy:



              > df<-data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
              > df
              V1 V2 V3
              1 2 7 9
              2 8 3 6
              3 1 5 4
              > df2<-data.frame()
              > for (i in 1:nrow(df)){
              + df2[i,1]<-colnames(df[which.max(df[i,])])
              + }
              > df2
              V1
              1 V3
              2 V1
              3 V2





              share|improve this answer






























                1














                A simple for loop can also be handy:



                > df<-data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
                > df
                V1 V2 V3
                1 2 7 9
                2 8 3 6
                3 1 5 4
                > df2<-data.frame()
                > for (i in 1:nrow(df)){
                + df2[i,1]<-colnames(df[which.max(df[i,])])
                + }
                > df2
                V1
                1 V3
                2 V1
                3 V2





                share|improve this answer




























                  1












                  1








                  1







                  A simple for loop can also be handy:



                  > df<-data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
                  > df
                  V1 V2 V3
                  1 2 7 9
                  2 8 3 6
                  3 1 5 4
                  > df2<-data.frame()
                  > for (i in 1:nrow(df)){
                  + df2[i,1]<-colnames(df[which.max(df[i,])])
                  + }
                  > df2
                  V1
                  1 V3
                  2 V1
                  3 V2





                  share|improve this answer















                  A simple for loop can also be handy:



                  > df<-data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
                  > df
                  V1 V2 V3
                  1 2 7 9
                  2 8 3 6
                  3 1 5 4
                  > df2<-data.frame()
                  > for (i in 1:nrow(df)){
                  + df2[i,1]<-colnames(df[which.max(df[i,])])
                  + }
                  > df2
                  V1
                  1 V3
                  2 V1
                  3 V2






                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Jul 3 '18 at 5:11

























                  answered Jul 3 '18 at 5:05









                  rarrar

                  6331515




                  6331515























                      1














                      A dplyr solution:



                      Idea:




                      • add rowids as a column

                      • reshape to long format

                      • filter for max in each group


                      Code:



                      DF = data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
                      DF %>%
                      rownames_to_column() %>%
                      gather(column, value, -rowname) %>%
                      group_by(rowname) %>%
                      filter(rank(-value) == 1)


                      Result:



                      # A tibble: 3 x 3
                      # Groups: rowname [3]
                      rowname column value
                      <chr> <chr> <dbl>
                      1 2 V1 8
                      2 3 V2 5
                      3 1 V3 9


                      This approach can be easily extended to get the top n columns.
                      Example for n=2:



                      DF %>% 
                      rownames_to_column() %>%
                      gather(column, value, -rowname) %>%
                      group_by(rowname) %>%
                      mutate(rk = rank(-value)) %>%
                      filter(rk <= 2) %>%
                      arrange(rowname, rk)


                      Result:



                      # A tibble: 6 x 4
                      # Groups: rowname [3]
                      rowname column value rk
                      <chr> <chr> <dbl> <dbl>
                      1 1 V3 9 1
                      2 1 V2 7 2
                      3 2 V1 8 1
                      4 2 V3 6 2
                      5 3 V2 5 1
                      6 3 V3 4 2





                      share|improve this answer
























                      • Could you comment on the difference between this approach and sbha's answer above? They look about the same to me.

                        – Gregor
                        Nov 6 '18 at 14:46
















                      1














                      A dplyr solution:



                      Idea:




                      • add rowids as a column

                      • reshape to long format

                      • filter for max in each group


                      Code:



                      DF = data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
                      DF %>%
                      rownames_to_column() %>%
                      gather(column, value, -rowname) %>%
                      group_by(rowname) %>%
                      filter(rank(-value) == 1)


                      Result:



                      # A tibble: 3 x 3
                      # Groups: rowname [3]
                      rowname column value
                      <chr> <chr> <dbl>
                      1 2 V1 8
                      2 3 V2 5
                      3 1 V3 9


                      This approach can be easily extended to get the top n columns.
                      Example for n=2:



                      DF %>% 
                      rownames_to_column() %>%
                      gather(column, value, -rowname) %>%
                      group_by(rowname) %>%
                      mutate(rk = rank(-value)) %>%
                      filter(rk <= 2) %>%
                      arrange(rowname, rk)


                      Result:



                      # A tibble: 6 x 4
                      # Groups: rowname [3]
                      rowname column value rk
                      <chr> <chr> <dbl> <dbl>
                      1 1 V3 9 1
                      2 1 V2 7 2
                      3 2 V1 8 1
                      4 2 V3 6 2
                      5 3 V2 5 1
                      6 3 V3 4 2





                      share|improve this answer
























                      • Could you comment on the difference between this approach and sbha's answer above? They look about the same to me.

                        – Gregor
                        Nov 6 '18 at 14:46














                      1












                      1








                      1







                      A dplyr solution:



                      Idea:




                      • add rowids as a column

                      • reshape to long format

                      • filter for max in each group


                      Code:



                      DF = data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
                      DF %>%
                      rownames_to_column() %>%
                      gather(column, value, -rowname) %>%
                      group_by(rowname) %>%
                      filter(rank(-value) == 1)


                      Result:



                      # A tibble: 3 x 3
                      # Groups: rowname [3]
                      rowname column value
                      <chr> <chr> <dbl>
                      1 2 V1 8
                      2 3 V2 5
                      3 1 V3 9


                      This approach can be easily extended to get the top n columns.
                      Example for n=2:



                      DF %>% 
                      rownames_to_column() %>%
                      gather(column, value, -rowname) %>%
                      group_by(rowname) %>%
                      mutate(rk = rank(-value)) %>%
                      filter(rk <= 2) %>%
                      arrange(rowname, rk)


                      Result:



                      # A tibble: 6 x 4
                      # Groups: rowname [3]
                      rowname column value rk
                      <chr> <chr> <dbl> <dbl>
                      1 1 V3 9 1
                      2 1 V2 7 2
                      3 2 V1 8 1
                      4 2 V3 6 2
                      5 3 V2 5 1
                      6 3 V3 4 2





                      share|improve this answer













                      A dplyr solution:



                      Idea:




                      • add rowids as a column

                      • reshape to long format

                      • filter for max in each group


                      Code:



                      DF = data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
                      DF %>%
                      rownames_to_column() %>%
                      gather(column, value, -rowname) %>%
                      group_by(rowname) %>%
                      filter(rank(-value) == 1)


                      Result:



                      # A tibble: 3 x 3
                      # Groups: rowname [3]
                      rowname column value
                      <chr> <chr> <dbl>
                      1 2 V1 8
                      2 3 V2 5
                      3 1 V3 9


                      This approach can be easily extended to get the top n columns.
                      Example for n=2:



                      DF %>% 
                      rownames_to_column() %>%
                      gather(column, value, -rowname) %>%
                      group_by(rowname) %>%
                      mutate(rk = rank(-value)) %>%
                      filter(rk <= 2) %>%
                      arrange(rowname, rk)


                      Result:



                      # A tibble: 6 x 4
                      # Groups: rowname [3]
                      rowname column value rk
                      <chr> <chr> <dbl> <dbl>
                      1 1 V3 9 1
                      2 1 V2 7 2
                      3 2 V1 8 1
                      4 2 V3 6 2
                      5 3 V2 5 1
                      6 3 V3 4 2






                      share|improve this answer












                      share|improve this answer



                      share|improve this answer










                      answered Nov 6 '18 at 14:31









                      Gregor SturmGregor Sturm

                      753618




                      753618













                      • Could you comment on the difference between this approach and sbha's answer above? They look about the same to me.

                        – Gregor
                        Nov 6 '18 at 14:46



















                      • Could you comment on the difference between this approach and sbha's answer above? They look about the same to me.

                        – Gregor
                        Nov 6 '18 at 14:46

















                      Could you comment on the difference between this approach and sbha's answer above? They look about the same to me.

                      – Gregor
                      Nov 6 '18 at 14:46





                      Could you comment on the difference between this approach and sbha's answer above? They look about the same to me.

                      – Gregor
                      Nov 6 '18 at 14:46





                      protected by zx8754 Jul 14 '17 at 13:16



                      Thank you for your interest in this question.
                      Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).



                      Would you like to answer one of these unanswered questions instead?



                      Popular posts from this blog

                      Mossoró

                      Error while reading .h5 file using the rhdf5 package in R

                      Pushsharp Apns notification error: 'InvalidToken'