For each row return the column name of the largest value

I have a roster of employees, and I need to know at what department they are in most often. It is trivial to tabulate employee ID against department name, but it is trickier to return the department name, rather than the number of roster counts, from the frequency table. A simple example below (column names = departments, row names = employee ids).

DF <- matrix(sample(1:9,9),ncol=3,nrow=3)

DF <- as.data.frame.matrix(DF)

> DF

  V1 V2 V3

1  2  7  9

2  8  3  6

3  1  5  4

Now how do I get

> DF2

  RE

1 V3

2 V1

3 V2

edited Jul 14 '17 at 13:17

zx8754

29.8k76399

asked Jul 18 '13 at 23:45

dmvianna

6,704125789

how big is your actual data?

– Arun
Jul 18 '13 at 23:51

1

@Arun > dim(test) [1] 26746 18

– dmvianna
Jul 18 '13 at 23:57

4

An interesting generalization would be the largest n values' column names per row

– Hack-R
Sep 7 '16 at 0:07

add a comment |

DF <- matrix(sample(1:9,9),ncol=3,nrow=3)

DF <- as.data.frame.matrix(DF)

> DF

  V1 V2 V3

1  2  7  9

2  8  3  6

3  1  5  4

Now how do I get

> DF2

  RE

1 V3

2 V1

3 V2

edited Jul 14 '17 at 13:17

zx8754

29.8k76399

asked Jul 18 '13 at 23:45

dmvianna

6,704125789

how big is your actual data?

– Arun
Jul 18 '13 at 23:51

1

@Arun > dim(test) [1] 26746 18

– dmvianna
Jul 18 '13 at 23:57

4

An interesting generalization would be the largest n values' column names per row

– Hack-R
Sep 7 '16 at 0:07

add a comment |

DF <- matrix(sample(1:9,9),ncol=3,nrow=3)

DF <- as.data.frame.matrix(DF)

> DF

  V1 V2 V3

1  2  7  9

2  8  3  6

3  1  5  4

Now how do I get

> DF2

  RE

1 V3

2 V1

3 V2

edited Jul 14 '17 at 13:17

zx8754

29.8k76399

asked Jul 18 '13 at 23:45

dmvianna

6,704125789

DF <- matrix(sample(1:9,9),ncol=3,nrow=3)

DF <- as.data.frame.matrix(DF)

> DF

  V1 V2 V3

1  2  7  9

2  8  3  6

3  1  5  4

Now how do I get

> DF2

  RE

1 V3

2 V1

3 V2

edited Jul 14 '17 at 13:17

zx8754

29.8k76399

asked Jul 18 '13 at 23:45

dmvianna

6,704125789

edited Jul 14 '17 at 13:17

zx8754

29.8k76399

asked Jul 18 '13 at 23:45

dmvianna

6,704125789

edited Jul 14 '17 at 13:17

zx8754

29.8k76399

edited Jul 14 '17 at 13:17

zx8754

29.8k76399

edited Jul 14 '17 at 13:17

zx8754

29.8k76399

asked Jul 18 '13 at 23:45

dmvianna

6,704125789

asked Jul 18 '13 at 23:45

dmvianna

6,704125789

asked Jul 18 '13 at 23:45

dmvianna

6,704125789

how big is your actual data?

– Arun
Jul 18 '13 at 23:51

1

@Arun > dim(test) [1] 26746 18

– dmvianna
Jul 18 '13 at 23:57

4

An interesting generalization would be the largest n values' column names per row

– Hack-R
Sep 7 '16 at 0:07

add a comment |

how big is your actual data?

– Arun
Jul 18 '13 at 23:51

1

@Arun > dim(test) [1] 26746 18

– dmvianna
Jul 18 '13 at 23:57

4

An interesting generalization would be the largest n values' column names per row

– Hack-R
Sep 7 '16 at 0:07

how big is your actual data?

– Arun
Jul 18 '13 at 23:51

@Arun > dim(test) [1] 26746 18

– dmvianna
Jul 18 '13 at 23:57

An interesting generalization would be the largest n values' column names per row

– Hack-R
Sep 7 '16 at 0:07

add a comment |

6 Answers
6

active

oldest

votes

One option using your data (for future reference, use set.seed() to make examples using sample reproducible):

DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))



colnames(DF)[apply(DF,1,which.max)]

[1] "V3" "V1" "V2"

A faster solution than using apply might be max.col:

colnames(DF)[max.col(DF,ties.method="first")]

#[1] "V3" "V1" "V2"

...where ties.method can be any of "random" "first" or "last"

This of course causes issues if you happen to have two columns which are equal to the maximum. I'm not sure what you want to do in that instance as you will have more than one result for some rows. E.g.:

DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(7,6,4))

apply(DF,1,function(x) which(x==max(x)))



[[1]]

V2 V3 

 2  3 



[[2]]

V1 

 1 



[[3]]

V2 

 2

edited Sep 30 '14 at 22:49

answered Jul 18 '13 at 23:49

thelatemail

67.4k882150

If I have two equal columns I usually just pick the first. These are border cases which do not upset my statistical analysis.

– dmvianna
Jul 18 '13 at 23:59

@dmvianna - using which.max will be fine then.

– thelatemail
Jul 19 '13 at 0:03

I'm assuming the order is preserved, so I can create a new column with this vector that will align correctly to the employees IDs. Is that correct?

– dmvianna
Jul 19 '13 at 0:05

apply converts the data.frame to matrix internally. You may not see a performance difference on these dimensions though.

– Arun
Jul 19 '13 at 0:07

1

@PankajKaundal - assuming distinct values, how about this colnames(DF)[max.col(replace(DF, cbind(seq_len(nrow(DF)), max.col(DF,ties.method="first")), -Inf), "first")]

– thelatemail
Apr 19 '16 at 22:14

|
show 2 more comments

If you're interested in a data.table solution, here's one. It's a bit tricky since you prefer to get the id for the first maximum. It's much easier if you'd rather want the last maximum. Nevertheless, it's not that complicated and it's fast!

Here I've generated data of your dimensions (26746 * 18).

Data

set.seed(45)

DF <- data.frame(matrix(sample(10, 26746*18, TRUE), ncol=18))

`data.table` answer:

require(data.table)

DT <- data.table(value=unlist(DF, use.names=FALSE), 

            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))

setkey(DT, colid, value)

t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]

Benchmarking:

# data.table solution

system.time({

DT <- data.table(value=unlist(DF, use.names=FALSE), 

            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))

setkey(DT, colid, value)

t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]

})

#   user  system elapsed 

#  0.174   0.029   0.227 



# apply solution from @thelatemail

system.time(t2 <- colnames(DF)[apply(DF,1,which.max)])

#   user  system elapsed 

#  2.322   0.036   2.602 



identical(t1, t2)

# [1] TRUE

It's about 11 times faster on data of these dimensions, and data.table scales pretty well too.

Edit: if any of the max ids is okay, then:

DT <- data.table(value=unlist(DF, use.names=FALSE), 

            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))

setkey(DT, colid, value)

t1 <- DT[J(unique(colid)), rowid, mult="last"]

edited Jul 19 '13 at 1:13

answered Jul 19 '13 at 0:30

Arun

93k11217316

I actually dont' care if it is the first or last maximum. I'm going for simplicity first, but I'm sure a data.table solution will come handy in the future, thanks!

– dmvianna
Jul 19 '13 at 0:48

add a comment |

Based on the above suggestions, the following data.table solution worked very fast for me:

library(data.table)



set.seed(45)

DT <- data.table(matrix(sample(10, 10^7, TRUE), ncol=10))



system.time(

  DT[, col_max := colnames(.SD)[max.col(.SD, ties.method = "first")]]

)

#>    user  system elapsed 

#>    0.15    0.06    0.21

DT

#>          V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 col_max

#>       1:  7  4  1  2  3  7  6  6  6   1      V1

#>       2:  4  6  9 10  6  2  7  7  1   3      V4

#>       3:  3  4  9  8  9  9  8  8  6   7      V3

#>       4:  4  8  8  9  7  5  9  2  7   1      V4

#>       5:  4  3  9 10  2  7  9  6  6   9      V4

#>      ---                                       

#>  999996:  4  6 10  5  4  7  3  8  2   8      V3

#>  999997:  8  7  6  6  3 10  2  3 10   1      V6

#>  999998:  2  3  2  7  4  7  5  2  7   3      V4

#>  999999:  8 10  3  2  3  4  5  1  1   4      V2

#> 1000000: 10  4  2  6  6  2  8  4  7   4      V1

And also comes with the advantage that can always specify what columns .SD should consider by mentioning them in .SDcols:

DT[, MAX2 := colnames(.SD)[max.col(.SD, ties.method="first")], .SDcols = c("V9", "V10")]

In case we need the column name of the smallest value, as suggested by @lwshang, one just needs to use -.SD:

DT[, col_min := colnames(.SD)[max.col(-.SD, ties.method = "first")]]

edited Dec 31 '18 at 11:58

answered Oct 7 '16 at 18:04

Valentin

1,9651230

I had a similar requirement but want to get the column name having the minimum value for each row.....we don't seem to have min.col in R.....would you know what would be the equivalent solution?

– user1412
Feb 26 '17 at 18:38

Hi @user1412. Thanks for your interesting question. I don't have any idea right now other than using the which.min in something that would look like: DT[, MIN := colnames(.SD)[apply(.SD,1,which.min)]] or DT[, MIN2 := colnames(.SD)[which.min(.SD)], by = 1:nrow(DT)] on the dummy data above. This doesn't consider ties and returns only the first minimum. Maybe consider asking a separate question. I would be curious as well what other answers you would get.

– Valentin
Feb 28 '17 at 10:09

1

A trick to get minimum column is sending the negative of the data.frame into max.col, like: colnames(.SD)[max.col(-.SD, ties.method="first")].

– lwshang
Mar 9 '18 at 14:08

add a comment |

One solution could be to reshape the date from wide to long putting all the departments in one column and counts in another, group by the employer id (in this case, the row number), and then filter to the department(s) with the max value. There are a couple of options for handling ties with this approach too.

library(tidyverse)



# sample data frame with a tie

df <- data_frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,5))



# If you aren't worried about ties:  

df %>% 

  rownames_to_column('id') %>%  # creates an ID number

  gather(dept, cnt, V1:V3) %>% 

  group_by(id) %>% 

  slice(which.max(cnt)) 



# A tibble: 3 x 3

# Groups:   id [3]

  id    dept    cnt

  <chr> <chr> <dbl>

1 1     V3       9.

2 2     V1       8.

3 3     V2       5.





# If you're worried about keeping ties:

df %>% 

  rownames_to_column('id') %>%

  gather(dept, cnt, V1:V3) %>% 

  group_by(id) %>% 

  filter(cnt == max(cnt)) %>% # top_n(cnt, n = 1) also works

  arrange(id)



# A tibble: 4 x 3

# Groups:   id [3]

  id    dept    cnt

  <chr> <chr> <dbl>

1 1     V3       9.

2 2     V1       8.

3 3     V2       5.

4 3     V3       5.





# If you're worried about ties, but only want a certain department, you could use rank() and choose 'first' or 'last'

df %>% 

  rownames_to_column('id') %>%

  gather(dept, cnt, V1:V3) %>% 

  group_by(id) %>% 

  mutate(dept_rank  = rank(-cnt, ties.method = "first")) %>% # or 'last'

  filter(dept_rank == 1) %>% 

  select(-dept_rank) 



# A tibble: 3 x 3

# Groups:   id [3]

  id    dept    cnt

  <chr> <chr> <dbl>

1 2     V1       8.

2 3     V2       5.

3 1     V3       9.



# if you wanted to keep the original wide data frame

df %>% 

  rownames_to_column('id') %>%

  left_join(

    df %>% 

      rownames_to_column('id') %>%

      gather(max_dept, max_cnt, V1:V3) %>% 

      group_by(id) %>% 

      slice(which.max(max_cnt)), 

    by = 'id'

  )



# A tibble: 3 x 6

  id       V1    V2    V3 max_dept max_cnt

  <chr> <dbl> <dbl> <dbl> <chr>      <dbl>

1 1        2.    7.    9. V3            9.

2 2        8.    3.    6. V1            8.

3 3        1.    5.    5. V2            5.

edited Mar 31 '18 at 12:21

answered Mar 31 '18 at 11:23

sbha

2,27222124

add a comment |

A simple for loop can also be handy:

> df<-data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))

> df

  V1 V2 V3

1  2  7  9

2  8  3  6

3  1  5  4

> df2<-data.frame()

> for (i in 1:nrow(df)){

+   df2[i,1]<-colnames(df[which.max(df[i,])])

+ }

> df2

  V1

1 V3

2 V1

3 V2

edited Jul 3 '18 at 5:11

answered Jul 3 '18 at 5:05

rar

6331515

add a comment |

A `dplyr` solution:

Idea:

add rowids as a column

reshape to long format

filter for max in each group

Code:

DF = data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))

DF %>% 

  rownames_to_column() %>%

  gather(column, value, -rowname) %>%

  group_by(rowname) %>% 

  filter(rank(-value) == 1)

Result:

# A tibble: 3 x 3

# Groups:   rowname [3]

  rowname column value

  <chr>   <chr>  <dbl>

1 2       V1         8

2 3       V2         5

3 1       V3         9

This approach can be easily extended to get the top n columns.
Example for n=2:

DF %>% 

  rownames_to_column() %>%

  gather(column, value, -rowname) %>%

  group_by(rowname) %>% 

  mutate(rk = rank(-value)) %>%

  filter(rk <= 2) %>% 

  arrange(rowname, rk)

Result:

# A tibble: 6 x 4

# Groups:   rowname [3]

  rowname column value    rk

  <chr>   <chr>  <dbl> <dbl>

1 1       V3         9     1

2 1       V2         7     2

3 2       V1         8     1

4 2       V3         6     2

5 3       V2         5     1

6 3       V3         4     2

answered Nov 6 '18 at 14:31

Gregor Sturm

753618

Could you comment on the difference between this approach and sbha's answer above? They look about the same to me.

– Gregor
Nov 6 '18 at 14:46

add a comment |

protected by zx8754 Jul 14 '17 at 13:16

Thank you for your interest in this question.
Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).

Would you like to answer one of these unanswered questions instead?

6 Answers
6

active

oldest

votes

6 Answers
6

active

oldest

votes

One option using your data (for future reference, use set.seed() to make examples using sample reproducible):

DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))



colnames(DF)[apply(DF,1,which.max)]

[1] "V3" "V1" "V2"

A faster solution than using apply might be max.col:

colnames(DF)[max.col(DF,ties.method="first")]

#[1] "V3" "V1" "V2"

...where ties.method can be any of "random" "first" or "last"

DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(7,6,4))

apply(DF,1,function(x) which(x==max(x)))



[[1]]

V2 V3 

 2  3 



[[2]]

V1 

 1 



[[3]]

V2 

 2

edited Sep 30 '14 at 22:49

answered Jul 18 '13 at 23:49

thelatemail

67.4k882150

If I have two equal columns I usually just pick the first. These are border cases which do not upset my statistical analysis.

– dmvianna
Jul 18 '13 at 23:59

@dmvianna - using which.max will be fine then.

– thelatemail
Jul 19 '13 at 0:03

I'm assuming the order is preserved, so I can create a new column with this vector that will align correctly to the employees IDs. Is that correct?

– dmvianna
Jul 19 '13 at 0:05

apply converts the data.frame to matrix internally. You may not see a performance difference on these dimensions though.

– Arun
Jul 19 '13 at 0:07

1

@PankajKaundal - assuming distinct values, how about this colnames(DF)[max.col(replace(DF, cbind(seq_len(nrow(DF)), max.col(DF,ties.method="first")), -Inf), "first")]

– thelatemail
Apr 19 '16 at 22:14

|
show 2 more comments

One option using your data (for future reference, use set.seed() to make examples using sample reproducible):

DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))



colnames(DF)[apply(DF,1,which.max)]

[1] "V3" "V1" "V2"

A faster solution than using apply might be max.col:

colnames(DF)[max.col(DF,ties.method="first")]

#[1] "V3" "V1" "V2"

...where ties.method can be any of "random" "first" or "last"

DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(7,6,4))

apply(DF,1,function(x) which(x==max(x)))



[[1]]

V2 V3 

 2  3 



[[2]]

V1 

 1 



[[3]]

V2 

 2

edited Sep 30 '14 at 22:49

answered Jul 18 '13 at 23:49

thelatemail

67.4k882150

If I have two equal columns I usually just pick the first. These are border cases which do not upset my statistical analysis.

– dmvianna
Jul 18 '13 at 23:59

@dmvianna - using which.max will be fine then.

– thelatemail
Jul 19 '13 at 0:03

I'm assuming the order is preserved, so I can create a new column with this vector that will align correctly to the employees IDs. Is that correct?

– dmvianna
Jul 19 '13 at 0:05

apply converts the data.frame to matrix internally. You may not see a performance difference on these dimensions though.

– Arun
Jul 19 '13 at 0:07

1

@PankajKaundal - assuming distinct values, how about this colnames(DF)[max.col(replace(DF, cbind(seq_len(nrow(DF)), max.col(DF,ties.method="first")), -Inf), "first")]

– thelatemail
Apr 19 '16 at 22:14

|
show 2 more comments

One option using your data (for future reference, use set.seed() to make examples using sample reproducible):

DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))



colnames(DF)[apply(DF,1,which.max)]

[1] "V3" "V1" "V2"

A faster solution than using apply might be max.col:

colnames(DF)[max.col(DF,ties.method="first")]

#[1] "V3" "V1" "V2"

...where ties.method can be any of "random" "first" or "last"

DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(7,6,4))

apply(DF,1,function(x) which(x==max(x)))



[[1]]

V2 V3 

 2  3 



[[2]]

V1 

 1 



[[3]]

V2 

 2

edited Sep 30 '14 at 22:49

answered Jul 18 '13 at 23:49

thelatemail

67.4k882150

One option using your data (for future reference, use set.seed() to make examples using sample reproducible):

DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))



colnames(DF)[apply(DF,1,which.max)]

[1] "V3" "V1" "V2"

A faster solution than using apply might be max.col:

colnames(DF)[max.col(DF,ties.method="first")]

#[1] "V3" "V1" "V2"

...where ties.method can be any of "random" "first" or "last"

DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(7,6,4))

apply(DF,1,function(x) which(x==max(x)))



[[1]]

V2 V3 

 2  3 



[[2]]

V1 

 1 



[[3]]

V2 

 2

edited Sep 30 '14 at 22:49

answered Jul 18 '13 at 23:49

thelatemail

67.4k882150

edited Sep 30 '14 at 22:49

answered Jul 18 '13 at 23:49

thelatemail

67.4k882150

answered Jul 18 '13 at 23:49

thelatemail

67.4k882150

answered Jul 18 '13 at 23:49

thelatemail

67.4k882150

If I have two equal columns I usually just pick the first. These are border cases which do not upset my statistical analysis.

– dmvianna
Jul 18 '13 at 23:59

@dmvianna - using which.max will be fine then.

– thelatemail
Jul 19 '13 at 0:03

I'm assuming the order is preserved, so I can create a new column with this vector that will align correctly to the employees IDs. Is that correct?

– dmvianna
Jul 19 '13 at 0:05

apply converts the data.frame to matrix internally. You may not see a performance difference on these dimensions though.

– Arun
Jul 19 '13 at 0:07

1

@PankajKaundal - assuming distinct values, how about this colnames(DF)[max.col(replace(DF, cbind(seq_len(nrow(DF)), max.col(DF,ties.method="first")), -Inf), "first")]

– thelatemail
Apr 19 '16 at 22:14

|
show 2 more comments

If I have two equal columns I usually just pick the first. These are border cases which do not upset my statistical analysis.

– dmvianna
Jul 18 '13 at 23:59

@dmvianna - using which.max will be fine then.

– thelatemail
Jul 19 '13 at 0:03

I'm assuming the order is preserved, so I can create a new column with this vector that will align correctly to the employees IDs. Is that correct?

– dmvianna
Jul 19 '13 at 0:05

apply converts the data.frame to matrix internally. You may not see a performance difference on these dimensions though.

– Arun
Jul 19 '13 at 0:07

1

@PankajKaundal - assuming distinct values, how about this colnames(DF)[max.col(replace(DF, cbind(seq_len(nrow(DF)), max.col(DF,ties.method="first")), -Inf), "first")]

– thelatemail
Apr 19 '16 at 22:14

If I have two equal columns I usually just pick the first. These are border cases which do not upset my statistical analysis.

– dmvianna
Jul 18 '13 at 23:59

@dmvianna - using which.max will be fine then.

– thelatemail
Jul 19 '13 at 0:03

I'm assuming the order is preserved, so I can create a new column with this vector that will align correctly to the employees IDs. Is that correct?

– dmvianna
Jul 19 '13 at 0:05

apply converts the data.frame to matrix internally. You may not see a performance difference on these dimensions though.

– Arun
Jul 19 '13 at 0:07

@PankajKaundal - assuming distinct values, how about this colnames(DF)[max.col(replace(DF, cbind(seq_len(nrow(DF)), max.col(DF,ties.method="first")), -Inf), "first")]

– thelatemail
Apr 19 '16 at 22:14

|
show 2 more comments

Here I've generated data of your dimensions (26746 * 18).

Data

set.seed(45)

DF <- data.frame(matrix(sample(10, 26746*18, TRUE), ncol=18))

`data.table` answer:

require(data.table)

DT <- data.table(value=unlist(DF, use.names=FALSE), 

            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))

setkey(DT, colid, value)

t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]

Benchmarking:

# data.table solution

system.time({

DT <- data.table(value=unlist(DF, use.names=FALSE), 

            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))

setkey(DT, colid, value)

t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]

})

#   user  system elapsed 

#  0.174   0.029   0.227 



# apply solution from @thelatemail

system.time(t2 <- colnames(DF)[apply(DF,1,which.max)])

#   user  system elapsed 

#  2.322   0.036   2.602 



identical(t1, t2)

# [1] TRUE

It's about 11 times faster on data of these dimensions, and data.table scales pretty well too.

Edit: if any of the max ids is okay, then:

DT <- data.table(value=unlist(DF, use.names=FALSE), 

            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))

setkey(DT, colid, value)

t1 <- DT[J(unique(colid)), rowid, mult="last"]

edited Jul 19 '13 at 1:13

answered Jul 19 '13 at 0:30

Arun

93k11217316

I actually dont' care if it is the first or last maximum. I'm going for simplicity first, but I'm sure a data.table solution will come handy in the future, thanks!

– dmvianna
Jul 19 '13 at 0:48

add a comment |

Here I've generated data of your dimensions (26746 * 18).

Data

set.seed(45)

DF <- data.frame(matrix(sample(10, 26746*18, TRUE), ncol=18))

`data.table` answer:

require(data.table)

DT <- data.table(value=unlist(DF, use.names=FALSE), 

            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))

setkey(DT, colid, value)

t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]

Benchmarking:

# data.table solution

system.time({

DT <- data.table(value=unlist(DF, use.names=FALSE), 

            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))

setkey(DT, colid, value)

t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]

})

#   user  system elapsed 

#  0.174   0.029   0.227 



# apply solution from @thelatemail

system.time(t2 <- colnames(DF)[apply(DF,1,which.max)])

#   user  system elapsed 

#  2.322   0.036   2.602 



identical(t1, t2)

# [1] TRUE

It's about 11 times faster on data of these dimensions, and data.table scales pretty well too.

Edit: if any of the max ids is okay, then:

DT <- data.table(value=unlist(DF, use.names=FALSE), 

            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))

setkey(DT, colid, value)

t1 <- DT[J(unique(colid)), rowid, mult="last"]

edited Jul 19 '13 at 1:13

answered Jul 19 '13 at 0:30

Arun

93k11217316

I actually dont' care if it is the first or last maximum. I'm going for simplicity first, but I'm sure a data.table solution will come handy in the future, thanks!

– dmvianna
Jul 19 '13 at 0:48

add a comment |

Here I've generated data of your dimensions (26746 * 18).

Data

set.seed(45)

DF <- data.frame(matrix(sample(10, 26746*18, TRUE), ncol=18))

`data.table` answer:

require(data.table)

DT <- data.table(value=unlist(DF, use.names=FALSE), 

            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))

setkey(DT, colid, value)

t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]

Benchmarking:

# data.table solution

system.time({

DT <- data.table(value=unlist(DF, use.names=FALSE), 

            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))

setkey(DT, colid, value)

t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]

})

#   user  system elapsed 

#  0.174   0.029   0.227 



# apply solution from @thelatemail

system.time(t2 <- colnames(DF)[apply(DF,1,which.max)])

#   user  system elapsed 

#  2.322   0.036   2.602 



identical(t1, t2)

# [1] TRUE

It's about 11 times faster on data of these dimensions, and data.table scales pretty well too.

Edit: if any of the max ids is okay, then:

DT <- data.table(value=unlist(DF, use.names=FALSE), 

            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))

setkey(DT, colid, value)

t1 <- DT[J(unique(colid)), rowid, mult="last"]

edited Jul 19 '13 at 1:13

answered Jul 19 '13 at 0:30

Arun

93k11217316

Here I've generated data of your dimensions (26746 * 18).

Data

set.seed(45)

DF <- data.frame(matrix(sample(10, 26746*18, TRUE), ncol=18))

`data.table` answer:

require(data.table)

DT <- data.table(value=unlist(DF, use.names=FALSE), 

            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))

setkey(DT, colid, value)

t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]

Benchmarking:

# data.table solution

system.time({

DT <- data.table(value=unlist(DF, use.names=FALSE), 

            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))

setkey(DT, colid, value)

t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]

})

#   user  system elapsed 

#  0.174   0.029   0.227 



# apply solution from @thelatemail

system.time(t2 <- colnames(DF)[apply(DF,1,which.max)])

#   user  system elapsed 

#  2.322   0.036   2.602 



identical(t1, t2)

# [1] TRUE

It's about 11 times faster on data of these dimensions, and data.table scales pretty well too.

Edit: if any of the max ids is okay, then:

DT <- data.table(value=unlist(DF, use.names=FALSE), 

            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))

setkey(DT, colid, value)

t1 <- DT[J(unique(colid)), rowid, mult="last"]

edited Jul 19 '13 at 1:13

answered Jul 19 '13 at 0:30

Arun

93k11217316

edited Jul 19 '13 at 1:13

answered Jul 19 '13 at 0:30

Arun

93k11217316

answered Jul 19 '13 at 0:30

Arun

93k11217316

answered Jul 19 '13 at 0:30

Arun

93k11217316

I actually dont' care if it is the first or last maximum. I'm going for simplicity first, but I'm sure a data.table solution will come handy in the future, thanks!

– dmvianna
Jul 19 '13 at 0:48

add a comment |

I actually dont' care if it is the first or last maximum. I'm going for simplicity first, but I'm sure a data.table solution will come handy in the future, thanks!

– dmvianna
Jul 19 '13 at 0:48

I actually dont' care if it is the first or last maximum. I'm going for simplicity first, but I'm sure a data.table solution will come handy in the future, thanks!

– dmvianna
Jul 19 '13 at 0:48

add a comment |

Based on the above suggestions, the following data.table solution worked very fast for me:

library(data.table)



set.seed(45)

DT <- data.table(matrix(sample(10, 10^7, TRUE), ncol=10))



system.time(

  DT[, col_max := colnames(.SD)[max.col(.SD, ties.method = "first")]]

)

#>    user  system elapsed 

#>    0.15    0.06    0.21

DT

#>          V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 col_max

#>       1:  7  4  1  2  3  7  6  6  6   1      V1

#>       2:  4  6  9 10  6  2  7  7  1   3      V4

#>       3:  3  4  9  8  9  9  8  8  6   7      V3

#>       4:  4  8  8  9  7  5  9  2  7   1      V4

#>       5:  4  3  9 10  2  7  9  6  6   9      V4

#>      ---                                       

#>  999996:  4  6 10  5  4  7  3  8  2   8      V3

#>  999997:  8  7  6  6  3 10  2  3 10   1      V6

#>  999998:  2  3  2  7  4  7  5  2  7   3      V4

#>  999999:  8 10  3  2  3  4  5  1  1   4      V2

#> 1000000: 10  4  2  6  6  2  8  4  7   4      V1

And also comes with the advantage that can always specify what columns .SD should consider by mentioning them in .SDcols:

DT[, MAX2 := colnames(.SD)[max.col(.SD, ties.method="first")], .SDcols = c("V9", "V10")]

In case we need the column name of the smallest value, as suggested by @lwshang, one just needs to use -.SD:

DT[, col_min := colnames(.SD)[max.col(-.SD, ties.method = "first")]]

edited Dec 31 '18 at 11:58

answered Oct 7 '16 at 18:04

Valentin

1,9651230

I had a similar requirement but want to get the column name having the minimum value for each row.....we don't seem to have min.col in R.....would you know what would be the equivalent solution?

– user1412
Feb 26 '17 at 18:38

Hi @user1412. Thanks for your interesting question. I don't have any idea right now other than using the which.min in something that would look like: DT[, MIN := colnames(.SD)[apply(.SD,1,which.min)]] or DT[, MIN2 := colnames(.SD)[which.min(.SD)], by = 1:nrow(DT)] on the dummy data above. This doesn't consider ties and returns only the first minimum. Maybe consider asking a separate question. I would be curious as well what other answers you would get.

– Valentin
Feb 28 '17 at 10:09

1

A trick to get minimum column is sending the negative of the data.frame into max.col, like: colnames(.SD)[max.col(-.SD, ties.method="first")].

– lwshang
Mar 9 '18 at 14:08

add a comment |

Based on the above suggestions, the following data.table solution worked very fast for me:

library(data.table)



set.seed(45)

DT <- data.table(matrix(sample(10, 10^7, TRUE), ncol=10))



system.time(

  DT[, col_max := colnames(.SD)[max.col(.SD, ties.method = "first")]]

)

#>    user  system elapsed 

#>    0.15    0.06    0.21

DT

#>          V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 col_max

#>       1:  7  4  1  2  3  7  6  6  6   1      V1

#>       2:  4  6  9 10  6  2  7  7  1   3      V4

#>       3:  3  4  9  8  9  9  8  8  6   7      V3

#>       4:  4  8  8  9  7  5  9  2  7   1      V4

#>       5:  4  3  9 10  2  7  9  6  6   9      V4

#>      ---                                       

#>  999996:  4  6 10  5  4  7  3  8  2   8      V3

#>  999997:  8  7  6  6  3 10  2  3 10   1      V6

#>  999998:  2  3  2  7  4  7  5  2  7   3      V4

#>  999999:  8 10  3  2  3  4  5  1  1   4      V2

#> 1000000: 10  4  2  6  6  2  8  4  7   4      V1

And also comes with the advantage that can always specify what columns .SD should consider by mentioning them in .SDcols:

DT[, MAX2 := colnames(.SD)[max.col(.SD, ties.method="first")], .SDcols = c("V9", "V10")]

In case we need the column name of the smallest value, as suggested by @lwshang, one just needs to use -.SD:

DT[, col_min := colnames(.SD)[max.col(-.SD, ties.method = "first")]]

edited Dec 31 '18 at 11:58

answered Oct 7 '16 at 18:04

Valentin

1,9651230

I had a similar requirement but want to get the column name having the minimum value for each row.....we don't seem to have min.col in R.....would you know what would be the equivalent solution?

– user1412
Feb 26 '17 at 18:38

Hi @user1412. Thanks for your interesting question. I don't have any idea right now other than using the which.min in something that would look like: DT[, MIN := colnames(.SD)[apply(.SD,1,which.min)]] or DT[, MIN2 := colnames(.SD)[which.min(.SD)], by = 1:nrow(DT)] on the dummy data above. This doesn't consider ties and returns only the first minimum. Maybe consider asking a separate question. I would be curious as well what other answers you would get.

– Valentin
Feb 28 '17 at 10:09

1

A trick to get minimum column is sending the negative of the data.frame into max.col, like: colnames(.SD)[max.col(-.SD, ties.method="first")].

– lwshang
Mar 9 '18 at 14:08

add a comment |

Based on the above suggestions, the following data.table solution worked very fast for me:

library(data.table)



set.seed(45)

DT <- data.table(matrix(sample(10, 10^7, TRUE), ncol=10))



system.time(

  DT[, col_max := colnames(.SD)[max.col(.SD, ties.method = "first")]]

)

#>    user  system elapsed 

#>    0.15    0.06    0.21

DT

#>          V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 col_max

#>       1:  7  4  1  2  3  7  6  6  6   1      V1

#>       2:  4  6  9 10  6  2  7  7  1   3      V4

#>       3:  3  4  9  8  9  9  8  8  6   7      V3

#>       4:  4  8  8  9  7  5  9  2  7   1      V4

#>       5:  4  3  9 10  2  7  9  6  6   9      V4

#>      ---                                       

#>  999996:  4  6 10  5  4  7  3  8  2   8      V3

#>  999997:  8  7  6  6  3 10  2  3 10   1      V6

#>  999998:  2  3  2  7  4  7  5  2  7   3      V4

#>  999999:  8 10  3  2  3  4  5  1  1   4      V2

#> 1000000: 10  4  2  6  6  2  8  4  7   4      V1

And also comes with the advantage that can always specify what columns .SD should consider by mentioning them in .SDcols:

DT[, MAX2 := colnames(.SD)[max.col(.SD, ties.method="first")], .SDcols = c("V9", "V10")]

In case we need the column name of the smallest value, as suggested by @lwshang, one just needs to use -.SD:

DT[, col_min := colnames(.SD)[max.col(-.SD, ties.method = "first")]]

edited Dec 31 '18 at 11:58

answered Oct 7 '16 at 18:04

Valentin

1,9651230

Based on the above suggestions, the following data.table solution worked very fast for me:

library(data.table)



set.seed(45)

DT <- data.table(matrix(sample(10, 10^7, TRUE), ncol=10))



system.time(

  DT[, col_max := colnames(.SD)[max.col(.SD, ties.method = "first")]]

)

#>    user  system elapsed 

#>    0.15    0.06    0.21

DT

#>          V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 col_max

#>       1:  7  4  1  2  3  7  6  6  6   1      V1

#>       2:  4  6  9 10  6  2  7  7  1   3      V4

#>       3:  3  4  9  8  9  9  8  8  6   7      V3

#>       4:  4  8  8  9  7  5  9  2  7   1      V4

#>       5:  4  3  9 10  2  7  9  6  6   9      V4

#>      ---                                       

#>  999996:  4  6 10  5  4  7  3  8  2   8      V3

#>  999997:  8  7  6  6  3 10  2  3 10   1      V6

#>  999998:  2  3  2  7  4  7  5  2  7   3      V4

#>  999999:  8 10  3  2  3  4  5  1  1   4      V2

#> 1000000: 10  4  2  6  6  2  8  4  7   4      V1

And also comes with the advantage that can always specify what columns .SD should consider by mentioning them in .SDcols:

DT[, MAX2 := colnames(.SD)[max.col(.SD, ties.method="first")], .SDcols = c("V9", "V10")]

In case we need the column name of the smallest value, as suggested by @lwshang, one just needs to use -.SD:

DT[, col_min := colnames(.SD)[max.col(-.SD, ties.method = "first")]]

edited Dec 31 '18 at 11:58

answered Oct 7 '16 at 18:04

Valentin

1,9651230

edited Dec 31 '18 at 11:58

answered Oct 7 '16 at 18:04

Valentin

1,9651230

answered Oct 7 '16 at 18:04

Valentin

1,9651230

answered Oct 7 '16 at 18:04

Valentin

1,9651230

I had a similar requirement but want to get the column name having the minimum value for each row.....we don't seem to have min.col in R.....would you know what would be the equivalent solution?

– user1412
Feb 26 '17 at 18:38

Hi @user1412. Thanks for your interesting question. I don't have any idea right now other than using the which.min in something that would look like: DT[, MIN := colnames(.SD)[apply(.SD,1,which.min)]] or DT[, MIN2 := colnames(.SD)[which.min(.SD)], by = 1:nrow(DT)] on the dummy data above. This doesn't consider ties and returns only the first minimum. Maybe consider asking a separate question. I would be curious as well what other answers you would get.

– Valentin
Feb 28 '17 at 10:09

1

A trick to get minimum column is sending the negative of the data.frame into max.col, like: colnames(.SD)[max.col(-.SD, ties.method="first")].

– lwshang
Mar 9 '18 at 14:08

add a comment |

I had a similar requirement but want to get the column name having the minimum value for each row.....we don't seem to have min.col in R.....would you know what would be the equivalent solution?

– user1412
Feb 26 '17 at 18:38

Hi @user1412. Thanks for your interesting question. I don't have any idea right now other than using the which.min in something that would look like: DT[, MIN := colnames(.SD)[apply(.SD,1,which.min)]] or DT[, MIN2 := colnames(.SD)[which.min(.SD)], by = 1:nrow(DT)] on the dummy data above. This doesn't consider ties and returns only the first minimum. Maybe consider asking a separate question. I would be curious as well what other answers you would get.

– Valentin
Feb 28 '17 at 10:09

1

A trick to get minimum column is sending the negative of the data.frame into max.col, like: colnames(.SD)[max.col(-.SD, ties.method="first")].

– lwshang
Mar 9 '18 at 14:08

I had a similar requirement but want to get the column name having the minimum value for each row.....we don't seem to have min.col in R.....would you know what would be the equivalent solution?

– user1412
Feb 26 '17 at 18:38

Hi @user1412. Thanks for your interesting question. I don't have any idea right now other than using the which.min in something that would look like: DT[, MIN := colnames(.SD)[apply(.SD,1,which.min)]] or DT[, MIN2 := colnames(.SD)[which.min(.SD)], by = 1:nrow(DT)] on the dummy data above. This doesn't consider ties and returns only the first minimum. Maybe consider asking a separate question. I would be curious as well what other answers you would get.

– Valentin
Feb 28 '17 at 10:09

A trick to get minimum column is sending the negative of the data.frame into max.col, like: colnames(.SD)[max.col(-.SD, ties.method="first")].

– lwshang
Mar 9 '18 at 14:08

add a comment |

library(tidyverse)



# sample data frame with a tie

df <- data_frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,5))



# If you aren't worried about ties:  

df %>% 

  rownames_to_column('id') %>%  # creates an ID number

  gather(dept, cnt, V1:V3) %>% 

  group_by(id) %>% 

  slice(which.max(cnt)) 



# A tibble: 3 x 3

# Groups:   id [3]

  id    dept    cnt

  <chr> <chr> <dbl>

1 1     V3       9.

2 2     V1       8.

3 3     V2       5.





# If you're worried about keeping ties:

df %>% 

  rownames_to_column('id') %>%

  gather(dept, cnt, V1:V3) %>% 

  group_by(id) %>% 

  filter(cnt == max(cnt)) %>% # top_n(cnt, n = 1) also works

  arrange(id)



# A tibble: 4 x 3

# Groups:   id [3]

  id    dept    cnt

  <chr> <chr> <dbl>

1 1     V3       9.

2 2     V1       8.

3 3     V2       5.

4 3     V3       5.





# If you're worried about ties, but only want a certain department, you could use rank() and choose 'first' or 'last'

df %>% 

  rownames_to_column('id') %>%

  gather(dept, cnt, V1:V3) %>% 

  group_by(id) %>% 

  mutate(dept_rank  = rank(-cnt, ties.method = "first")) %>% # or 'last'

  filter(dept_rank == 1) %>% 

  select(-dept_rank) 



# A tibble: 3 x 3

# Groups:   id [3]

  id    dept    cnt

  <chr> <chr> <dbl>

1 2     V1       8.

2 3     V2       5.

3 1     V3       9.



# if you wanted to keep the original wide data frame

df %>% 

  rownames_to_column('id') %>%

  left_join(

    df %>% 

      rownames_to_column('id') %>%

      gather(max_dept, max_cnt, V1:V3) %>% 

      group_by(id) %>% 

      slice(which.max(max_cnt)), 

    by = 'id'

  )



# A tibble: 3 x 6

  id       V1    V2    V3 max_dept max_cnt

  <chr> <dbl> <dbl> <dbl> <chr>      <dbl>

1 1        2.    7.    9. V3            9.

2 2        8.    3.    6. V1            8.

3 3        1.    5.    5. V2            5.

edited Mar 31 '18 at 12:21

answered Mar 31 '18 at 11:23

sbha

2,27222124

add a comment |

library(tidyverse)



# sample data frame with a tie

df <- data_frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,5))



# If you aren't worried about ties:  

df %>% 

  rownames_to_column('id') %>%  # creates an ID number

  gather(dept, cnt, V1:V3) %>% 

  group_by(id) %>% 

  slice(which.max(cnt)) 



# A tibble: 3 x 3

# Groups:   id [3]

  id    dept    cnt

  <chr> <chr> <dbl>

1 1     V3       9.

2 2     V1       8.

3 3     V2       5.





# If you're worried about keeping ties:

df %>% 

  rownames_to_column('id') %>%

  gather(dept, cnt, V1:V3) %>% 

  group_by(id) %>% 

  filter(cnt == max(cnt)) %>% # top_n(cnt, n = 1) also works

  arrange(id)



# A tibble: 4 x 3

# Groups:   id [3]

  id    dept    cnt

  <chr> <chr> <dbl>

1 1     V3       9.

2 2     V1       8.

3 3     V2       5.

4 3     V3       5.





# If you're worried about ties, but only want a certain department, you could use rank() and choose 'first' or 'last'

df %>% 

  rownames_to_column('id') %>%

  gather(dept, cnt, V1:V3) %>% 

  group_by(id) %>% 

  mutate(dept_rank  = rank(-cnt, ties.method = "first")) %>% # or 'last'

  filter(dept_rank == 1) %>% 

  select(-dept_rank) 



# A tibble: 3 x 3

# Groups:   id [3]

  id    dept    cnt

  <chr> <chr> <dbl>

1 2     V1       8.

2 3     V2       5.

3 1     V3       9.



# if you wanted to keep the original wide data frame

df %>% 

  rownames_to_column('id') %>%

  left_join(

    df %>% 

      rownames_to_column('id') %>%

      gather(max_dept, max_cnt, V1:V3) %>% 

      group_by(id) %>% 

      slice(which.max(max_cnt)), 

    by = 'id'

  )



# A tibble: 3 x 6

  id       V1    V2    V3 max_dept max_cnt

  <chr> <dbl> <dbl> <dbl> <chr>      <dbl>

1 1        2.    7.    9. V3            9.

2 2        8.    3.    6. V1            8.

3 3        1.    5.    5. V2            5.

edited Mar 31 '18 at 12:21

answered Mar 31 '18 at 11:23

sbha

2,27222124

add a comment |

library(tidyverse)



# sample data frame with a tie

df <- data_frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,5))



# If you aren't worried about ties:  

df %>% 

  rownames_to_column('id') %>%  # creates an ID number

  gather(dept, cnt, V1:V3) %>% 

  group_by(id) %>% 

  slice(which.max(cnt)) 



# A tibble: 3 x 3

# Groups:   id [3]

  id    dept    cnt

  <chr> <chr> <dbl>

1 1     V3       9.

2 2     V1       8.

3 3     V2       5.





# If you're worried about keeping ties:

df %>% 

  rownames_to_column('id') %>%

  gather(dept, cnt, V1:V3) %>% 

  group_by(id) %>% 

  filter(cnt == max(cnt)) %>% # top_n(cnt, n = 1) also works

  arrange(id)



# A tibble: 4 x 3

# Groups:   id [3]

  id    dept    cnt

  <chr> <chr> <dbl>

1 1     V3       9.

2 2     V1       8.

3 3     V2       5.

4 3     V3       5.





# If you're worried about ties, but only want a certain department, you could use rank() and choose 'first' or 'last'

df %>% 

  rownames_to_column('id') %>%

  gather(dept, cnt, V1:V3) %>% 

  group_by(id) %>% 

  mutate(dept_rank  = rank(-cnt, ties.method = "first")) %>% # or 'last'

  filter(dept_rank == 1) %>% 

  select(-dept_rank) 



# A tibble: 3 x 3

# Groups:   id [3]

  id    dept    cnt

  <chr> <chr> <dbl>

1 2     V1       8.

2 3     V2       5.

3 1     V3       9.



# if you wanted to keep the original wide data frame

df %>% 

  rownames_to_column('id') %>%

  left_join(

    df %>% 

      rownames_to_column('id') %>%

      gather(max_dept, max_cnt, V1:V3) %>% 

      group_by(id) %>% 

      slice(which.max(max_cnt)), 

    by = 'id'

  )



# A tibble: 3 x 6

  id       V1    V2    V3 max_dept max_cnt

  <chr> <dbl> <dbl> <dbl> <chr>      <dbl>

1 1        2.    7.    9. V3            9.

2 2        8.    3.    6. V1            8.

3 3        1.    5.    5. V2            5.

edited Mar 31 '18 at 12:21

answered Mar 31 '18 at 11:23

sbha

2,27222124

library(tidyverse)



# sample data frame with a tie

df <- data_frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,5))



# If you aren't worried about ties:  

df %>% 

  rownames_to_column('id') %>%  # creates an ID number

  gather(dept, cnt, V1:V3) %>% 

  group_by(id) %>% 

  slice(which.max(cnt)) 



# A tibble: 3 x 3

# Groups:   id [3]

  id    dept    cnt

  <chr> <chr> <dbl>

1 1     V3       9.

2 2     V1       8.

3 3     V2       5.





# If you're worried about keeping ties:

df %>% 

  rownames_to_column('id') %>%

  gather(dept, cnt, V1:V3) %>% 

  group_by(id) %>% 

  filter(cnt == max(cnt)) %>% # top_n(cnt, n = 1) also works

  arrange(id)



# A tibble: 4 x 3

# Groups:   id [3]

  id    dept    cnt

  <chr> <chr> <dbl>

1 1     V3       9.

2 2     V1       8.

3 3     V2       5.

4 3     V3       5.





# If you're worried about ties, but only want a certain department, you could use rank() and choose 'first' or 'last'

df %>% 

  rownames_to_column('id') %>%

  gather(dept, cnt, V1:V3) %>% 

  group_by(id) %>% 

  mutate(dept_rank  = rank(-cnt, ties.method = "first")) %>% # or 'last'

  filter(dept_rank == 1) %>% 

  select(-dept_rank) 



# A tibble: 3 x 3

# Groups:   id [3]

  id    dept    cnt

  <chr> <chr> <dbl>

1 2     V1       8.

2 3     V2       5.

3 1     V3       9.



# if you wanted to keep the original wide data frame

df %>% 

  rownames_to_column('id') %>%

  left_join(

    df %>% 

      rownames_to_column('id') %>%

      gather(max_dept, max_cnt, V1:V3) %>% 

      group_by(id) %>% 

      slice(which.max(max_cnt)), 

    by = 'id'

  )



# A tibble: 3 x 6

  id       V1    V2    V3 max_dept max_cnt

  <chr> <dbl> <dbl> <dbl> <chr>      <dbl>

1 1        2.    7.    9. V3            9.

2 2        8.    3.    6. V1            8.

3 3        1.    5.    5. V2            5.

edited Mar 31 '18 at 12:21

answered Mar 31 '18 at 11:23

sbha

2,27222124

edited Mar 31 '18 at 12:21

answered Mar 31 '18 at 11:23

sbha

2,27222124

answered Mar 31 '18 at 11:23

sbha

2,27222124

answered Mar 31 '18 at 11:23

sbha

2,27222124

add a comment |

A simple for loop can also be handy:

> df<-data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))

> df

  V1 V2 V3

1  2  7  9

2  8  3  6

3  1  5  4

> df2<-data.frame()

> for (i in 1:nrow(df)){

+   df2[i,1]<-colnames(df[which.max(df[i,])])

+ }

> df2

  V1

1 V3

2 V1

3 V2

edited Jul 3 '18 at 5:11

answered Jul 3 '18 at 5:05

rar

6331515

add a comment |

A simple for loop can also be handy:

> df<-data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))

> df

  V1 V2 V3

1  2  7  9

2  8  3  6

3  1  5  4

> df2<-data.frame()

> for (i in 1:nrow(df)){

+   df2[i,1]<-colnames(df[which.max(df[i,])])

+ }

> df2

  V1

1 V3

2 V1

3 V2

edited Jul 3 '18 at 5:11

answered Jul 3 '18 at 5:05

rar

6331515

add a comment |

A simple for loop can also be handy:

> df<-data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))

> df

  V1 V2 V3

1  2  7  9

2  8  3  6

3  1  5  4

> df2<-data.frame()

> for (i in 1:nrow(df)){

+   df2[i,1]<-colnames(df[which.max(df[i,])])

+ }

> df2

  V1

1 V3

2 V1

3 V2

edited Jul 3 '18 at 5:11

answered Jul 3 '18 at 5:05

rar

6331515

A simple for loop can also be handy:

> df<-data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))

> df

  V1 V2 V3

1  2  7  9

2  8  3  6

3  1  5  4

> df2<-data.frame()

> for (i in 1:nrow(df)){

+   df2[i,1]<-colnames(df[which.max(df[i,])])

+ }

> df2

  V1

1 V3

2 V1

3 V2

edited Jul 3 '18 at 5:11

answered Jul 3 '18 at 5:05

rar

6331515

edited Jul 3 '18 at 5:11

answered Jul 3 '18 at 5:05

rar

6331515

answered Jul 3 '18 at 5:05

rar

6331515

answered Jul 3 '18 at 5:05

rar

6331515

add a comment |

A `dplyr` solution:

Idea:

add rowids as a column

reshape to long format

filter for max in each group

Code:

DF = data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))

DF %>% 

  rownames_to_column() %>%

  gather(column, value, -rowname) %>%

  group_by(rowname) %>% 

  filter(rank(-value) == 1)

Result:

# A tibble: 3 x 3

# Groups:   rowname [3]

  rowname column value

  <chr>   <chr>  <dbl>

1 2       V1         8

2 3       V2         5

3 1       V3         9

This approach can be easily extended to get the top n columns.
Example for n=2:

DF %>% 

  rownames_to_column() %>%

  gather(column, value, -rowname) %>%

  group_by(rowname) %>% 

  mutate(rk = rank(-value)) %>%

  filter(rk <= 2) %>% 

  arrange(rowname, rk)

Result:

# A tibble: 6 x 4

# Groups:   rowname [3]

  rowname column value    rk

  <chr>   <chr>  <dbl> <dbl>

1 1       V3         9     1

2 1       V2         7     2

3 2       V1         8     1

4 2       V3         6     2

5 3       V2         5     1

6 3       V3         4     2

answered Nov 6 '18 at 14:31

Gregor Sturm

753618

Could you comment on the difference between this approach and sbha's answer above? They look about the same to me.

– Gregor
Nov 6 '18 at 14:46

add a comment |

A `dplyr` solution:

Idea:

add rowids as a column

reshape to long format

filter for max in each group

Code:

DF = data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))

DF %>% 

  rownames_to_column() %>%

  gather(column, value, -rowname) %>%

  group_by(rowname) %>% 

  filter(rank(-value) == 1)

Result:

# A tibble: 3 x 3

# Groups:   rowname [3]

  rowname column value

  <chr>   <chr>  <dbl>

1 2       V1         8

2 3       V2         5

3 1       V3         9

This approach can be easily extended to get the top n columns.
Example for n=2:

DF %>% 

  rownames_to_column() %>%

  gather(column, value, -rowname) %>%

  group_by(rowname) %>% 

  mutate(rk = rank(-value)) %>%

  filter(rk <= 2) %>% 

  arrange(rowname, rk)

Result:

# A tibble: 6 x 4

# Groups:   rowname [3]

  rowname column value    rk

  <chr>   <chr>  <dbl> <dbl>

1 1       V3         9     1

2 1       V2         7     2

3 2       V1         8     1

4 2       V3         6     2

5 3       V2         5     1

6 3       V3         4     2

answered Nov 6 '18 at 14:31

Gregor Sturm

753618

Could you comment on the difference between this approach and sbha's answer above? They look about the same to me.

– Gregor
Nov 6 '18 at 14:46

add a comment |

A `dplyr` solution:

Idea:

add rowids as a column

reshape to long format

filter for max in each group

Code:

DF = data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))

DF %>% 

  rownames_to_column() %>%

  gather(column, value, -rowname) %>%

  group_by(rowname) %>% 

  filter(rank(-value) == 1)

Result:

# A tibble: 3 x 3

# Groups:   rowname [3]

  rowname column value

  <chr>   <chr>  <dbl>

1 2       V1         8

2 3       V2         5

3 1       V3         9

This approach can be easily extended to get the top n columns.
Example for n=2:

DF %>% 

  rownames_to_column() %>%

  gather(column, value, -rowname) %>%

  group_by(rowname) %>% 

  mutate(rk = rank(-value)) %>%

  filter(rk <= 2) %>% 

  arrange(rowname, rk)

Result:

# A tibble: 6 x 4

# Groups:   rowname [3]

  rowname column value    rk

  <chr>   <chr>  <dbl> <dbl>

1 1       V3         9     1

2 1       V2         7     2

3 2       V1         8     1

4 2       V3         6     2

5 3       V2         5     1

6 3       V3         4     2

answered Nov 6 '18 at 14:31

Gregor Sturm

753618

A `dplyr` solution:

Idea:

add rowids as a column

reshape to long format

filter for max in each group

Code:

DF = data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))

DF %>% 

  rownames_to_column() %>%

  gather(column, value, -rowname) %>%

  group_by(rowname) %>% 

  filter(rank(-value) == 1)

Result:

# A tibble: 3 x 3

# Groups:   rowname [3]

  rowname column value

  <chr>   <chr>  <dbl>

1 2       V1         8

2 3       V2         5

3 1       V3         9

This approach can be easily extended to get the top n columns.
Example for n=2:

DF %>% 

  rownames_to_column() %>%

  gather(column, value, -rowname) %>%

  group_by(rowname) %>% 

  mutate(rk = rank(-value)) %>%

  filter(rk <= 2) %>% 

  arrange(rowname, rk)

Result:

# A tibble: 6 x 4

# Groups:   rowname [3]

  rowname column value    rk

  <chr>   <chr>  <dbl> <dbl>

1 1       V3         9     1

2 1       V2         7     2

3 2       V1         8     1

4 2       V3         6     2

5 3       V2         5     1

6 3       V3         4     2

answered Nov 6 '18 at 14:31

Gregor Sturm

753618

answered Nov 6 '18 at 14:31

Gregor Sturm

753618

answered Nov 6 '18 at 14:31

Gregor Sturm

753618

answered Nov 6 '18 at 14:31

Gregor Sturm

753618

Could you comment on the difference between this approach and sbha's answer above? They look about the same to me.

– Gregor
Nov 6 '18 at 14:46

add a comment |

Could you comment on the difference between this approach and sbha's answer above? They look about the same to me.

– Gregor
Nov 6 '18 at 14:46

Could you comment on the difference between this approach and sbha's answer above? They look about the same to me.

– Gregor
Nov 6 '18 at 14:46

add a comment |

protected by zx8754 Jul 14 '17 at 13:16

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Bdtjtk