Replace all NA values for variable with one row equal to 0

This is slightly difficult to phrase; as far as I could see, none of the similar questions answered my problem.

I have a data.frame such as:

df1 <- data.frame(id = rep(c("a", "b", "c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3))

df1

   id val
1   a  NA
2   a  NA
3   a  NA
4   a  NA
5   b   1
6   b   2
7   b   2
8   b   3
9   c  NA
10  c   2
11  c  NA
12  c   3

and I want to get rid of all the NA values (easy enough using e.g. filter()), but make sure that if this removes every row for one id value (in this case it removes every instance of "a"), one extra row is inserted for that id with val = 0 (e.g. a = 0)

so that:

Obviously it's easy enough to do this in a roundabout way, but I was wondering if there's a tidy/elegant way to do it. I thought tidyr::complete() might help, but I'm not entirely sure how to apply it to a case like this.

I don't care about the order of the rows

Cheers!

Edit: updated with clearer desired output. This might make answers submitted before the edit a bit less clear.
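On the tidyr::complete() idea mentioned in the question: a minimal sketch, assuming id is converted to a factor before filtering, so that complete() can still see the level "a" after all of its rows are gone (complete() expands a factor over all of its levels, not just the ones present). The name res is only for illustration.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(id = rep(c("a", "b", "c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3))

res <- df1 %>%
  mutate(id = factor(id)) %>%          # keep "a" as a level even once its rows are dropped
  filter(!is.na(val)) %>%              # remove every NA row
  complete(id, fill = list(val = 0))   # re-add ids that vanished, with val = 0

res
```

With a plain character id the filter step would lose "a" for good, which is why the factor conversion comes first.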

edited Jan 6 at 14:56

asked Jan 3 at 12:43

Robert Hickman

192110

So you want to add rows with 0 only if all the values for particular id is 0?

– Ronak Shah
Jan 3 at 12:47

only if they're all NA for a particular id

– Robert Hickman
Jan 3 at 12:49

@RobertHickman There seems to be some confusion about your desired output. Could you update your question with the expected output based on this df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3)) ? Thanks to @VivekKalyanarangan for the data.

– markus
Jan 3 at 13:31

9 Answers

Another idea using dplyr,

library(dplyr)

df1 %>% 
  group_by(id) %>% 
  mutate(val = ifelse(row_number() == 1 & all(is.na(val)), 0, val)) %>% 
  na.omit()

which gives,

# A tibble: 5 x 2
# Groups:   id [2]
  id      val
  <fct> <dbl>
1 a         0
2 b         1
3 b         2
4 b         2
5 b         3

answered Jan 3 at 13:34

Sotos

31k51741

(+1) Seems like the most robust answer here. Would be marginally more concise using replace(val, all(is.na(val)) * 1, 0) instead of the ifelse(...).

– Mikko Marttila
Jan 3 at 14:30

@MikkoMarttila Good suggestion. I usually try and avoid ifelse in general

– Sotos
Jan 3 at 14:35
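
The replace() variant suggested in the comment above could look like the sketch below. replace(val, 0, 0) changes nothing, while replace(val, 1, 0) sets the first element to 0, which is why multiplying the logical by 1 works as an index. It is run here on the three-group df1 from the question; the 5-row outputs printed in the answers appear to come from an earlier two-group version of the data (an assumption based on the "Groups: id [2]" header).

```r
library(dplyr)

df1 <- data.frame(id = rep(c("a", "b", "c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3))

res <- df1 %>%
  group_by(id) %>%
  # index is 1 for an all-NA group (first value becomes 0), 0 otherwise (no change)
  mutate(val = replace(val, all(is.na(val)) * 1, 0)) %>%
  filter(!is.na(val)) %>%   # drop the remaining NA rows
  ungroup()

res
```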

We may do

df1 %>% group_by(id) %>% do(if(all(is.na(.$val))) replace(.[1, ], 2, 0) else na.omit(.))

# A tibble: 5 x 2
# Groups:   id [2]
#   id      val
#   <fct> <dbl>
# 1 a         0
# 2 b         1
# 3 b         2
# 4 b         2
# 5 b         3

After grouping by id, if everything in val is NA, then we leave only the first row with the second element replaced by 0, otherwise the same data is returned after applying na.omit.

In a more readable format that would be

df1 %>% group_by(id) %>% 
  do(if(all(is.na(.$val))) data.frame(id = .$id[1], val = 0) else na.omit(.))

(Here I presume that you indeed want to get rid of all NA values; otherwise there is no need for na.omit.)
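
do() has since been superseded in dplyr. A rough equivalent of the readable version above, assuming dplyr >= 0.8.1 (where group_modify() is available), might look like this; inside group_modify() the data .x excludes the grouping column, so only val has to be returned.

```r
library(dplyr)

df1 <- data.frame(id = rep(c("a", "b", "c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3))

res <- df1 %>%
  group_by(id) %>%
  # all-NA group: a single row with val = 0; otherwise the group without its NA rows
  group_modify(~ if (all(is.na(.x$val))) tibble(val = 0) else na.omit(.x)) %>%
  ungroup()

res
```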

edited Jan 3 at 13:31

answered Jan 3 at 13:11

Julius Vainora

38.3k76786

@markus, right, I had assumed that that's the goal. Thanks!

– Julius Vainora
Jan 3 at 13:19

It looks like op wants to retain the first row and replace the val column of that row with 0 where all val is NA for a group. Check my ans pls. Agree with @markus, it does seem tricky

– Vivek Kalyanarangan
Jan 3 at 13:27

@VivekKalyanarangan, that's what I initially thought, but "and I want to get rid of all the NA values" suggests otherwise.

– Julius Vainora
Jan 3 at 13:29

df1[is.na(df1)] <- 0
df1[!(duplicated(df1$id) & df1$val == 0), ]

  id val
1  a   0
5  b   1
6  b   2
7  b   2
8  b   3

answered Jan 3 at 13:02

Adamm

961718

Would this work for ids that contain NAs and non-NAs? Try with df1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))

– markus
Jan 3 at 13:05

I think this is the best so far (I'll leave it open for another hour or so to see) would maybe change to df %>% replace(is.na(.), 0) %>% .[!(duplicated(.$id) & .$val == 0), ]

– Robert Hickman
Jan 3 at 13:26
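
To illustrate the concern raised in the first comment: after blanket-replacing NA with 0, a group that mixes NA and real values keeps a spurious 0 whenever its first row was NA (and genuine zeros in later rows would be dropped). A minimal base R sketch that inserts a 0 only for ids whose values are all NA; the names all_na, keep and res are illustrative.

```r
df2 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))

# TRUE for every row whose id has no non-NA val at all
all_na <- as.logical(ave(is.na(df2$val), df2$id, FUN = all))

# keep the non-NA rows, plus the first row of each all-NA id
keep <- !is.na(df2$val) | (all_na & !duplicated(df2$id))

res <- df2[keep, ]
res$val[is.na(res$val)] <- 0   # only the placeholder rows are still NA here
res
```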

A base R option is to find the groups whose values are all NA, set their val to 0, and keep only unique rows so that there is one row per such group. We then rbind this dataframe with the rows of the groups which are !all_NA.

all_NA <- with(df1, ave(is.na(val), id, FUN = all))
rbind(unique(transform(df1[all_NA, ], val = 0)), df1[!all_NA, ])

#  id val
#1  a   0
#5  b   1
#6  b   2
#7  b   2
#8  b   3

The dplyr option looks ugly, but one way is to build two dataframes: one with the groups whose values are all NA and another with the groups whose values are all non-NA. For the all-NA groups we add a row with the group's id and val as 0, and bind this to the other dataframe.

library(dplyr)

bind_rows(df1 %>%
            group_by(id) %>%
            filter(all(!is.na(val))), 
          df1 %>%
             group_by(id) %>%
             filter(all(is.na(val))) %>%
             ungroup() %>%
             summarise(id = unique(id), 
                       val = 0)) %>%
  arrange(id)

#   id      val
#  <fct> <dbl>
#1  a         0
#2  b         1
#3  b         2
#4  b         2
#5  b         3
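
Note that filter(all(!is.na(val))) drops a whole group as soon as it contains a single NA, so with the three-group df1 from the question, group c would disappear entirely. A sketch of a variant that keeps the non-NA rows of every group and adds a 0 row only for ids that vanish (kept and lost are illustrative names):

```r
library(dplyr)

df1 <- data.frame(id = rep(c("a", "b", "c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3))

kept <- df1 %>% filter(!is.na(val))    # all non-NA rows, whatever the group

lost <- df1 %>%
  distinct(id) %>%                     # every id in the original data
  anti_join(kept, by = "id") %>%       # ids with no surviving rows
  mutate(val = 0)

res <- bind_rows(kept, lost) %>% arrange(id)
res
```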

edited Jan 3 at 13:17

answered Jan 3 at 12:56

Ronak Shah

43.5k104266

Changed the df to make example more exhaustive -

df1 <- data.frame(id = rep(c("a", "b", "c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3))

library(dplyr)

df1 %>%
  group_by(id) %>%
  mutate(case = sum(is.na(val)) == n(), row_num = row_number()) %>%
  mutate(val = ifelse(is.na(val) & case, 0, val)) %>%
  filter(!(case & row_num != 1)) %>%
  select(id, val)

Output

  id      val
  <fct> <dbl>
1 a         0
2 b         1
3 b         2
4 b         2
5 b         3
6 c        NA
7 c         2
8 c        NA
9 c         3

answered Jan 3 at 13:25

Vivek Kalyanarangan

5,1141829

Another base approach, one that doesn't maintain the order of the rows and takes advantage of factors remembering lost values:

df1 <- na.omit(df1)

df1 <- rbind(
  df1, 
  data.frame(
    id  = levels(df1$id)[!levels(df1$id) %in% df1$id], 
    val = 0)
  )

I do personally prefer the dplyr approach given by Sotos, as I don't like rbind-ing data.frames back together, so it's a matter of taste, but this isn't unbearably complicated to my eye. It's easy enough to adapt to a character id column by storing unique(df1$id) in a variable before the NA rows are dropped.
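
A sketch of that character-column adaptation, which is also what is needed on R >= 4.0, where data.frame() no longer converts strings to factors by default. The names all_ids, kept, lost and res are illustrative.

```r
df1 <- data.frame(id = rep(c("a", "b", "c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3),
                  stringsAsFactors = FALSE)   # id is a plain character column

all_ids <- unique(df1$id)             # remember every id before rows are dropped
kept    <- na.omit(df1)               # remove all NA rows
lost    <- setdiff(all_ids, kept$id)  # ids that disappeared entirely

res <- rbind(kept,
             data.frame(id = lost, val = rep(0, length(lost)),
                        stringsAsFactors = FALSE))
res
```

As in the answer, the added rows end up at the bottom, so row order is not preserved.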

answered Jan 3 at 16:09

CriminallyVulgar

16615

Here is an option too:

df1 %>% 
  mutate_if(is.factor, as.character) %>% 
  mutate_all(funs(replace(., is.na(.), 0))) %>% 
  slice(4:nrow(.))

This gives:

Alternative:

df1 %>% 
  mutate_if(is.factor, as.character) %>% 
  mutate_all(funs(replace(., is.na(.), 0))) %>% 
  unique()

UPDATE based on other requirements:
Some users suggested testing on this dataframe. Of course, this answer assumes you can inspect the data by hand (the id "a" and the row offset 4 are hard-coded), so it may be less useful when that isn't practical, but here goes:

df1 <- data.frame(id = rep(c("a", "b", "c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3))

df1 %>% 
  mutate_if(is.factor, as.character) %>% 
  mutate(val = ifelse(id == "a", 0, val)) %>% 
  slice(4:nrow(.))

This yields:

This yields:

edited Jan 4 at 4:49

answered Jan 3 at 13:18

NelsonGon

3,4273833

where did 4 come from?

– Sotos
Jan 3 at 13:22

The solution produces four 0s. We're only interested in having 1?

– NelsonGon
Jan 3 at 13:23

What if one group has 4 and another 3?

– Sotos
Jan 3 at 13:26

Sorry I only answered based on the question. Maybe then we could twist things up, not sure though!

– NelsonGon
Jan 3 at 13:27

Consider this example - df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3)) I think here OP wants to remove NA values for A group only, not the rest

– Vivek Kalyanarangan
Jan 3 at 13:28


Here is a base R solution.

res <- lapply(split(df1, df1$id), function(DF){
  if(anyNA(DF$val)) {
    i <- is.na(DF$val)
    DF$val[i] <- 0
    DF <- rbind(DF[i & !duplicated(DF[i, ]), ], DF[!i, ])
  }
  DF
})
res <- do.call(rbind, res)
row.names(res) <- NULL
res

#  id val
#1  a   0
#2  b   1
#3  b   2
#4  b   2
#5  b   3

Edit.

A dplyr solution could be the following.
It was tested with the original dataset posted by the OP, with the dataset in Vivek Kalyanarangan's answer and with the dataset in markus' comment, renamed df2 and df3, respectively.
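
df2 and df3 are not defined in the answer itself. Going by the description above, they can presumably be reconstructed from the thread as follows (df2 from Vivek Kalyanarangan's answer, df3 from markus' comment). Since the question was later updated with the same data, df1 as shown in the question now coincides with df2.

```r
# dataset from Vivek Kalyanarangan's answer (three groups, "c" mixes NA and values)
df2 <- data.frame(id = rep(c("a", "b", "c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3))

# dataset from markus' comment (group "a" has one NA and one value)
df3 <- data.frame(id = rep(c("a", "b"), each = 2),
                  val = c(NA, 1, 2, 3))
```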

library(dplyr)

na2zero <- function(DF){
  DF %>%
    group_by(id) %>%
    mutate(val = ifelse(is.na(val), 0, val),
           crit = val == 0 & duplicated(val)) %>%
    filter(!crit) %>%
    select(-crit)
}

na2zero(df1)
na2zero(df2)
na2zero(df3)

edited Jan 3 at 14:22

answered Jan 3 at 13:03

Rui Barradas

18.2k51833

Rui, try with df1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3)). Unfortunately your solution doesn't return a data frame with only three rows.

– markus
Jan 3 at 13:21

@markus No, it doesn't. The NA is replaced by a 0 and the other value of val is not NA so both must be in the output. At least that's how I'm understanding the OP's problem.

– Rui Barradas
Jan 3 at 14:05

Fair enough. People are reading the question differently.

– markus
Jan 3 at 14:51

One may try this :

df1 = data.frame(id = rep(c("a", "b", "c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3))
df1

#   id val
#1   a  NA
#2   a  NA
#3   a  NA
#4   a  NA
#5   b   1
#6   b   2
#7   b   2
#8   b   3
#9   c  NA
#10  c   2
#11  c  NA
#12  c   3

The task is to remove all rows for an id if and only if val is NA for every row of that id, and to add a new row with this id and val = 0.

In this example, id = a.

Note: val for c also contains NAs, but not all of the val for c are NA, so for c we only need to remove the rows where val = NA.

So let's create another column, say val2, where 0 means the id's values are all NA and 1 otherwise.

library(dplyr)

df1 = df1 %>% 
     group_by(id) %>%
     mutate(val2 = if_else(condition = all(is.na(val)), true = 0, false = 1))
df1

# A tibble: 12 x 3
# Groups:   id [3]
#   id      val  val2
#   <fct> <dbl> <dbl>
#1 a        NA     0
#2 a        NA     0
#3 a        NA     0
#4 a        NA     0
#5 b         1     1
#6 b         2     1
#7 b         2     1
#8 b         3     1
#9 c        NA     1
#10 c        2     1
#11 c       NA     1
#12 c        3     1

Get the list of ids for which val is NA in every row.

all_na = unique(df1$id[df1$val2 == 0])

Then remove the rows with val = NA from the dataframe df1.

df1 = na.omit(df1)
df1

# A tibble: 6 x 3
# Groups:   id [2]
# id      val  val2
# <fct> <dbl> <dbl>
# 1 b         1     1
# 2 b         2     1
# 3 b         2     1
# 4 b         3     1
# 5 c         2     1
# 6 c         3     1

And create a new dataframe with the ids in all_na and val = 0.

all_na_df = data.frame(id = all_na, val = 0) 
all_na_df

# id val
# 1  a   0

Then combine these two dataframes.

df1 = bind_rows(all_na_df, df1[,c('id', 'val')])
df1

#    id val
# 1  a   0
# 2  b   1
# 3  b   2
# 4  b   2
# 5  b   3
# 6  c   2
# 7  c   3
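
The steps above can also be condensed into a single pipeline. A sketch, assuming dplyr >= 1.1.0 (for reframe(), which lets each group return any number of rows); it starts from the original df1 rather than the modified one, and res is an illustrative name.

```r
library(dplyr)

df1 <- data.frame(id = rep(c("a", "b", "c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3))

res <- df1 %>%
  group_by(id) %>%
  # all-NA id: a single 0; otherwise the id's non-NA values
  reframe(val = if (all(is.na(val))) 0 else val[!is.na(val)])

res
```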

Hope this helps and Edits are most welcomed :-)

edited Jan 8 at 12:37

answered Jan 8 at 10:48

heisenbug47

7911






I changed the df to make the example more exhaustive:

df1 <- data.frame(id = rep(c("a", "b","c"), each = 4),

                  val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))

library(dplyr)

df1 %>%

  group_by(id) %>%

  mutate(case=sum(is.na(val))==n(), row_num=row_number() ) %>%

  mutate(val=ifelse(is.na(val)&case,0,val)) %>%

  filter( !(case&row_num!=1) ) %>%

  select(id, val)

Output

  id      val

  <fct> <dbl>

1 a         0

2 b         1

3 b         2

4 b         2

5 b         3

6 c        NA

7 c         2

8 c        NA

9 c         3
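
The output above still contains the NA rows of group c. If those should be dropped as well, one possible variation keeps a row when it is non-NA or when it is the first row of an all-NA group. This is only a sketch; the column name all_na and the result name res are illustrative assumptions:

```r
library(dplyr)

df1 <- data.frame(id = rep(c("a", "b", "c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3))

res <- df1 %>%
  group_by(id) %>%
  # flag groups whose values are all NA
  mutate(all_na = all(is.na(val))) %>%
  # keep non-NA rows, plus the first row of each all-NA group
  filter(!is.na(val) | (all_na & row_number() == 1)) %>%
  # any NA that survived belongs to an all-NA group: turn it into 0
  mutate(val = ifelse(is.na(val), 0, val)) %>%
  ungroup() %>%
  select(id, val)

res
```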

answered Jan 3 at 13:25

Vivek Kalyanarangan


Another base R approach. It does not preserve the row order, and it relies on id being a factor, because a factor keeps its levels even after every row with that level has been removed:

df1 <- na.omit(df1)



df1 <- rbind(

  df1, 

  data.frame(

    id  = levels(df1$id)[!levels(df1$id) %in% df1$id], 

    val = 0)

  )
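
Since R 4.0.0, data.frame() no longer converts strings to factors by default, so levels(df1$id) is NULL unless id is created as a factor explicitly. A sketch of the same idea that works on a character id by remembering the ids before the NA rows are dropped (all_ids, lost_ids and res are illustrative names):

```r
df1 <- data.frame(id = rep(c("a", "b", "c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3),
                  stringsAsFactors = FALSE)

all_ids  <- unique(df1$id)             # remember every id before dropping rows
res      <- na.omit(df1)               # drop all rows that contain an NA
lost_ids <- setdiff(all_ids, res$id)   # ids that vanished completely

# only append rows when at least one id was lost
if (length(lost_ids) > 0) {
  res <- rbind(res, data.frame(id = lost_ids, val = 0,
                               stringsAsFactors = FALSE))
}
row.names(res) <- NULL
res
```

The length check avoids building a data frame from a zero-length id vector when no id was lost.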

answered Jan 3 at 16:09

CriminallyVulgar


Here is an option too:

df1 %>% 

  mutate_if(is.factor,as.character) %>% 

  mutate_all(funs(replace(., is.na(.), 0))) %>% 

  slice(4:nrow(.))

This gives:

Alternative:

df1 %>% 

  mutate_if(is.factor,as.character) %>% 

  mutate_all(funs(replace(., is.na(.), 0))) %>% 

  unique()
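
One caveat with unique(): it removes every duplicated row, not only the repeated zeros, so a legitimately repeated observation such as the second b 2 row is lost as well. A small base R sketch of the difference, using a two-group example (ids a and b only); was_na, filled and res are illustrative names:

```r
df1 <- data.frame(id = rep(c("a", "b"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3))

was_na <- is.na(df1$val)                         # remember which rows were NA
filled <- transform(df1, val = replace(val, was_na, 0))

nrow(unique(filled))   # the duplicated "b 2" row is dropped as well

# drop only the repeated rows that came from NA values
res <- filled[!(was_na & duplicated(filled)), ]
row.names(res) <- NULL
res
```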

df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))





df1 %>% 

  mutate_if(is.factor,as.character) %>% 

  mutate(val=ifelse(id=="a",0,val)) %>% 

  slice(4:nrow(.))

This yields:

edited Jan 4 at 4:49

answered Jan 3 at 13:18

NelsonGon


where did 4 come from?

– Sotos
Jan 3 at 13:22

The solution produces four 0s. We're only interested in having 1?

– NelsonGon
Jan 3 at 13:23

What if one group has 4 and another 3?

– Sotos
Jan 3 at 13:26

Sorry I only answered based on the question. Maybe then we could twist things up, not sure though!

– NelsonGon
Jan 3 at 13:27

Consider this example - df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3)) I think here OP wants to remove NA values for A group only, not the rest

– Vivek Kalyanarangan
Jan 3 at 13:28


Here is a base R solution.

res <- lapply(split(df1, df1$id), function(DF){

  if(anyNA(DF$val)) {

    i <- is.na(DF$val)

    DF$val[i] <- 0

    DF <- rbind(DF[which(i)[1], ], DF[!i, ])  # keep only the first zero-filled row

  }

  DF

})

res <- do.call(rbind, res)

row.names(res) <- NULL

res

#  id val

#1  a   0

#2  b   1

#3  b   2

#4  b   2

#5  b   3

Edit.

library(dplyr)



na2zero <- function(DF){

  DF %>%

    group_by(id) %>%

    mutate(val = ifelse(is.na(val), 0, val),

           crit = val == 0 & duplicated(val)) %>%

    filter(!crit) %>%

    select(-crit)

}



na2zero(df1)

na2zero(df2)

na2zero(df3)
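
Note that crit = val == 0 & duplicated(val) cannot tell a zero that replaced an NA from a zero that was already in the data, so repeated genuine zeros would be dropped too. A hedged base R sketch of the same reading of the problem (one zero per id that had any NA, all non-NA rows kept) that records which rows were NA before replacing them; the function name na2zero_keep and the example data ex are hypothetical:

```r
na2zero_keep <- function(DF) {
  was_na <- is.na(DF$val)
  # first NA row within each id: the (id, was_na) pair has not been seen before
  first_na <- was_na & !duplicated(data.frame(id = DF$id, was_na = was_na))
  DF$val[was_na] <- 0
  # keep every non-NA row plus one zero-filled row per id
  res <- DF[!was_na | first_na, ]
  row.names(res) <- NULL
  res
}

# hypothetical data: id "a" has two NAs and two genuine zeros
ex <- data.frame(id = c("a", "a", "a", "a", "b", "b"),
                 val = c(NA, NA, 0, 0, NA, 1))
out <- na2zero_keep(ex)
out
```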

edited Jan 3 at 14:22

answered Jan 3 at 13:03

Rui Barradas


Rui, try with df1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3)). Unfortunately your solution doesn't return a data frame with only three rows.

– markus
Jan 3 at 13:21

@markus No, it doesn't. The NA is replaced by a 0 and the other value of val is not NA so both must be in the output. At least that's how I'm understanding the OP's problem.

– Rui Barradas
Jan 3 at 14:05

Fair enough. People are reading the question differently.

– markus
Jan 3 at 14:51


One may try this:

df1 = data.frame(id = rep(c("a", "b","c"), each = 4),

                  val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))

df1

#   id val

#1   a  NA

#2   a  NA

#3   a  NA

#4   a  NA

#5   b   1

#6   b   2

#7   b   2

#8   b   3

#9   c  NA

#10  c   2

#11  c  NA

#12  c   3

The task is to remove all rows for an id if and only if every val for that id is NA, and to add one new row with that id and val = 0.

In this example, that id is a.

Note: c also has NA values, but not all of its val entries are NA, so for c we only remove the rows where val is NA.

So let's create another column, say val2, which is 0 when all values for the id are NA and 1 otherwise.

library(dplyr)



df1 = df1 %>% 

     group_by(id) %>%

     mutate(val2 = if_else(condition = all(is.na(val)),true = 0, false =  1))

df1



# A tibble: 12 x 3

# Groups:   id [3]

#   id      val  val2

#   <fct> <dbl> <dbl>

#1 a        NA     0

#2 a        NA     0

#3 a        NA     0

#4 a        NA     0

#5 b         1     1

#6 b         2     1

#7 b         2     1

#8 b         3     1

#9 c        NA     1

#10 c        2     1

#11 c       NA     1

#12 c        3     1

Get the ids for which every val is NA.

all_na = unique(df1$id[df1$val2 == 0])

Then remove the rows with val = NA from the data frame df1.

df1 = na.omit(df1)

df1

# A tibble: 6 x 3

# Groups:   id [2]

# id      val  val2

# <fct> <dbl> <dbl>

# 1 b         1     1

# 2 b         2     1

# 3 b         2     1

# 4 b         3     1

# 5 c         2     1

# 6 c         3     1

Next, create a new data frame with the ids in all_na and val = 0.

all_na_df = data.frame(id = all_na, val = 0) 

all_na_df

# id val

# 1  a   0

Then combine these two data frames.

df1 = bind_rows(all_na_df, df1[,c('id', 'val')])

df1



#    id val

# 1  a   0

# 2  b   1

# 3  b   2

# 4  b   2

# 5  b   3

# 6  c   2

# 7  c   3
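
The question mentions tidyr::complete(); the same result as the steps above can probably be reached with it by dropping the NA rows first and then completing the missing ids with val = 0. A sketch, assuming tidyr is installed; all_ids, kept and res are illustrative names:

```r
library(tidyr)

df1 <- data.frame(id = rep(c("a", "b", "c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3),
                  stringsAsFactors = FALSE)

all_ids <- unique(df1$id)         # every id present before filtering
kept <- df1[!is.na(df1$val), ]    # drop the NA rows; id "a" disappears here

# re-add the ids that vanished and fill their val with 0
res <- complete(kept, id = all_ids, fill = list(val = 0))
res
```

Compared with the step-by-step version, this avoids the helper column val2, at the cost of an extra package dependency.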

Hope this helps. Edits are most welcome :-)

edited Jan 8 at 12:37

answered Jan 8 at 10:48

heisenbug47
