Replace all NA values for variable with one row equal to 0
Slightly difficult to phrase, as far as I saw none of the similar questions answered my problem.
I have a data.frame such as:
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1
id val
1 a NA
2 a NA
3 a NA
4 a NA
5 b 1
6 b 2
7 b 2
8 b 3
9 c NA
10 c 2
11 c NA
12 c 3
and I want to get rid of all the NA values (easy enough using e.g. filter() ) but make sure that if this removes all of one id value (in this case it removes every instance of "a") that one extra row is inserted of (e.g.) a = 0
so that:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c 2
7 c 3
obviously easy enough to do this in a roundabout way but I was wondering if there's a tidy/elegant way to do this. I thought tidyr::complete() might help but not entirely sure how to apply it to a case like this
I don't care about the order of the rows
Cheers!
edit: updated with clearer desired output. might make desired answers submitted before that a bit less clear
r na complete
add a comment |
Slightly difficult to phrase, as far as I saw none of the similar questions answered my problem.
I have a data.frame such as:
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1
id val
1 a NA
2 a NA
3 a NA
4 a NA
5 b 1
6 b 2
7 b 2
8 b 3
9 c NA
10 c 2
11 c NA
12 c 3
and I want to get rid of all the NA values (easy enough using e.g. filter() ) but make sure that if this removes all of one id value (in this case it removes every instance of "a") that one extra row is inserted of (e.g.) a = 0
so that:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c 2
7 c 3
obviously easy enough to do this in a roundabout way but I was wondering if there's a tidy/elegant way to do this. I thought tidyr::complete() might help but not entirely sure how to apply it to a case like this
I don't care about the order of the rows
Cheers!
edit: updated with clearer desired output. might make desired answers submitted before that a bit less clear
r na complete
So you want to add rows with 0 only if all the values for particularid
is 0?
– Ronak Shah
Jan 3 at 12:47
only if they're all NA for a particular id
– Robert Hickman
Jan 3 at 12:49
1
@RobertHickman There seems to be some confusion about your desired output. Could you update your question with the expected output based on thisdf1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
? Thanks to @VivekKalyanarangan for the data.
– markus
Jan 3 at 13:31
add a comment |
Slightly difficult to phrase, as far as I saw none of the similar questions answered my problem.
I have a data.frame such as:
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1
id val
1 a NA
2 a NA
3 a NA
4 a NA
5 b 1
6 b 2
7 b 2
8 b 3
9 c NA
10 c 2
11 c NA
12 c 3
and I want to get rid of all the NA values (easy enough using e.g. filter() ) but make sure that if this removes all of one id value (in this case it removes every instance of "a") that one extra row is inserted of (e.g.) a = 0
so that:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c 2
7 c 3
obviously easy enough to do this in a roundabout way but I was wondering if there's a tidy/elegant way to do this. I thought tidyr::complete() might help but not entirely sure how to apply it to a case like this
I don't care about the order of the rows
Cheers!
edit: updated with clearer desired output. might make desired answers submitted before that a bit less clear
r na complete
Slightly difficult to phrase, as far as I saw none of the similar questions answered my problem.
I have a data.frame such as:
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1
id val
1 a NA
2 a NA
3 a NA
4 a NA
5 b 1
6 b 2
7 b 2
8 b 3
9 c NA
10 c 2
11 c NA
12 c 3
and I want to get rid of all the NA values (easy enough using e.g. filter() ) but make sure that if this removes all of one id value (in this case it removes every instance of "a") that one extra row is inserted of (e.g.) a = 0
so that:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c 2
7 c 3
obviously easy enough to do this in a roundabout way but I was wondering if there's a tidy/elegant way to do this. I thought tidyr::complete() might help but not entirely sure how to apply it to a case like this
I don't care about the order of the rows
Cheers!
edit: updated with clearer desired output. might make desired answers submitted before that a bit less clear
r na complete
r na complete
edited Jan 6 at 14:56
Robert Hickman
asked Jan 3 at 12:43
Robert HickmanRobert Hickman
192110
192110
So you want to add rows with 0 only if all the values for particularid
is 0?
– Ronak Shah
Jan 3 at 12:47
only if they're all NA for a particular id
– Robert Hickman
Jan 3 at 12:49
1
@RobertHickman There seems to be some confusion about your desired output. Could you update your question with the expected output based on thisdf1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
? Thanks to @VivekKalyanarangan for the data.
– markus
Jan 3 at 13:31
add a comment |
So you want to add rows with 0 only if all the values for particularid
is 0?
– Ronak Shah
Jan 3 at 12:47
only if they're all NA for a particular id
– Robert Hickman
Jan 3 at 12:49
1
@RobertHickman There seems to be some confusion about your desired output. Could you update your question with the expected output based on thisdf1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
? Thanks to @VivekKalyanarangan for the data.
– markus
Jan 3 at 13:31
So you want to add rows with 0 only if all the values for particular
id
is 0?– Ronak Shah
Jan 3 at 12:47
So you want to add rows with 0 only if all the values for particular
id
is 0?– Ronak Shah
Jan 3 at 12:47
only if they're all NA for a particular id
– Robert Hickman
Jan 3 at 12:49
only if they're all NA for a particular id
– Robert Hickman
Jan 3 at 12:49
1
1
@RobertHickman There seems to be some confusion about your desired output. Could you update your question with the expected output based on this
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
? Thanks to @VivekKalyanarangan for the data.– markus
Jan 3 at 13:31
@RobertHickman There seems to be some confusion about your desired output. Could you update your question with the expected output based on this
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
? Thanks to @VivekKalyanarangan for the data.– markus
Jan 3 at 13:31
add a comment |
9 Answers
9
active
oldest
votes
Another idea using dplyr
,
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(val = ifelse(row_number() == 1 & all(is.na(val)), 0, val)) %>%
na.omit()
which gives,
# A tibble: 5 x 2
# Groups: id [2]
id val
<fct> <dbl>
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
2
(+1) Seems like the most robust answer here. Would be marginally more concise usingreplace(val, all(is.na(val)) * 1, 0)
instead of theifelse(...)
.
– Mikko Marttila
Jan 3 at 14:30
@MikkoMarttila Good suggestion. I usually try and avoidifelse
in general
– Sotos
Jan 3 at 14:35
add a comment |
We may do
df1 %>% group_by(id) %>% do(if(all(is.na(.$val))) replace(.[1, ], 2, 0) else na.omit(.))
# A tibble: 5 x 2
# Groups: id [2]
# id val
# <fct> <dbl>
# 1 a 0
# 2 b 1
# 3 b 2
# 4 b 2
# 5 b 3
After grouping by id
, if everything in val
is NA
, then we leave only the first row with the second element replaced by 0, otherwise the same data is returned after applying na.omit
.
In a more readable format that would be
df1 %>% group_by(id) %>%
do(if(all(is.na(.$val))) data.frame(id = .$id[1], val = 0) else na.omit(.))
(Here I presume that you indeed want to get rid of all NA
values; otherwise there is no need for na.omit
.)
1
@markus, right, I had assumed that that's the goal. Thanks!
– Julius Vainora
Jan 3 at 13:19
It looks like op wants to retain the first row and replace theval
column of that row with0
where allval
isNA
for a group. Check my ans pls. Agree with @markus, it does seem tricky
– Vivek Kalyanarangan
Jan 3 at 13:27
1
@VivekKalyanarangan, that's what I initially thought, but "and I want to get rid of all the NA values" suggests otherwise.
– Julius Vainora
Jan 3 at 13:29
add a comment |
df1[is.na(df1)] <- 0
df1[!(duplicated(df1$id) & df1$val == 0), ]
id val
1 a 0
5 b 1
6 b 2
7 b 2
8 b 3
5
Would this work forid
s that containNA
s and non-NA
s? Try withdf1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))
– markus
Jan 3 at 13:05
I think this is the best so far (I'll leave it open for another hour or so to see) would maybe change to df %>% replace(is.na(.), 0) %>% .[!(duplicated(.$id) & .$val == 0), ]
– Robert Hickman
Jan 3 at 13:26
add a comment |
Base R option is to find groups with all NA
s and transform
them by changing their val
to 0 and select only unique
rows so that there is only one row per group. We rbind
this dataframe with the groups which are !all_NA
.
all_NA <- with(df1, ave(is.na(val), id, FUN = all))
rbind(unique(transform(df1[all_NA, ], val = 0)), df1[!all_NA, ])
# id val
#1 a 0
#5 b 1
#6 b 2
#7 b 2
#8 b 3
dplyr
option looks ugly but one way is to make two groups of dataframes one with groups of all NA
values and other with groups of all non-NA values. For groups with all NA
values we add row with it's id
and val
as 0 and bind this to the other group.
library(dplyr)
bind_rows(df1 %>%
group_by(id) %>%
filter(all(!is.na(val))),
df1 %>%
group_by(id) %>%
filter(all(is.na(val))) %>%
ungroup() %>%
summarise(id = unique(id),
val = 0)) %>%
arrange(id)
# id val
# <fct> <dbl>
#1 a 0
#2 b 1
#3 b 2
#4 b 2
#5 b 3
add a comment |
Changed the df
to make example more exhaustive -
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(case=sum(is.na(val))==n(), row_num=row_number() ) %>%
mutate(val=ifelse(is.na(val)&case,0,val)) %>%
filter( !(case&row_num!=1) ) %>%
select(id, val)
Output
id val
<fct> <dbl>
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c NA
7 c 2
8 c NA
9 c 3
add a comment |
Another base approach, one that doesn't maintain the order of the rows and takes advantage of factors remembering lost values:
df1 <- na.omit(df1)
df1 <- rbind(
df1,
data.frame(
id = levels(df1$id)[!levels(df1$id) %in% df1$id],
val = 0)
)
I do personally prefer the dplyr approach given by Sotos, as I don't like rbind
-ing data.frames back together so it's a matter of taste, but this isn't unbearably complicated by my eye. It's easy enough to adapt to a character id
column with a unique(df1$id)
variable.
add a comment |
Here is an option too:
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate_all(funs(replace(.,is.na(.),0))) %>%
slice(4:nrow(.))
This gives:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
Alternative:
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate_all(funs(replace(.,is.na(.),0))) %>%
unique()
UPDATE based on other requirements:
Some users suggested to test on this dataframe. Of course this answer assumes you'll look at everything by hand. Might be less useful if you have to look at everything by "hand" but here goes:
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate(val=ifelse(id=="a",0,val)) %>%
slice(4:nrow(.))
This yields:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c NA
7 c 2
8 c NA
9 c 3
3
where did 4 come from?
– Sotos
Jan 3 at 13:22
The solution produces four 0s. We're only interested in having 1?
– NelsonGon
Jan 3 at 13:23
What if one group has 4 and another 3?
– Sotos
Jan 3 at 13:26
Sorry I only answered based on the question. Maybe then we could twist things up, not sure though!
– NelsonGon
Jan 3 at 13:27
Consider this example -df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
I think here OP wants to removeNA
values forA
group only, not the rest
– Vivek Kalyanarangan
Jan 3 at 13:28
|
show 2 more comments
Here is a base R solution.
res <- lapply(split(df1, df1$id), function(DF){
if(anyNA(DF$val)) {
i <- is.na(DF$val)
DF$val[i] <- 0
DF <- rbind(DF[i & !duplicated(DF[i, ]), ], DF[!i, ])
}
DF
})
res <- do.call(rbind, res)
row.names(res) <- NULL
res
# id val
#1 a 0
#2 b 1
#3 b 2
#4 b 2
#5 b 3
Edit.
A dplyr
solution could be the following.
It was tested with the original dataset posted by the OP, with the dataset in Vivek Kalyanarangan's answer and with the dataset in markus' comment, renamed df2
and df3
, respectively.
library(dplyr)
na2zero <- function(DF){
DF %>%
group_by(id) %>%
mutate(val = ifelse(is.na(val), 0, val),
crit = val == 0 & duplicated(val)) %>%
filter(!crit) %>%
select(-crit)
}
na2zero(df1)
na2zero(df2)
na2zero(df3)
Rui, try withdf1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))
. Unfortunately your solution doesn't return a data frame with only three rows.
– markus
Jan 3 at 13:21
@markus No, it doesn't. TheNA
is replaced by a0
and the other value ofval
is notNA
so both must be in the output. At least that's how I'm understanding the OP's problem.
– Rui Barradas
Jan 3 at 14:05
Fair enough. People are reading the question differently.
– markus
Jan 3 at 14:51
add a comment |
One may try this :
df1 = data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1
# id val
#1 a NA
#2 a NA
#3 a NA
#4 a NA
#5 b 1
#6 b 2
#7 b 2
#8 b 3
#9 c NA
#10 c 2
#11 c NA
#12 c 3
Task is to remove all rows corresponding to any id
IFF val
for the corresponding id
is all NA
s and add new row with this id
and val = 0
.
In this example, id = a
.
Note : val
for c
also has NA
s but all the val
corresponding to c
are not NA
therefore we need to remove the corresponding row for c
where val = NA
.
So lets create another column say, val2
which indicates 0
means its all NA
s and 1 otherwise.
library(dplyr)
df1 = df1 %>%
group_by(id) %>%
mutate(val2 = if_else(condition = all(is.na(val)),true = 0, false = 1))
df1
# A tibble: 12 x 3
# Groups: id [3]
# id val val2
# <fct> <dbl> <dbl>
#1 a NA 0
#2 a NA 0
#3 a NA 0
#4 a NA 0
#5 b 1 1
#6 b 2 1
#7 b 2 1
#8 b 3 1
#9 c NA 1
#10 c 2 1
#11 c NA 1
#12 c 3 1
Get the list of id
s with corresponding val = NA
for all.
all_na = unique(df1$id[df1$val2 == 0])
Then remove theid
s from the dataframe df1
with val = NA
.
df1 = na.omit(df1)
df1
# A tibble: 6 x 3
# Groups: id [2]
# id val val2
# <fct> <dbl> <dbl>
# 1 b 1 1
# 2 b 2 1
# 3 b 2 1
# 4 b 3 1
# 5 c 2 1
# 6 c 3 1
And create a new dataframe with id
s in all_na
and val = 0
all_na_df = data.frame(id = all_na, val = 0)
all_na_df
# id val
# 1 a 0
then combine these two dataframes.
df1 = bind_rows(all_na_df, df1[,c('id', 'val')])
df1
# id val
# 1 a 0
# 2 b 1
# 3 b 2
# 4 b 2
# 5 b 3
# 6 c 2
# 7 c 3
Hope this helps and Edits are most welcomed :-)
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54022536%2freplace-all-na-values-for-variable-with-one-row-equal-to-0%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
9 Answers
9
active
oldest
votes
9 Answers
9
active
oldest
votes
active
oldest
votes
active
oldest
votes
Another idea using dplyr
,
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(val = ifelse(row_number() == 1 & all(is.na(val)), 0, val)) %>%
na.omit()
which gives,
# A tibble: 5 x 2
# Groups: id [2]
id val
<fct> <dbl>
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
2
(+1) Seems like the most robust answer here. Would be marginally more concise usingreplace(val, all(is.na(val)) * 1, 0)
instead of theifelse(...)
.
– Mikko Marttila
Jan 3 at 14:30
@MikkoMarttila Good suggestion. I usually try and avoidifelse
in general
– Sotos
Jan 3 at 14:35
add a comment |
Another idea using dplyr
,
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(val = ifelse(row_number() == 1 & all(is.na(val)), 0, val)) %>%
na.omit()
which gives,
# A tibble: 5 x 2
# Groups: id [2]
id val
<fct> <dbl>
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
2
(+1) Seems like the most robust answer here. Would be marginally more concise usingreplace(val, all(is.na(val)) * 1, 0)
instead of theifelse(...)
.
– Mikko Marttila
Jan 3 at 14:30
@MikkoMarttila Good suggestion. I usually try and avoidifelse
in general
– Sotos
Jan 3 at 14:35
add a comment |
Another idea using dplyr
,
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(val = ifelse(row_number() == 1 & all(is.na(val)), 0, val)) %>%
na.omit()
which gives,
# A tibble: 5 x 2
# Groups: id [2]
id val
<fct> <dbl>
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
Another idea using dplyr
,
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(val = ifelse(row_number() == 1 & all(is.na(val)), 0, val)) %>%
na.omit()
which gives,
# A tibble: 5 x 2
# Groups: id [2]
id val
<fct> <dbl>
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
answered Jan 3 at 13:34
SotosSotos
31k51741
31k51741
2
(+1) Seems like the most robust answer here. Would be marginally more concise usingreplace(val, all(is.na(val)) * 1, 0)
instead of theifelse(...)
.
– Mikko Marttila
Jan 3 at 14:30
@MikkoMarttila Good suggestion. I usually try and avoidifelse
in general
– Sotos
Jan 3 at 14:35
add a comment |
2
(+1) Seems like the most robust answer here. Would be marginally more concise usingreplace(val, all(is.na(val)) * 1, 0)
instead of theifelse(...)
.
– Mikko Marttila
Jan 3 at 14:30
@MikkoMarttila Good suggestion. I usually try and avoidifelse
in general
– Sotos
Jan 3 at 14:35
2
2
(+1) Seems like the most robust answer here. Would be marginally more concise using
replace(val, all(is.na(val)) * 1, 0)
instead of the ifelse(...)
.– Mikko Marttila
Jan 3 at 14:30
(+1) Seems like the most robust answer here. Would be marginally more concise using
replace(val, all(is.na(val)) * 1, 0)
instead of the ifelse(...)
.– Mikko Marttila
Jan 3 at 14:30
@MikkoMarttila Good suggestion. I usually try and avoid
ifelse
in general– Sotos
Jan 3 at 14:35
@MikkoMarttila Good suggestion. I usually try and avoid
ifelse
in general– Sotos
Jan 3 at 14:35
add a comment |
We may do
df1 %>% group_by(id) %>% do(if(all(is.na(.$val))) replace(.[1, ], 2, 0) else na.omit(.))
# A tibble: 5 x 2
# Groups: id [2]
# id val
# <fct> <dbl>
# 1 a 0
# 2 b 1
# 3 b 2
# 4 b 2
# 5 b 3
After grouping by id
, if everything in val
is NA
, then we leave only the first row with the second element replaced by 0, otherwise the same data is returned after applying na.omit
.
In a more readable format that would be
df1 %>% group_by(id) %>%
do(if(all(is.na(.$val))) data.frame(id = .$id[1], val = 0) else na.omit(.))
(Here I presume that you indeed want to get rid of all NA
values; otherwise there is no need for na.omit
.)
1
@markus, right, I had assumed that that's the goal. Thanks!
– Julius Vainora
Jan 3 at 13:19
It looks like op wants to retain the first row and replace theval
column of that row with0
where allval
isNA
for a group. Check my ans pls. Agree with @markus, it does seem tricky
– Vivek Kalyanarangan
Jan 3 at 13:27
1
@VivekKalyanarangan, that's what I initially thought, but "and I want to get rid of all the NA values" suggests otherwise.
– Julius Vainora
Jan 3 at 13:29
add a comment |
We may do
df1 %>% group_by(id) %>% do(if(all(is.na(.$val))) replace(.[1, ], 2, 0) else na.omit(.))
# A tibble: 5 x 2
# Groups: id [2]
# id val
# <fct> <dbl>
# 1 a 0
# 2 b 1
# 3 b 2
# 4 b 2
# 5 b 3
After grouping by id
, if everything in val
is NA
, then we leave only the first row with the second element replaced by 0, otherwise the same data is returned after applying na.omit
.
In a more readable format that would be
df1 %>% group_by(id) %>%
do(if(all(is.na(.$val))) data.frame(id = .$id[1], val = 0) else na.omit(.))
(Here I presume that you indeed want to get rid of all NA
values; otherwise there is no need for na.omit
.)
1
@markus, right, I had assumed that that's the goal. Thanks!
– Julius Vainora
Jan 3 at 13:19
It looks like op wants to retain the first row and replace theval
column of that row with0
where allval
isNA
for a group. Check my ans pls. Agree with @markus, it does seem tricky
– Vivek Kalyanarangan
Jan 3 at 13:27
1
@VivekKalyanarangan, that's what I initially thought, but "and I want to get rid of all the NA values" suggests otherwise.
– Julius Vainora
Jan 3 at 13:29
add a comment |
We may do
df1 %>% group_by(id) %>% do(if(all(is.na(.$val))) replace(.[1, ], 2, 0) else na.omit(.))
# A tibble: 5 x 2
# Groups: id [2]
# id val
# <fct> <dbl>
# 1 a 0
# 2 b 1
# 3 b 2
# 4 b 2
# 5 b 3
After grouping by id
, if everything in val
is NA
, then we leave only the first row with the second element replaced by 0, otherwise the same data is returned after applying na.omit
.
In a more readable format that would be
df1 %>% group_by(id) %>%
do(if(all(is.na(.$val))) data.frame(id = .$id[1], val = 0) else na.omit(.))
(Here I presume that you indeed want to get rid of all NA
values; otherwise there is no need for na.omit
.)
We may do
df1 %>% group_by(id) %>% do(if(all(is.na(.$val))) replace(.[1, ], 2, 0) else na.omit(.))
# A tibble: 5 x 2
# Groups: id [2]
# id val
# <fct> <dbl>
# 1 a 0
# 2 b 1
# 3 b 2
# 4 b 2
# 5 b 3
After grouping by id
, if everything in val
is NA
, then we leave only the first row with the second element replaced by 0, otherwise the same data is returned after applying na.omit
.
In a more readable format that would be
df1 %>% group_by(id) %>%
do(if(all(is.na(.$val))) data.frame(id = .$id[1], val = 0) else na.omit(.))
(Here I presume that you indeed want to get rid of all NA
values; otherwise there is no need for na.omit
.)
edited Jan 3 at 13:31
answered Jan 3 at 13:11
Julius VainoraJulius Vainora
38.3k76786
38.3k76786
1
@markus, right, I had assumed that that's the goal. Thanks!
– Julius Vainora
Jan 3 at 13:19
It looks like op wants to retain the first row and replace theval
column of that row with0
where allval
isNA
for a group. Check my ans pls. Agree with @markus, it does seem tricky
– Vivek Kalyanarangan
Jan 3 at 13:27
1
@VivekKalyanarangan, that's what I initially thought, but "and I want to get rid of all the NA values" suggests otherwise.
– Julius Vainora
Jan 3 at 13:29
add a comment |
1
@markus, right, I had assumed that that's the goal. Thanks!
– Julius Vainora
Jan 3 at 13:19
It looks like op wants to retain the first row and replace theval
column of that row with0
where allval
isNA
for a group. Check my ans pls. Agree with @markus, it does seem tricky
– Vivek Kalyanarangan
Jan 3 at 13:27
1
@VivekKalyanarangan, that's what I initially thought, but "and I want to get rid of all the NA values" suggests otherwise.
– Julius Vainora
Jan 3 at 13:29
1
1
@markus, right, I had assumed that that's the goal. Thanks!
– Julius Vainora
Jan 3 at 13:19
@markus, right, I had assumed that that's the goal. Thanks!
– Julius Vainora
Jan 3 at 13:19
It looks like op wants to retain the first row and replace the
val
column of that row with 0
where all val
is NA
for a group. Check my ans pls. Agree with @markus, it does seem tricky– Vivek Kalyanarangan
Jan 3 at 13:27
It looks like op wants to retain the first row and replace the
val
column of that row with 0
where all val
is NA
for a group. Check my ans pls. Agree with @markus, it does seem tricky– Vivek Kalyanarangan
Jan 3 at 13:27
1
1
@VivekKalyanarangan, that's what I initially thought, but "and I want to get rid of all the NA values" suggests otherwise.
– Julius Vainora
Jan 3 at 13:29
@VivekKalyanarangan, that's what I initially thought, but "and I want to get rid of all the NA values" suggests otherwise.
– Julius Vainora
Jan 3 at 13:29
add a comment |
df1[is.na(df1)] <- 0
df1[!(duplicated(df1$id) & df1$val == 0), ]
id val
1 a 0
5 b 1
6 b 2
7 b 2
8 b 3
5
Would this work forid
s that containNA
s and non-NA
s? Try withdf1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))
– markus
Jan 3 at 13:05
I think this is the best so far (I'll leave it open for another hour or so to see) would maybe change to df %>% replace(is.na(.), 0) %>% .[!(duplicated(.$id) & .$val == 0), ]
– Robert Hickman
Jan 3 at 13:26
add a comment |
df1[is.na(df1)] <- 0
df1[!(duplicated(df1$id) & df1$val == 0), ]
id val
1 a 0
5 b 1
6 b 2
7 b 2
8 b 3
5
Would this work forid
s that containNA
s and non-NA
s? Try withdf1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))
– markus
Jan 3 at 13:05
I think this is the best so far (I'll leave it open for another hour or so to see) would maybe change to df %>% replace(is.na(.), 0) %>% .[!(duplicated(.$id) & .$val == 0), ]
– Robert Hickman
Jan 3 at 13:26
add a comment |
df1[is.na(df1)] <- 0
df1[!(duplicated(df1$id) & df1$val == 0), ]
id val
1 a 0
5 b 1
6 b 2
7 b 2
8 b 3
df1[is.na(df1)] <- 0
df1[!(duplicated(df1$id) & df1$val == 0), ]
id val
1 a 0
5 b 1
6 b 2
7 b 2
8 b 3
answered Jan 3 at 13:02
AdammAdamm
961718
961718
5
Would this work forid
s that containNA
s and non-NA
s? Try withdf1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))
– markus
Jan 3 at 13:05
I think this is the best so far (I'll leave it open for another hour or so to see) would maybe change to df %>% replace(is.na(.), 0) %>% .[!(duplicated(.$id) & .$val == 0), ]
– Robert Hickman
Jan 3 at 13:26
add a comment |
5
Would this work forid
s that containNA
s and non-NA
s? Try withdf1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))
– markus
Jan 3 at 13:05
I think this is the best so far (I'll leave it open for another hour or so to see) would maybe change to df %>% replace(is.na(.), 0) %>% .[!(duplicated(.$id) & .$val == 0), ]
– Robert Hickman
Jan 3 at 13:26
5
5
Would this work for
id
s that contain NA
s and non-NA
s? Try with df1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))
– markus
Jan 3 at 13:05
Would this work for
id
s that contain NA
s and non-NA
s? Try with df1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))
– markus
Jan 3 at 13:05
I think this is the best so far (I'll leave it open for another hour or so to see) would maybe change to df %>% replace(is.na(.), 0) %>% .[!(duplicated(.$id) & .$val == 0), ]
– Robert Hickman
Jan 3 at 13:26
I think this is the best so far (I'll leave it open for another hour or so to see) would maybe change to df %>% replace(is.na(.), 0) %>% .[!(duplicated(.$id) & .$val == 0), ]
– Robert Hickman
Jan 3 at 13:26
add a comment |
Base R option is to find groups with all NA
s and transform
them by changing their val
to 0 and select only unique
rows so that there is only one row per group. We rbind
this dataframe with the groups which are !all_NA
.
all_NA <- with(df1, ave(is.na(val), id, FUN = all))
rbind(unique(transform(df1[all_NA, ], val = 0)), df1[!all_NA, ])
# id val
#1 a 0
#5 b 1
#6 b 2
#7 b 2
#8 b 3
dplyr
option looks ugly but one way is to make two groups of dataframes one with groups of all NA
values and other with groups of all non-NA values. For groups with all NA
values we add row with it's id
and val
as 0 and bind this to the other group.
library(dplyr)
bind_rows(df1 %>%
group_by(id) %>%
filter(all(!is.na(val))),
df1 %>%
group_by(id) %>%
filter(all(is.na(val))) %>%
ungroup() %>%
summarise(id = unique(id),
val = 0)) %>%
arrange(id)
# id val
# <fct> <dbl>
#1 a 0
#2 b 1
#3 b 2
#4 b 2
#5 b 3
add a comment |
Base R option is to find groups with all NA
s and transform
them by changing their val
to 0 and select only unique
rows so that there is only one row per group. We rbind
this dataframe with the groups which are !all_NA
.
all_NA <- with(df1, ave(is.na(val), id, FUN = all))
rbind(unique(transform(df1[all_NA, ], val = 0)), df1[!all_NA, ])
# id val
#1 a 0
#5 b 1
#6 b 2
#7 b 2
#8 b 3
dplyr
option looks ugly but one way is to make two groups of dataframes one with groups of all NA
values and other with groups of all non-NA values. For groups with all NA
values we add row with it's id
and val
as 0 and bind this to the other group.
library(dplyr)
bind_rows(df1 %>%
group_by(id) %>%
filter(all(!is.na(val))),
df1 %>%
group_by(id) %>%
filter(all(is.na(val))) %>%
ungroup() %>%
summarise(id = unique(id),
val = 0)) %>%
arrange(id)
# id val
# <fct> <dbl>
#1 a 0
#2 b 1
#3 b 2
#4 b 2
#5 b 3
add a comment |
Base R option is to find groups with all NA
s and transform
them by changing their val
to 0 and select only unique
rows so that there is only one row per group. We rbind
this dataframe with the groups which are !all_NA
.
all_NA <- with(df1, ave(is.na(val), id, FUN = all))
rbind(unique(transform(df1[all_NA, ], val = 0)), df1[!all_NA, ])
# id val
#1 a 0
#5 b 1
#6 b 2
#7 b 2
#8 b 3
dplyr
option looks ugly but one way is to make two groups of dataframes one with groups of all NA
values and other with groups of all non-NA values. For groups with all NA
values we add row with it's id
and val
as 0 and bind this to the other group.
library(dplyr)
bind_rows(df1 %>%
group_by(id) %>%
filter(all(!is.na(val))),
df1 %>%
group_by(id) %>%
filter(all(is.na(val))) %>%
ungroup() %>%
summarise(id = unique(id),
val = 0)) %>%
arrange(id)
# id val
# <fct> <dbl>
#1 a 0
#2 b 1
#3 b 2
#4 b 2
#5 b 3
Base R option is to find groups with all NA
s and transform
them by changing their val
to 0 and select only unique
rows so that there is only one row per group. We rbind
this dataframe with the groups which are !all_NA
.
all_NA <- with(df1, ave(is.na(val), id, FUN = all))
rbind(unique(transform(df1[all_NA, ], val = 0)), df1[!all_NA, ])
# id val
#1 a 0
#5 b 1
#6 b 2
#7 b 2
#8 b 3
dplyr
option looks ugly but one way is to make two groups of dataframes one with groups of all NA
values and other with groups of all non-NA values. For groups with all NA
values we add row with it's id
and val
as 0 and bind this to the other group.
library(dplyr)
bind_rows(df1 %>%
group_by(id) %>%
filter(all(!is.na(val))),
df1 %>%
group_by(id) %>%
filter(all(is.na(val))) %>%
ungroup() %>%
summarise(id = unique(id),
val = 0)) %>%
arrange(id)
# id val
# <fct> <dbl>
#1 a 0
#2 b 1
#3 b 2
#4 b 2
#5 b 3
edited Jan 3 at 13:17
answered Jan 3 at 12:56
Ronak ShahRonak Shah
43.5k104266
43.5k104266
add a comment |
add a comment |
Changed the df
to make example more exhaustive -
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(case=sum(is.na(val))==n(), row_num=row_number() ) %>%
mutate(val=ifelse(is.na(val)&case,0,val)) %>%
filter( !(case&row_num!=1) ) %>%
select(id, val)
Output
id val
<fct> <dbl>
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c NA
7 c 2
8 c NA
9 c 3
add a comment |
Changed the df
to make example more exhaustive -
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(case=sum(is.na(val))==n(), row_num=row_number() ) %>%
mutate(val=ifelse(is.na(val)&case,0,val)) %>%
filter( !(case&row_num!=1) ) %>%
select(id, val)
Output
id val
<fct> <dbl>
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c NA
7 c 2
8 c NA
9 c 3
add a comment |
Changed the df
to make example more exhaustive -
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(case=sum(is.na(val))==n(), row_num=row_number() ) %>%
mutate(val=ifelse(is.na(val)&case,0,val)) %>%
filter( !(case&row_num!=1) ) %>%
select(id, val)
Output
id val
<fct> <dbl>
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c NA
7 c 2
8 c NA
9 c 3
Changed the df
to make example more exhaustive -
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(case=sum(is.na(val))==n(), row_num=row_number() ) %>%
mutate(val=ifelse(is.na(val)&case,0,val)) %>%
filter( !(case&row_num!=1) ) %>%
select(id, val)
Output
id val
<fct> <dbl>
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c NA
7 c 2
8 c NA
9 c 3
answered Jan 3 at 13:25
Vivek KalyanaranganVivek Kalyanarangan
5,1141829
5,1141829
add a comment |
add a comment |
Another base approach, one that doesn't maintain the order of the rows and takes advantage of factors remembering lost values:
df1 <- na.omit(df1)
df1 <- rbind(
df1,
data.frame(
id = levels(df1$id)[!levels(df1$id) %in% df1$id],
val = 0)
)
I do personally prefer the dplyr approach given by Sotos, as I don't like rbind
-ing data.frames back together so it's a matter of taste, but this isn't unbearably complicated by my eye. It's easy enough to adapt to a character id
column with a unique(df1$id)
variable.
add a comment |
Another base approach, one that doesn't maintain the order of the rows and takes advantage of factors remembering lost values:
df1 <- na.omit(df1)
df1 <- rbind(
df1,
data.frame(
id = levels(df1$id)[!levels(df1$id) %in% df1$id],
val = 0)
)
I do personally prefer the dplyr approach given by Sotos, as I don't like rbind
-ing data.frames back together so it's a matter of taste, but this isn't unbearably complicated by my eye. It's easy enough to adapt to a character id
column with a unique(df1$id)
variable.
add a comment |
Another base approach, one that doesn't maintain the order of the rows and takes advantage of factors remembering lost values:
df1 <- na.omit(df1)
df1 <- rbind(
df1,
data.frame(
id = levels(df1$id)[!levels(df1$id) %in% df1$id],
val = 0)
)
I do personally prefer the dplyr approach given by Sotos, as I don't like rbind
-ing data.frames back together so it's a matter of taste, but this isn't unbearably complicated by my eye. It's easy enough to adapt to a character id
column with a unique(df1$id)
variable.
Another base approach, one that doesn't maintain the order of the rows and takes advantage of factors remembering lost values:
df1 <- na.omit(df1)
df1 <- rbind(
df1,
data.frame(
id = levels(df1$id)[!levels(df1$id) %in% df1$id],
val = 0)
)
I do personally prefer the dplyr approach given by Sotos, as I don't like rbind
-ing data.frames back together so it's a matter of taste, but this isn't unbearably complicated by my eye. It's easy enough to adapt to a character id
column with a unique(df1$id)
variable.
answered Jan 3 at 16:09
CriminallyVulgarCriminallyVulgar
16615
16615
add a comment |
add a comment |
Here is an option too:
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate_all(funs(replace(.,is.na(.),0))) %>%
slice(4:nrow(.))
This gives:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
Alternative:
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate_all(funs(replace(.,is.na(.),0))) %>%
unique()
UPDATE based on other requirements:
Some users suggested to test on this dataframe. Of course this answer assumes you'll look at everything by hand. Might be less useful if you have to look at everything by "hand" but here goes:
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate(val=ifelse(id=="a",0,val)) %>%
slice(4:nrow(.))
This yields:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c NA
7 c 2
8 c NA
9 c 3
3
where did 4 come from?
– Sotos
Jan 3 at 13:22
The solution produces four 0s. We're only interested in having 1?
– NelsonGon
Jan 3 at 13:23
What if one group has 4 and another 3?
– Sotos
Jan 3 at 13:26
Sorry I only answered based on the question. Maybe then we could twist things up, not sure though!
– NelsonGon
Jan 3 at 13:27
Consider this example -df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
I think here OP wants to removeNA
values forA
group only, not the rest
– Vivek Kalyanarangan
Jan 3 at 13:28
|
show 2 more comments
Here is an option too:
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate_all(funs(replace(.,is.na(.),0))) %>%
slice(4:nrow(.))
This gives:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
Alternative:
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate_all(funs(replace(.,is.na(.),0))) %>%
unique()
UPDATE based on other requirements:
Some users suggested to test on this dataframe. Of course this answer assumes you'll look at everything by hand. Might be less useful if you have to look at everything by "hand" but here goes:
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate(val=ifelse(id=="a",0,val)) %>%
slice(4:nrow(.))
This yields:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c NA
7 c 2
8 c NA
9 c 3
3
where did 4 come from?
– Sotos
Jan 3 at 13:22
The solution produces four 0s. We're only interested in having 1?
– NelsonGon
Jan 3 at 13:23
What if one group has 4 and another 3?
– Sotos
Jan 3 at 13:26
Sorry I only answered based on the question. Maybe then we could twist things up, not sure though!
– NelsonGon
Jan 3 at 13:27
Consider this example -df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
I think here OP wants to removeNA
values forA
group only, not the rest
– Vivek Kalyanarangan
Jan 3 at 13:28
|
show 2 more comments
Here is an option too:
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate_all(funs(replace(.,is.na(.),0))) %>%
slice(4:nrow(.))
This gives:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
Alternative:
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate_all(funs(replace(.,is.na(.),0))) %>%
unique()
UPDATE based on other requirements:
Some users suggested to test on this dataframe. Of course this answer assumes you'll look at everything by hand. Might be less useful if you have to look at everything by "hand" but here goes:
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate(val=ifelse(id=="a",0,val)) %>%
slice(4:nrow(.))
This yields:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c NA
7 c 2
8 c NA
9 c 3
Here is an option too:
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate_all(funs(replace(.,is.na(.),0))) %>%
slice(4:nrow(.))
This gives:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
Alternative:
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate_all(funs(replace(.,is.na(.),0))) %>%
unique()
UPDATE based on other requirements:
Some users suggested to test on this dataframe. Of course this answer assumes you'll look at everything by hand. Might be less useful if you have to look at everything by "hand" but here goes:
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate(val=ifelse(id=="a",0,val)) %>%
slice(4:nrow(.))
This yields:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c NA
7 c 2
8 c NA
9 c 3
edited Jan 4 at 4:49
answered Jan 3 at 13:18
NelsonGonNelsonGon
3,4273833
3,4273833
3
where did 4 come from?
– Sotos
Jan 3 at 13:22
The solution produces four 0s. We're only interested in having 1?
– NelsonGon
Jan 3 at 13:23
What if one group has 4 and another 3?
– Sotos
Jan 3 at 13:26
Sorry I only answered based on the question. Maybe then we could twist things up, not sure though!
– NelsonGon
Jan 3 at 13:27
Consider this example -df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
I think here OP wants to removeNA
values forA
group only, not the rest
– Vivek Kalyanarangan
Jan 3 at 13:28
|
show 2 more comments
3
where did 4 come from?
– Sotos
Jan 3 at 13:22
The solution produces four 0s. We're only interested in having 1?
– NelsonGon
Jan 3 at 13:23
What if one group has 4 and another 3?
– Sotos
Jan 3 at 13:26
Sorry I only answered based on the question. Maybe then we could twist things up, not sure though!
– NelsonGon
Jan 3 at 13:27
Consider this example -df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
I think here OP wants to removeNA
values forA
group only, not the rest
– Vivek Kalyanarangan
Jan 3 at 13:28
3
3
where did 4 come from?
– Sotos
Jan 3 at 13:22
where did 4 come from?
– Sotos
Jan 3 at 13:22
The solution produces four 0s. We're only interested in having 1?
– NelsonGon
Jan 3 at 13:23
The solution produces four 0s. We're only interested in having 1?
– NelsonGon
Jan 3 at 13:23
What if one group has 4 and another 3?
– Sotos
Jan 3 at 13:26
What if one group has 4 and another 3?
– Sotos
Jan 3 at 13:26
Sorry I only answered based on the question. Maybe then we could twist things up, not sure though!
– NelsonGon
Jan 3 at 13:27
Sorry I only answered based on the question. Maybe then we could twist things up, not sure though!
– NelsonGon
Jan 3 at 13:27
Consider this example -
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
I think here OP wants to remove NA
values for A
group only, not the rest– Vivek Kalyanarangan
Jan 3 at 13:28
Consider this example -
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
I think here OP wants to remove NA
values for A
group only, not the rest– Vivek Kalyanarangan
Jan 3 at 13:28
|
show 2 more comments
Here is a base R solution.
res <- lapply(split(df1, df1$id), function(DF){
if(anyNA(DF$val)) {
i <- is.na(DF$val)
DF$val[i] <- 0
DF <- rbind(DF[i & !duplicated(DF[i, ]), ], DF[!i, ])
}
DF
})
res <- do.call(rbind, res)
row.names(res) <- NULL
res
# id val
#1 a 0
#2 b 1
#3 b 2
#4 b 2
#5 b 3
Edit.
A dplyr
solution could be the following.
It was tested with the original dataset posted by the OP, with the dataset in Vivek Kalyanarangan's answer and with the dataset in markus' comment, renamed df2
and df3
, respectively.
library(dplyr)
na2zero <- function(DF){
DF %>%
group_by(id) %>%
mutate(val = ifelse(is.na(val), 0, val),
crit = val == 0 & duplicated(val)) %>%
filter(!crit) %>%
select(-crit)
}
na2zero(df1)
na2zero(df2)
na2zero(df3)
Rui, try withdf1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))
. Unfortunately your solution doesn't return a data frame with only three rows.
– markus
Jan 3 at 13:21
@markus No, it doesn't. TheNA
is replaced by a0
and the other value ofval
is notNA
so both must be in the output. At least that's how I'm understanding the OP's problem.
– Rui Barradas
Jan 3 at 14:05
Fair enough. People are reading the question differently.
– markus
Jan 3 at 14:51
add a comment |
Here is a base R solution.
res <- lapply(split(df1, df1$id), function(DF){
if(anyNA(DF$val)) {
i <- is.na(DF$val)
DF$val[i] <- 0
DF <- rbind(DF[i & !duplicated(DF[i, ]), ], DF[!i, ])
}
DF
})
res <- do.call(rbind, res)
row.names(res) <- NULL
res
# id val
#1 a 0
#2 b 1
#3 b 2
#4 b 2
#5 b 3
Edit.
A dplyr
solution could be the following.
It was tested with the original dataset posted by the OP, with the dataset in Vivek Kalyanarangan's answer and with the dataset in markus' comment, renamed df2
and df3
, respectively.
library(dplyr)
na2zero <- function(DF){
DF %>%
group_by(id) %>%
mutate(val = ifelse(is.na(val), 0, val),
crit = val == 0 & duplicated(val)) %>%
filter(!crit) %>%
select(-crit)
}
na2zero(df1)
na2zero(df2)
na2zero(df3)
Rui, try withdf1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))
. Unfortunately your solution doesn't return a data frame with only three rows.
– markus
Jan 3 at 13:21
@markus No, it doesn't. TheNA
is replaced by a0
and the other value ofval
is notNA
so both must be in the output. At least that's how I'm understanding the OP's problem.
– Rui Barradas
Jan 3 at 14:05
Fair enough. People are reading the question differently.
– markus
Jan 3 at 14:51
add a comment |
Here is a base R solution.
res <- lapply(split(df1, df1$id), function(DF){
if(anyNA(DF$val)) {
i <- is.na(DF$val)
DF$val[i] <- 0
DF <- rbind(DF[i & !duplicated(DF[i, ]), ], DF[!i, ])
}
DF
})
res <- do.call(rbind, res)
row.names(res) <- NULL
res
# id val
#1 a 0
#2 b 1
#3 b 2
#4 b 2
#5 b 3
Edit.
A dplyr
solution could be the following.
It was tested with the original dataset posted by the OP, with the dataset in Vivek Kalyanarangan's answer and with the dataset in markus' comment, renamed df2
and df3
, respectively.
library(dplyr)
na2zero <- function(DF){
DF %>%
group_by(id) %>%
mutate(val = ifelse(is.na(val), 0, val),
crit = val == 0 & duplicated(val)) %>%
filter(!crit) %>%
select(-crit)
}
na2zero(df1)
na2zero(df2)
na2zero(df3)
Here is a base R solution.
res <- lapply(split(df1, df1$id), function(DF){
if(anyNA(DF$val)) {
i <- is.na(DF$val)
DF$val[i] <- 0
DF <- rbind(DF[i & !duplicated(DF[i, ]), ], DF[!i, ])
}
DF
})
res <- do.call(rbind, res)
row.names(res) <- NULL
res
# id val
#1 a 0
#2 b 1
#3 b 2
#4 b 2
#5 b 3
Edit.
A dplyr
solution could be the following.
It was tested with the original dataset posted by the OP, with the dataset in Vivek Kalyanarangan's answer and with the dataset in markus' comment, renamed df2
and df3
, respectively.
library(dplyr)
na2zero <- function(DF){
DF %>%
group_by(id) %>%
mutate(val = ifelse(is.na(val), 0, val),
crit = val == 0 & duplicated(val)) %>%
filter(!crit) %>%
select(-crit)
}
na2zero(df1)
na2zero(df2)
na2zero(df3)
edited Jan 3 at 14:22
answered Jan 3 at 13:03
Rui BarradasRui Barradas
18.2k51833
18.2k51833
Rui, try withdf1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))
. Unfortunately your solution doesn't return a data frame with only three rows.
– markus
Jan 3 at 13:21
@markus No, it doesn't. TheNA
is replaced by a0
and the other value ofval
is notNA
so both must be in the output. At least that's how I'm understanding the OP's problem.
– Rui Barradas
Jan 3 at 14:05
Fair enough. People are reading the question differently.
– markus
Jan 3 at 14:51
add a comment |
Rui, try withdf1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))
. Unfortunately your solution doesn't return a data frame with only three rows.
– markus
Jan 3 at 13:21
@markus No, it doesn't. TheNA
is replaced by a0
and the other value ofval
is notNA
so both must be in the output. At least that's how I'm understanding the OP's problem.
– Rui Barradas
Jan 3 at 14:05
Fair enough. People are reading the question differently.
– markus
Jan 3 at 14:51
Rui, try with
df1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))
. Unfortunately your solution doesn't return a data frame with only three rows.– markus
Jan 3 at 13:21
Rui, try with
df1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))
. Unfortunately your solution doesn't return a data frame with only three rows.– markus
Jan 3 at 13:21
@markus No, it doesn't. The
NA
is replaced by a 0
and the other value of val
is not NA
so both must be in the output. At least that's how I'm understanding the OP's problem.– Rui Barradas
Jan 3 at 14:05
@markus No, it doesn't. The
NA
is replaced by a 0
and the other value of val
is not NA
so both must be in the output. At least that's how I'm understanding the OP's problem.– Rui Barradas
Jan 3 at 14:05
Fair enough. People are reading the question differently.
– markus
Jan 3 at 14:51
Fair enough. People are reading the question differently.
– markus
Jan 3 at 14:51
add a comment |
One may try this :
df1 = data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1
# id val
#1 a NA
#2 a NA
#3 a NA
#4 a NA
#5 b 1
#6 b 2
#7 b 2
#8 b 3
#9 c NA
#10 c 2
#11 c NA
#12 c 3
Task is to remove all rows corresponding to any id
IFF val
for the corresponding id
is all NA
s and add new row with this id
and val = 0
.
In this example, id = a
.
Note : val
for c
also has NA
s but all the val
corresponding to c
are not NA
therefore we need to remove the corresponding row for c
where val = NA
.
So lets create another column say, val2
which indicates 0
means its all NA
s and 1 otherwise.
library(dplyr)
df1 = df1 %>%
group_by(id) %>%
mutate(val2 = if_else(condition = all(is.na(val)),true = 0, false = 1))
df1
# A tibble: 12 x 3
# Groups: id [3]
# id val val2
# <fct> <dbl> <dbl>
#1 a NA 0
#2 a NA 0
#3 a NA 0
#4 a NA 0
#5 b 1 1
#6 b 2 1
#7 b 2 1
#8 b 3 1
#9 c NA 1
#10 c 2 1
#11 c NA 1
#12 c 3 1
Get the list of id
s with corresponding val = NA
for all.
all_na = unique(df1$id[df1$val2 == 0])
Then remove theid
s from the dataframe df1
with val = NA
.
df1 = na.omit(df1)
df1
# A tibble: 6 x 3
# Groups: id [2]
# id val val2
# <fct> <dbl> <dbl>
# 1 b 1 1
# 2 b 2 1
# 3 b 2 1
# 4 b 3 1
# 5 c 2 1
# 6 c 3 1
And create a new dataframe with id
s in all_na
and val = 0
all_na_df = data.frame(id = all_na, val = 0)
all_na_df
# id val
# 1 a 0
then combine these two dataframes.
df1 = bind_rows(all_na_df, df1[,c('id', 'val')])
df1
# id val
# 1 a 0
# 2 b 1
# 3 b 2
# 4 b 2
# 5 b 3
# 6 c 2
# 7 c 3
Hope this helps and Edits are most welcomed :-)
add a comment |
One may try this :
df1 = data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1
# id val
#1 a NA
#2 a NA
#3 a NA
#4 a NA
#5 b 1
#6 b 2
#7 b 2
#8 b 3
#9 c NA
#10 c 2
#11 c NA
#12 c 3
Task is to remove all rows corresponding to any id
IFF val
for the corresponding id
is all NA
s and add new row with this id
and val = 0
.
In this example, id = a
.
Note : val
for c
also has NA
s but all the val
corresponding to c
are not NA
therefore we need to remove the corresponding row for c
where val = NA
.
So lets create another column say, val2
which indicates 0
means its all NA
s and 1 otherwise.
library(dplyr)
df1 = df1 %>%
group_by(id) %>%
mutate(val2 = if_else(condition = all(is.na(val)),true = 0, false = 1))
df1
# A tibble: 12 x 3
# Groups: id [3]
# id val val2
# <fct> <dbl> <dbl>
#1 a NA 0
#2 a NA 0
#3 a NA 0
#4 a NA 0
#5 b 1 1
#6 b 2 1
#7 b 2 1
#8 b 3 1
#9 c NA 1
#10 c 2 1
#11 c NA 1
#12 c 3 1
Get the list of id
s with corresponding val = NA
for all.
all_na = unique(df1$id[df1$val2 == 0])
Then remove theid
s from the dataframe df1
with val = NA
.
df1 = na.omit(df1)
df1
# A tibble: 6 x 3
# Groups: id [2]
# id val val2
# <fct> <dbl> <dbl>
# 1 b 1 1
# 2 b 2 1
# 3 b 2 1
# 4 b 3 1
# 5 c 2 1
# 6 c 3 1
And create a new dataframe with id
s in all_na
and val = 0
all_na_df = data.frame(id = all_na, val = 0)
all_na_df
# id val
# 1 a 0
then combine these two dataframes.
df1 = bind_rows(all_na_df, df1[,c('id', 'val')])
df1
# id val
# 1 a 0
# 2 b 1
# 3 b 2
# 4 b 2
# 5 b 3
# 6 c 2
# 7 c 3
Hope this helps and Edits are most welcomed :-)
add a comment |
One may try this :
df1 = data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1
# id val
#1 a NA
#2 a NA
#3 a NA
#4 a NA
#5 b 1
#6 b 2
#7 b 2
#8 b 3
#9 c NA
#10 c 2
#11 c NA
#12 c 3
Task is to remove all rows corresponding to any id
IFF val
for the corresponding id
is all NA
s and add new row with this id
and val = 0
.
In this example, id = a
.
Note : val
for c
also has NA
s but all the val
corresponding to c
are not NA
therefore we need to remove the corresponding row for c
where val = NA
.
So lets create another column say, val2
which indicates 0
means its all NA
s and 1 otherwise.
library(dplyr)
df1 = df1 %>%
group_by(id) %>%
mutate(val2 = if_else(condition = all(is.na(val)),true = 0, false = 1))
df1
# A tibble: 12 x 3
# Groups: id [3]
# id val val2
# <fct> <dbl> <dbl>
#1 a NA 0
#2 a NA 0
#3 a NA 0
#4 a NA 0
#5 b 1 1
#6 b 2 1
#7 b 2 1
#8 b 3 1
#9 c NA 1
#10 c 2 1
#11 c NA 1
#12 c 3 1
Get the list of id
s with corresponding val = NA
for all.
all_na = unique(df1$id[df1$val2 == 0])
Then remove theid
s from the dataframe df1
with val = NA
.
df1 = na.omit(df1)
df1
# A tibble: 6 x 3
# Groups: id [2]
# id val val2
# <fct> <dbl> <dbl>
# 1 b 1 1
# 2 b 2 1
# 3 b 2 1
# 4 b 3 1
# 5 c 2 1
# 6 c 3 1
And create a new dataframe with id
s in all_na
and val = 0
all_na_df = data.frame(id = all_na, val = 0)
all_na_df
# id val
# 1 a 0
then combine these two dataframes.
df1 = bind_rows(all_na_df, df1[,c('id', 'val')])
df1
# id val
# 1 a 0
# 2 b 1
# 3 b 2
# 4 b 2
# 5 b 3
# 6 c 2
# 7 c 3
Hope this helps and Edits are most welcomed :-)
One may try this :
df1 = data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1
# id val
#1 a NA
#2 a NA
#3 a NA
#4 a NA
#5 b 1
#6 b 2
#7 b 2
#8 b 3
#9 c NA
#10 c 2
#11 c NA
#12 c 3
Task is to remove all rows corresponding to any id
IFF val
for the corresponding id
is all NA
s and add new row with this id
and val = 0
.
In this example, id = a
.
Note : val
for c
also has NA
s but all the val
corresponding to c
are not NA
therefore we need to remove the corresponding row for c
where val = NA
.
So lets create another column say, val2
which indicates 0
means its all NA
s and 1 otherwise.
library(dplyr)
df1 = df1 %>%
group_by(id) %>%
mutate(val2 = if_else(condition = all(is.na(val)),true = 0, false = 1))
df1
# A tibble: 12 x 3
# Groups: id [3]
# id val val2
# <fct> <dbl> <dbl>
#1 a NA 0
#2 a NA 0
#3 a NA 0
#4 a NA 0
#5 b 1 1
#6 b 2 1
#7 b 2 1
#8 b 3 1
#9 c NA 1
#10 c 2 1
#11 c NA 1
#12 c 3 1
Get the list of id
s with corresponding val = NA
for all.
all_na = unique(df1$id[df1$val2 == 0])
Then remove theid
s from the dataframe df1
with val = NA
.
df1 = na.omit(df1)
df1
# A tibble: 6 x 3
# Groups: id [2]
# id val val2
# <fct> <dbl> <dbl>
# 1 b 1 1
# 2 b 2 1
# 3 b 2 1
# 4 b 3 1
# 5 c 2 1
# 6 c 3 1
And create a new dataframe with id
s in all_na
and val = 0
all_na_df = data.frame(id = all_na, val = 0)
all_na_df
# id val
# 1 a 0
then combine these two dataframes.
df1 = bind_rows(all_na_df, df1[,c('id', 'val')])
df1
# id val
# 1 a 0
# 2 b 1
# 3 b 2
# 4 b 2
# 5 b 3
# 6 c 2
# 7 c 3
Hope this helps and Edits are most welcomed :-)
edited Jan 8 at 12:37
answered Jan 8 at 10:48
heisenbug47heisenbug47
7911
7911
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54022536%2freplace-all-na-values-for-variable-with-one-row-equal-to-0%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
So you want to add rows with 0 only if all the values for particular
id
is 0?– Ronak Shah
Jan 3 at 12:47
only if they're all NA for a particular id
– Robert Hickman
Jan 3 at 12:49
1
@RobertHickman There seems to be some confusion about your desired output. Could you update your question with the expected output based on this
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
? Thanks to @VivekKalyanarangan for the data.– markus
Jan 3 at 13:31