Removing duplicates in R based on condition
I need to embed a condition in a remove-duplicates function. I am working with a large student database from South Africa, a highly multilingual country. Last week you guys gave me the code to remove duplicates caused by retakes, but I now realise my language exam data shows some students offering more than two different languages.
The source data, simplified, looks like this:
STUDID MATSUBJ SCORE
101 AFRIKAANSB 1
101 AFRIKAANSB 4
102 ENGLISHB 2
102 ISIZULUB 7
102 ENGLISHB 5
The result file I need is:
STUDID MATSUBJ SCORE flagextra
101 AFRIKAANSB 4
102 ENGLISHB 5
102 ISIZULUB 7 1
I need to flag the extra language so that I can see which languages they are and make a new category for them.
r if-statement duplicates
edited Dec 31 '18 at 11:19 by YOLO
asked Dec 31 '18 at 11:15 by CharlotteM
So the extra language is the one that occurs just once?
– YOLO
Dec 31 '18 at 11:20
Can you show some effort solving this problem? This is very similar to your previous question, which has answers.
– PoGibas
Dec 31 '18 at 11:25
@PoGibas My second question adds the complication of a condition to the earlier one about duplicates. I have been using the answers to my first question, but hit a problem with the real data that requires this extra condition.
– CharlotteM
Dec 31 '18 at 14:46
2 Answers
Maybe this helps:
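library(tidyverse)
df1 %>%
  # one group per student and subject, so retakes land in the same group
  group_by(STUDID, MATSUBJ) %>%
  # keep the best score; flag the subject if the group has a single row,
  # i.e. no duplicated MATSUBJ means the student sat it only once
  summarise(SCORE = max(SCORE),
            flagextra = as.integer(!sum(duplicated(MATSUBJ))))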
# A tibble: 3 x 4
# Groups: STUDID [?]
# STUDID MATSUBJ SCORE flagextra
# <int> <chr> <dbl> <int>
#1 101 AFRIKAANSB 4 0
#2 102 ENGLISHB 5 0
#3 102 ISIZULUB 7 1
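Or with base R:
# TRUE for rows whose STUDID/MATSUBJ combination occurs only once
i1 <- !(duplicated(df1[1:2]) | duplicated(df1[1:2], fromLast = TRUE))
# best SCORE per STUDID/MATSUBJ, then flag the pairs sat only once;
# match on both columns so one student's single sitting of a subject
# does not flag that subject for every other student
transform(aggregate(SCORE ~ ., df1, max),
          flagextra = as.integer(paste(STUDID, MATSUBJ) %in%
                                 paste(df1$STUDID, df1$MATSUBJ)[i1]))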
data
df1 <- structure(list(STUDID = c(101L, 101L, 102L, 102L, 102L),
    MATSUBJ = c("AFRIKAANSB", "AFRIKAANSB", "ENGLISHB", "ISIZULUB", "ENGLISHB"),
    SCORE = c(1L, 4L, 2L, 7L, 5L)),
    class = "data.frame", row.names = c(NA, -5L))
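The same rule (a subject counts as extra when the student sat it only once, the reading YOLO asked about in the comments) can also be sketched in base R by counting attempts per student and subject with ave(). This is a minimal sketch, not part of the answer above; it rebuilds the question's sample data so it runs on its own, and the names n_taken, best and once_keys are illustrative only.

```r
# The question's sample data (same values as df1 above)
df1 <- data.frame(STUDID  = c(101L, 101L, 102L, 102L, 102L),
                  MATSUBJ = c("AFRIKAANSB", "AFRIKAANSB", "ENGLISHB",
                              "ISIZULUB", "ENGLISHB"),
                  SCORE   = c(1L, 4L, 2L, 7L, 5L),
                  stringsAsFactors = FALSE)

# How many times each student sat each subject, repeated on every row
n_taken <- ave(df1$SCORE, df1$STUDID, df1$MATSUBJ, FUN = length)

# Best score per student/subject, which removes the retakes
best <- aggregate(SCORE ~ STUDID + MATSUBJ, data = df1, FUN = max)

# Student/subject pairs that were sat exactly once are the "extra" languages
once_keys <- paste(df1$STUDID, df1$MATSUBJ)[n_taken == 1]
best$flagextra <- as.integer(paste(best$STUDID, best$MATSUBJ) %in% once_keys)

best
```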
edited Dec 31 '18 at 11:40
answered Dec 31 '18 at 11:24 by akrun
I am still getting errors when I try it on the real data, as below (NB: Lang2 = MATSUBJ, L2score = SCORE).
– CharlotteM
Dec 31 '18 at 14:31
@CharlotteM Without the error messages, it is not clear what the issue is.
– akrun
Dec 31 '18 at 14:32
LANG2 %>% group_by (STUDID,Lang2) %>% summarise(L2score=max(L2score),flagextra-as.integer(!sum(duplicated(Lang2))))
Error in summarise_impl(.data, dots) : Evaluation error: ‘max’ not meaningful for factors.
– CharlotteM
Dec 31 '18 at 14:53
i1<-!(duplicated (LANG2[1:2]|duplicated (LANG2[1:2],fromLast=TRUE))transform (aggregate (L2score~.,LANG2,max),flagextra=as.integer(Lang2 %>% LANG2$Lang2 [i1]))
Error: unexpected symbol in "i1<-!(duplicated (LANG2[1:2]|duplicated (LANG2[1:2],fromLast=TRUE))transform"
– CharlotteM
Dec 31 '18 at 14:54
@CharlotteM The error is pretty much clear. You have a factor column. Based on the input shown, I assumed it was numeric. You may need to convert it to numeric first, i.e. LANG2$L2score <- as.numeric(as.character(LANG2$L2score))
– akrun
Dec 31 '18 at 14:56
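The factor problem behind the "'max' not meaningful for factors" error can be reproduced in a few lines. This is a minimal sketch with made-up scores; it assumes, without confirmation from the thread, that L2score became a factor on import (read.csv() converted text columns to factors by default before R 4.0.0), and it shows why the suggested fix goes through as.character() rather than calling as.numeric() directly.

```r
# Hypothetical scores read in as a factor (values made up for illustration)
L2score <- factor(c("4", "10", "7"))

# max() refuses factors: this is the "'max' not meaningful for factors" error
max_failed <- inherits(tryCatch(max(L2score), error = function(e) e), "error")

# as.numeric() on a factor returns the internal level codes, not the values:
# the levels are sorted as text ("10", "4", "7"), so this gives 2, 1, 3
codes <- as.numeric(L2score)

# Converting to character first recovers the actual numbers: 4, 10, 7
values <- as.numeric(as.character(L2score))

max(values)  # now works and returns 10
```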
A two-stage procedure works better for me as a newbie to R:
library(dplyr)
# Remove the duplicates caused by subject retakes
# (top_n() keeps ties, so two equal best scores would both survive)
df <- LANGSEC %>% group_by(STUDID, MATRICSUBJ) %>% top_n(1, SUBJSCORE) %>% ungroup()
# Then flag one of the two subjects causing the remaining duplicates;
# duplicated() flags every row after a student's first, so this depends on row order
df$flagextra <- as.integer(duplicated(df$STUDID))
# Then filter for this third language and make a new file
LANG3 <- df %>% filter(flagextra == 1)
# Then remove these from the other file
LANG2 <- df %>% filter(flagextra != 1)
answered Jan 6 at 13:31 by CharlotteM
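The two-stage procedure can be sketched in base R on the question's sample data. This is an illustration only: LANGSEC, MATRICSUBJ and SUBJSCORE are the names used in this answer, but the values are the question's toy data, and order() plus duplicated() stands in for top_n(). It also shows one caveat: duplicated(STUDID) flags whichever of a student's subjects comes later in row order, so which language counts as "extra" depends on how the rows are sorted.

```r
# The question's toy data under this answer's column names
LANGSEC <- data.frame(STUDID     = c(101L, 101L, 102L, 102L, 102L),
                      MATRICSUBJ = c("AFRIKAANSB", "AFRIKAANSB", "ENGLISHB",
                                     "ISIZULUB", "ENGLISHB"),
                      SUBJSCORE  = c(1L, 4L, 2L, 7L, 5L),
                      stringsAsFactors = FALSE)

# Stage 1: sort each student's subjects alphabetically with the best score
# first, then keep one row per student/subject to drop the retakes
ord <- LANGSEC[order(LANGSEC$STUDID, LANGSEC$MATRICSUBJ, -LANGSEC$SUBJSCORE), ]
df <- ord[!duplicated(ord[c("STUDID", "MATRICSUBJ")]), ]

# Stage 2: any further row for the same student is an extra language;
# with alphabetical sorting, ISIZULUB comes after ENGLISHB for student 102
df$flagextra <- as.integer(duplicated(df$STUDID))

LANG3 <- df[df$flagextra == 1, ]  # the extra languages
LANG2 <- df[df$flagextra == 0, ]  # everything else
```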