Removing duplicates in R based on condition












I need to embed a condition in a remove-duplicates function. I am working with a large student database from South Africa, a highly multilingual country. Last week you gave me the code to remove duplicates caused by retakes, but I now realise that my language exam data shows some students offering more than two different languages.
The source data, simplified, looks like this:



STUDID  MATSUBJ     SCORE
101     AFRIKAANSB  1
101     AFRIKAANSB  4
102     ENGLISHB    2
102     ISIZULUB    7
102     ENGLISHB    5


The result file I need is:

STUDID  MATSUBJ     SCORE  flagextra
101     AFRIKAANSB  4
102     ENGLISHB    5
102     ISIZULUB    7      1


I need to flag the extra language so that I can see which languages they are and make a new category for them.










  • So the extra language is the one that occurs just once?

    – YOLO
    Dec 31 '18 at 11:20

  • Can you show some effort solving this problem? This is very similar to your previous question, which has answers.

    – PoGibas
    Dec 31 '18 at 11:25

  • @PoGibas This second question adds the complication of a condition to the earlier one about duplication. I have been using the answers to my first question, but hit a problem with the real data that requires this extra condition.

    – CharlotteM
    Dec 31 '18 at 14:46
















r if-statement duplicates






edited Dec 31 '18 at 11:19
YOLO

asked Dec 31 '18 at 11:15
CharlotteM













2 Answers
Maybe this helps:

    library(tidyverse)

    # Keep the best score per student and subject; flagextra is 1 when the
    # subject appears only once for that student
    df1 %>%
      group_by(STUDID, MATSUBJ) %>%
      summarise(SCORE = max(SCORE),
                flagextra = as.integer(!sum(duplicated(MATSUBJ))))
    # A tibble: 3 x 4
    # Groups:   STUDID [?]
    #   STUDID MATSUBJ    SCORE flagextra
    #    <int> <chr>      <dbl>     <int>
    # 1    101 AFRIKAANSB     4         0
    # 2    102 ENGLISHB       5         0
    # 3    102 ISIZULUB       7         1

Or with base R:

    # i1 marks rows whose (STUDID, MATSUBJ) pair occurs exactly once
    i1 <- !(duplicated(df1[1:2]) | duplicated(df1[1:2], fromLast = TRUE))
    # Match on both key columns, so that one student's single attempt does
    # not flag the same subject for other students
    single <- paste(df1$STUDID, df1$MATSUBJ)[i1]
    transform(aggregate(SCORE ~ ., df1, max),
              flagextra = as.integer(paste(STUDID, MATSUBJ) %in% single))

data

    df1 <- structure(list(
      STUDID = c(101L, 101L, 102L, 102L, 102L),
      MATSUBJ = c("AFRIKAANSB", "AFRIKAANSB", "ENGLISHB", "ISIZULUB", "ENGLISHB"),
      SCORE = c(1L, 4L, 2L, 7L, 5L)
    ), class = "data.frame", row.names = c(NA, -5L))
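The base R version relies on combining `duplicated()` run forwards and backwards. The following is a minimal sketch of that idiom on the question's sample data; the helper names `key`, `later_copy`, `earlier_copy` and `once` are made up for illustration and are not part of the answer.

```r
# Sample data from the question
df1 <- data.frame(
  STUDID  = c(101L, 101L, 102L, 102L, 102L),
  MATSUBJ = c("AFRIKAANSB", "AFRIKAANSB", "ENGLISHB", "ISIZULUB", "ENGLISHB"),
  SCORE   = c(1L, 4L, 2L, 7L, 5L),
  stringsAsFactors = FALSE
)

# The two columns that identify a student/subject pair
key <- df1[c("STUDID", "MATSUBJ")]

# duplicated() marks the second and later occurrences of a pair ...
later_copy <- duplicated(key)
# ... and with fromLast = TRUE it marks every occurrence except the last
earlier_copy <- duplicated(key, fromLast = TRUE)

# A pair that occurs exactly once is neither a later nor an earlier copy
once <- !(later_copy | earlier_copy)

# Rows for subjects a student sat only once
df1[once, ]
```

On this data only the `102 ISIZULUB` row is selected, which is the row that ends up with `flagextra = 1`.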






edited Dec 31 '18 at 11:40

answered Dec 31 '18 at 11:24
akrun













  • I am still getting errors when I try this on the real data, as below (NB: Lang2 = MATSUBJ, L2score = SCORE).

    – CharlotteM
    Dec 31 '18 at 14:31

  • @CharlotteM Without the error messages, it is not clear what the issue is.

    – akrun
    Dec 31 '18 at 14:32

  • LANG2 %>% group_by (STUDID,Lang2) %>% summarise(L2score=max(L2score),flagextra-as.integer(!sum(duplicated(Lang2)))) Error in summarise_impl(.data, dots) : Evaluation error: ‘max’ not meaningful for factors.

    – CharlotteM
    Dec 31 '18 at 14:53

  • i1<-!(duplicated (LANG2[1:2]|duplicated (LANG2[1:2],fromLast=TRUE))transform (aggregate (L2score~.,LANG2,max),flagextra=as.integer(Lang2 %>% LANG2$Lang2 [i1])) Error: unexpected symbol in "i1<-!(duplicated (LANG2[1:2]|duplicated (LANG2[1:2],fromLast=TRUE))transform"

    – CharlotteM
    Dec 31 '18 at 14:54

  • @CharlotteM The error is pretty clear: you have a factor column. Based on the input shown, I assumed it was numeric. You may need to convert it to numeric first, i.e. LANG2$L2score <- as.numeric(as.character(LANG2$L2score))

    – akrun
    Dec 31 '18 at 14:56
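The factor problem raised in the last comment can be reproduced on a small scale. This is only a sketch: the `LANG2` data frame below is a made-up stand-in for the real data, which is not shown in the thread, and `codes` is an illustrative name.

```r
# Hypothetical stand-in for the real data: L2score was read in as a factor
LANG2 <- data.frame(
  STUDID  = c(101L, 101L, 102L),
  Lang2   = c("AFRIKAANSB", "AFRIKAANSB", "ENGLISHB"),
  L2score = factor(c("1", "4", "10"))
)

# max() is not defined for unordered factors, so this line would fail
# with the "not meaningful for factors" error:
# max(LANG2$L2score)

# as.numeric() on its own returns the internal level codes, not the scores
codes <- as.numeric(LANG2$L2score)

# Going through as.character() first recovers the original values
LANG2$L2score <- as.numeric(as.character(LANG2$L2score))

max(LANG2$L2score)  # 10
```

Here `codes` comes out as 1, 3, 2 rather than 1, 4, 10, because the levels are sorted as text. That is why the comment converts through `as.character()` first.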



















A two-stage procedure works better for me as a newbie to R:

    library(dplyr)

    # Remove the duplicates caused by subject retakes
    # (assigned back to LANGSEC here so that the later steps use the result)
    LANGSEC <- LANGSEC %>%
      group_by(STUDID, MATRICSUBJ) %>%
      top_n(1, SUBJSCORE) %>%
      ungroup()

    # Then flag one of the two subjects causing the remaining duplicates
    LANGSEC$flagextra <- as.integer(duplicated(LANGSEC$STUDID))

    # Then filter for this third language and make a new file
    LANG3 <- LANGSEC %>% filter(flagextra == 1)

    # Then remove these from the other file
    LANG2 <- LANGSEC %>% filter(flagextra != 1)
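The two stages can also be tried on the sample data from the question. The sketch below uses base R only. The names `best`, `main_lang` and `extra_lang` are made up for illustration, and sorting subjects alphabetically to decide which one counts as "extra" is an assumption, not something the thread specifies.

```r
# Sample data from the question
df1 <- data.frame(
  STUDID  = c(101L, 101L, 102L, 102L, 102L),
  MATSUBJ = c("AFRIKAANSB", "AFRIKAANSB", "ENGLISHB", "ISIZULUB", "ENGLISHB"),
  SCORE   = c(1L, 4L, 2L, 7L, 5L),
  stringsAsFactors = FALSE
)

# Stage 1: keep the best score per student and subject (drops retakes)
best <- aggregate(SCORE ~ STUDID + MATSUBJ, data = df1, FUN = max)

# Assumption: order subjects alphabetically within each student, so the
# "first" language is well defined
best <- best[order(best$STUDID, best$MATSUBJ), ]

# Stage 2: flag every subject after a student's first one
best$flagextra <- as.integer(duplicated(best$STUDID))

# Split into the main file and the extra-language file
main_lang  <- best[best$flagextra == 0, ]
extra_lang <- best[best$flagextra == 1, ]
```

With this ordering, `extra_lang` holds the single row `102 ISIZULUB 7`. A different ordering rule would pick a different subject as the extra one.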







answered Jan 6 at 13:31
CharlotteM





























