extracting data before a sign in R [duplicate]





This question already has an answer here:




  • extract a substring in R according to a pattern

    6 answers




I need to extract all the text before a given character, in this case a dash.
I have data like this:

text1 <- "Médicos-Otros"
text2 <- "Disturbio-Escándalo"
text3 <- "Accidente-Choque"

The problem is that the words I am trying to extract don't all have the same length, so I can't use a fixed-width call such as

extract <- substring(text1, 1, n)

The desired results are:



extract1 <- "Médicos"
extract2 <- "Disturbio"
extract3 <- "Accidente"












r regex














asked Jan 3 at 20:21 by Armando González Díaz
edited Jan 10 at 13:20 by Julius Vainora




marked as duplicate by Community Jan 4 at 1:08


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
























  • Remove part of string after “.”; Get the strings before the comma with R; Extract part of string (till the first semicolon) in R; How to extract everything until first occurrence of pattern

    – Henrik
    Jan 3 at 20:33































4 Answers
Using sub does the job:

sub("(.*)-.*", "\\1", c(text1, text2, text3))
# [1] "Médicos" "Disturbio" "Accidente"

Here each string is matched as three pieces: what comes before the dash, captured by (.*); the dash itself; and what comes after the dash (.*). The whole string is then replaced by the first captured group (\\1).

Analogously, you may extract the second half:

sub(".*-(.*)", "\\1", c(text1, text2, text3))
# [1] "Otros" "Escándalo" "Choque"
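One caveat worth checking, offered as a small sketch rather than as part of the answer above: `.*` is greedy, so with more than one dash the first pattern keeps everything up to the last dash. If the text before the first dash is wanted, a negated character class such as `[^-]*` should do it. The vector `x` and its second element are made up for illustration.

```r
# Made-up input: the second element has two dashes
x <- c("Médicos-Otros", "a-b-c")

# Greedy: "(.*)" runs as far as it can, i.e. up to the LAST dash
greedy <- sub("(.*)-.*", "\\1", x)

# "[^-]*" cannot cross a dash, so the capture stops at the FIRST dash
first_part <- sub("^([^-]*)-.*", "\\1", x)

print(greedy)      # second element: "a-b"
print(first_part)  # second element: "a"
```

For the strings in the question, which contain exactly one dash, both patterns give the same result.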














answered Jan 3 at 20:27, edited Jan 3 at 20:45
Julius Vainora













  • Thank you. One more thing, just to understand how this works: if there were several dashes and I needed the part at one particular position, how could I get it?

    – Armando González Díaz
    Jan 3 at 21:00

  • @ArmandoGonzálezDíaz, to extract, say, the 5th part (after the 4th dash), use sub("(.*?-){4}(.*?)($|-.*)", "\\2", txt), and so on (only {4} needs to change). The pattern is now quite different because the total number of dashes is unknown. If you knew that there were exactly four dashes in total, the fifth part would be sub(".*-.*-.*-.*-(.*)", "\\1", txt), continuing in the same fashion, but clearly there are more concise ways once the situation gets more complex. In future, please make sure your initial question includes everything.

    – Julius Vainora
    Jan 3 at 21:10
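As one possible "more concise way" for picking the n-th dash-separated part, here is a minimal sketch built on strsplit instead of a regex. The helper name `nth_part`, its arguments, and the sample vector `txt` are hypothetical and do not come from the thread.

```r
# nth_part() is a hypothetical helper: returns the n-th piece of each string,
# or NA when a string has fewer than n pieces.
nth_part <- function(x, n, sep = "-") {
  parts <- strsplit(x, sep, fixed = TRUE)  # fixed = TRUE: split on the literal separator
  vapply(parts,
         function(p) if (length(p) >= n) p[n] else NA_character_,
         character(1))
}

# Made-up data for illustration
txt <- c("a-b-c-d-e-f", "uno-dos")
print(nth_part(txt, 5))  # "e" NA
print(nth_part(txt, 2))  # "b" "dos"
```

Returning NA for short strings is a design choice of this sketch; the regex in the comment above leaves a string unchanged when it does not match.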





















You can use regular expressions:

text1 <- "Médicos-Otros"
text2 <- "Disturbio-Escándalo"
text3 <- "Accidente-Choque"

extract1 <- gsub("\\-.*", "", text1)
extract2 <- gsub("\\-.*", "", text2)
extract3 <- gsub("\\-.*", "", text3)

This matches the dash ("-") and everything after it, and replaces the match with the empty string "".
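Because sub and gsub are vectorized over their input, the three calls can probably be collapsed into one. This is a sketch under the assumption that the strings are held in a single character vector; the names `texts` and `extracts` are made up here.

```r
# Same three strings as in the question, held in one vector
texts <- c("Médicos-Otros", "Disturbio-Escándalo", "Accidente-Choque")

# One vectorized call. sub() is enough because "-.*" runs from the first
# dash to the end of the string, so there is only one match to remove.
extracts <- sub("-.*", "", texts)
print(extracts)  # "Médicos" "Disturbio" "Accidente"
```

The dash is written without a backslash here because it has no special meaning outside a bracket expression.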

















answered Jan 3 at 20:26
Khaynes













  • Thank you. Now I need to extract the second part of the text. How can I do it?

    – Armando González Díaz
    Jan 3 at 20:43

  • @ArmandoGonzálezDíaz: If you're interested in both parts of each string, but want them kept separate, you're better off with Jilber's strsplit() approach. E.g.: do.call(rbind, strsplit(c(text1, text2, text3), "-"))

    – AkselA
    Jan 3 at 20:50
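To make the do.call(rbind, ...) suggestion concrete, here is a short sketch. It assumes every string contains exactly one dash; with uneven piece counts rbind would recycle values and warn. The column names `before` and `after` are made up for readability.

```r
texts <- c("Médicos-Otros", "Disturbio-Escándalo", "Accidente-Choque")

# strsplit() returns a list of 2-element vectors; rbind() stacks them as rows
parts <- do.call(rbind, strsplit(texts, "-", fixed = TRUE))
colnames(parts) <- c("before", "after")  # hypothetical column names

print(parts[, "before"])  # text before the dash
print(parts[, "after"])   # text after the dash
```

The result is a character matrix with one row per input string, so both halves stay aligned with the original data.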





















You can also use strsplit:

> sapply(strsplit(c(text1, text2, text3), "-"), "[[", 1)
[1] "Médicos" "Disturbio" "Accidente"

Consider str_extract from the stringr package as another alternative:

> library(stringr)
> str_extract(c(text1, text2, text3), "\\w+")
[1] "Médicos" "Disturbio" "Accidente"
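A slightly more defensive variant of the strsplit call, sketched here as an option rather than a correction: vapply states the expected return type up front, and fixed = TRUE treats the separator as a literal string, which matters for separators such as "." or "|" that are regex metacharacters. The names `texts` and `first_parts` are made up.

```r
texts <- c("Médicos-Otros", "Disturbio-Escándalo", "Accidente-Choque")

# vapply() fails loudly if any element does not yield exactly one string
first_parts <- vapply(strsplit(texts, "-", fixed = TRUE),
                      function(p) p[1],
                      character(1))
print(first_parts)  # "Médicos" "Disturbio" "Accidente"
```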
















answered Jan 3 at 20:31
Jilber Urbina























Using a regex with a positive look-ahead:

sapply(c(text1, text2, text3),
       function(x) regmatches(x, regexpr(".*(?=-)", x, perl = TRUE)))
# Médicos-Otros Disturbio-Escándalo Accidente-Choque
# "Médicos" "Disturbio" "Accidente"
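regexpr and regmatches are themselves vectorized, so the sapply wrapper may not be needed. The sketch below also shows one thing to watch for: regmatches drops elements that did not match, so the result can be shorter than the input. The third string (no dash) is made up to show this, and `[^-]*` is swapped in for `.*` so that the match stops at the first dash.

```r
# The last element has no dash (made-up example)
x <- c("Médicos-Otros", "Accidente-Choque", "SinGuion")

# Position of the match in each string; -1 means no match
m <- regexpr("^[^-]*(?=-)", x, perl = TRUE)

# Only the matching elements come back: 2 results for 3 inputs
print(regmatches(x, m))

# Keep the result aligned with the input by filling NA where nothing matched
out <- rep(NA_character_, length(x))
out[m != -1] <- regmatches(x, m)
print(out)  # "Médicos" "Accidente" NA
```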
















answered Jan 3 at 20:49
AkselA














