extracting data before a sign in R [duplicate]


This question already has an answer here:

extract a substring in R according to a pattern

6 answers

I need to extract all the text before a given character, in this case a dash.
I have data like this:

  text1 <- "Médicos-Otros"

  text2 <- "Disturbio-Escándalo"

  text3 <- "Accidente-Choque"

The problem is that the words I am trying to extract don't all have the same length, so I can't use a fixed-width approach such as:

extract <- substring(text1, 1, n)

The desired results are:

extract1 <- "Médicos"

extract2 <- "Disturbio"

extract3 <- "Accidente"

edited Jan 10 at 13:20

Julius Vainora


asked Jan 3 at 20:21

Armando González Díaz


marked as duplicate by Community♦ Jan 4 at 1:08

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

Remove part of string after “.”, Get the strings before the comma with R, Extract part of string (till the first semicolon) in R, How to extract everything until first occurrence of pattern

– Henrik
Jan 3 at 20:33


r regex


4 Answers

Using sub does the job:

sub("(.*)-.*", "\\1", c(text1, text2, text3))

# [1] "Médicos"   "Disturbio" "Accidente"

Here we split each string into three pieces: what goes before the dash ((.*)), the dash itself, and what goes after the dash (.*). Each string is then replaced by the first captured piece (\\1).

Analogously, you may extract the second half:

sub(".*-(.*)", "\\1", c(text1, text2, text3))

# [1] "Otros"     "Escándalo" "Choque"
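One thing the explanation above leaves implicit is what happens when a string contains more than one dash. A minimal sketch, assuming a made-up three-part string that is not in the question: the greedy (.*) runs up to the last dash, while a negated character class stops at the first one.

```r
# Hypothetical input with two dashes (not from the question).
x <- "Accidente-Choque-Leve"

# Greedy (.*) gives back only as much as needed, so the capture ends at the LAST dash.
greedy <- sub("(.*)-.*", "\\1", x)

# [^-]* cannot cross a dash, so the capture ends at the FIRST dash.
first_part <- sub("^([^-]*)-.*", "\\1", x)

print(greedy)      # "Accidente-Choque"
print(first_part)  # "Accidente"
```

For the single-dash strings in the question both patterns give the same result, so the choice only matters if extra dashes can appear.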

edited Jan 3 at 20:45

answered Jan 3 at 20:27

Julius Vainora


Thank you. One more thing, just to understand how this works: if there were many dashes and I needed one part in particular, how could I get the desired part of the text?

– Armando González Díaz
Jan 3 at 21:00


@ArmandoGonzálezDíaz, to extract, say, the 5th part (after the 4th dash), use sub("(.*?-){4}(.*?)($|-.*)", "\\2", txt), and so on (you only need to change {4} to something else). The pattern is now quite different because the total number of dashes is unknown. If you knew that there were four dashes in total, the fifth part would be sub(".*-.*-.*-.*-(.*)", "\\1", txt), if we keep going in the same fashion, but clearly there are more concise ways once the situation gets more complex. In the future, please make sure that your initial question includes everything you need.

– Julius Vainora
Jan 3 at 21:10
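As one of the "more concise ways" hinted at in the comment above, splitting on the dash and indexing may be easier to read than a counted regex. This is only a sketch: nth_part is a hypothetical helper name, and the input strings are made up.

```r
# Hypothetical helper: return the n-th dash-separated field of each string,
# or NA when a string has fewer than n fields.
nth_part <- function(txt, n, sep = "-") {
  parts <- strsplit(txt, sep, fixed = TRUE)  # fixed = TRUE: treat sep literally
  vapply(parts,
         function(p) if (length(p) >= n) p[n] else NA_character_,
         character(1))
}

txt <- c("a-b-c-d-e-f", "uno-dos")  # made-up data

print(nth_part(txt, 5))  # "e" NA
print(nth_part(txt, 2))  # "b" "dos"
```

Returning NA for short strings is a design choice of this sketch; the sub() version in the comment would instead leave a non-matching string unchanged.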

add a comment |

You can use regular expressions:

text1 <-  "Médicos-Otros"

text2 <-  "Disturbio-Escándalo"

text3 <-  "Accidente-Choque"



extract1 <- gsub("\\-.*", "", text1)

extract2 <- gsub("\\-.*", "", text2)

extract3 <- gsub("\\-.*", "", text3)

This translates to: match the dash ("-") and everything after it, and replace the match with the empty string "".
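Since sub() and gsub() are vectorised over their input, the three calls can be collapsed into one. A minimal sketch, assuming the strings are stored in a single character vector; the name texts and the dash-less entry "SinGuion" are made up for illustration. A dash needs no escaping outside a bracket expression, so the pattern here is written without the backslashes.

```r
# The three strings from the question plus one made-up entry without a dash.
texts <- c("Médicos-Otros", "Disturbio-Escándalo", "Accidente-Choque", "SinGuion")

# Remove the first dash and everything after it, for every element at once.
before <- sub("-.*", "", texts)

print(before)
# Elements without a dash are returned unchanged, not dropped.
```

Because ".*" already consumes the rest of the string, sub() and gsub() behave the same with this pattern.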

answered Jan 3 at 20:26

Khaynes


Thank you. Now I need to extract the second part of the text. How can I do it?

– Armando González Díaz
Jan 3 at 20:43


@ArmandoGonzálezDíaz: If you're interested in both parts of each string, but want them kept separate, you're better off with Jilber's strsplit() approach. E.g.: do.call(rbind, strsplit(c(text1, text2, text3), "-"))

– AkselA
Jan 3 at 20:50
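To make the do.call(rbind, ...) suggestion above more concrete, here is a small sketch of what it returns. The column names "before" and "after" are my own choice, and the sketch assumes every string contains exactly one dash; with uneven numbers of parts, rbind() would recycle values and warn.

```r
texts <- c("Médicos-Otros", "Disturbio-Escándalo", "Accidente-Choque")

# strsplit() returns a list of length-2 character vectors;
# rbind() stacks them into a 3 x 2 character matrix.
parts <- do.call(rbind, strsplit(texts, "-", fixed = TRUE))
colnames(parts) <- c("before", "after")  # hypothetical column names

print(parts[, "before"])
print(parts[, "after"])
```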


You can also use strsplit

> sapply(strsplit(c(text1, text2, text3), "-"), "[[", 1)

[1] "Médicos"   "Disturbio" "Accidente"

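Consider str_extract from the stringr package as another alternative. Note that "\\w+" matches the first run of word characters, which here happens to be everything before the dash:

> library(stringr)

> str_extract(c(text1, text2, text3), "\\w+")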

[1] "Médicos"   "Disturbio" "Accidente"
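The "\\w+" pattern is not tied to the dash, so it can stop early when the first part contains a space or other non-word character. A minimal base-R sketch of the difference, using regmatches() instead of stringr so that no package is needed; the second string is made up:

```r
x <- c("Accidente-Choque", "Mal tiempo-Lluvia")  # second entry is hypothetical

# First run of word characters: stops at the space in "Mal tiempo".
word_run <- regmatches(x, regexpr("\\w+", x))

# Everything from the start up to (not including) the first dash.
before_dash <- regmatches(x, regexpr("^[^-]+", x))

print(word_run)     # "Accidente" "Mal"
print(before_dash)  # "Accidente" "Mal tiempo"
```

The same "^[^-]+" pattern should also work as the second argument of str_extract, though that is not tested here.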

answered Jan 3 at 20:31

Jilber Urbina



Using a regex with a positive look-ahead:

sapply(c(text1, text2, text3), 

  function(x)

    regmatches(x, regexpr(".*(?=-)", x, perl=TRUE))

)

#      Médicos-Otros Disturbio-Escándalo    Accidente-Choque 

#          "Médicos"         "Disturbio"         "Accidente"
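Two details of the look-ahead approach may be worth seeing in isolation. This is a sketch with made-up inputs: regexpr() and regmatches() are already vectorised, so the sapply() wrapper is optional, but regmatches() silently drops elements that do not match; and a lazy ".*?" stops at the first dash instead of the last.

```r
# Vectorised call: the dash-less (made-up) entry is dropped from the result.
x <- c("Accidente-Choque", "SinGuion")
matched <- regmatches(x, regexpr(".*(?=-)", x, perl = TRUE))
print(matched)  # "Accidente" (length 1, not 2)

# Greedy vs lazy on a hypothetical string with two dashes.
y <- "Accidente-Choque-Leve"
greedy <- regmatches(y, regexpr(".*(?=-)", y, perl = TRUE))
lazy   <- regmatches(y, regexpr(".*?(?=-)", y, perl = TRUE))
print(greedy)  # "Accidente-Choque"
print(lazy)    # "Accidente"
```

If the result must stay aligned with the input, the sub()-based answers are the safer choice, because they return one value per element.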

answered Jan 3 at 20:49

AkselA

