extracting data before a sign in R [duplicate]
This question already has an answer here:
extract a substring in R according to a pattern
6 answers
I need to extract all the text before a sign, in this case a dash.
I have data like these:
text1 <- "Médicos-Otros"
text2 <- "Disturbio-Escándalo"
text3 <- "Accidente-Choque"
The problem is that the words I am trying to extract don't have the same length, so I can't use a fixed end position like this:
extract <- substring(text1, 1, n)
The desired results are:
extract1 <- "Médicos"
extract2 <- "Disturbio"
extract3 <- "Accidente"
r regex
marked as duplicate by Community♦ Jan 4 at 1:08
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
Remove part of string after “.”, Get the strings before the comma with R, Extract part of string (till the first semicolon) in R, How to extract everything until first occurrence of pattern
– Henrik
Jan 3 at 20:33
edited Jan 10 at 13:20 by Julius Vainora
asked Jan 3 at 20:21 by Armando González Díaz
4 Answers
Using sub does the job:
sub("(.*)-.*", "\\1", c(text1, text2, text3))
# [1] "Médicos" "Disturbio" "Accidente"
Here we split each string into three parts: what goes before the dash ((.*)), the dash itself, and what goes after the dash (.*). Each string is then replaced by the first part (\\1).
Analogously, you may extract the second half:
sub(".*-(.*)", "\\1", c(text1, text2, text3))
# [1] "Otros" "Escándalo" "Choque"
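One caveat worth sketching: .* is greedy, so when a string has more than one dash the first pattern keeps everything up to the last dash. The example below uses made-up input (the vector x and the names greedy and first_only are not from the question) and shows a negated character class, [^-]*, as one possible way to stop at the first dash instead.

```r
# Hypothetical input; the second element has more than one dash
x <- c("Médicos-Otros", "a-b-c")

# Greedy (.*) runs up to the LAST dash
greedy <- sub("(.*)-.*", "\\1", x)

# [^-]* cannot cross a dash, so the capture stops at the FIRST dash
first_only <- sub("([^-]*)-.*", "\\1", x)

print(greedy)      # "Médicos" "a-b"
print(first_only)  # "Médicos" "a"
```

For the two-part strings in the question both patterns give the same result; the difference only shows up once extra dashes appear.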
Thank you. One more thing, just to understand how this works: if there were several dashes and I needed one part in particular, how could I get the desired part of the text?
– Armando González Díaz
Jan 3 at 21:00
@ArmandoGonzálezDíaz, to extract, say, the 5th part (after the 4th dash), use sub("(.*?-){4}(.*?)($|-.*)", "\\2", txt), and so on (you only need to change {4} to something else). The pattern is now quite different because the total number of dashes is unknown. If you knew that there were four dashes in total, the fifth part would be sub(".*-.*-.*-.*-(.*)", "\\1", txt) if we kept going in the same fashion, but clearly there are more concise ways once the situation gets more complex. For the future, make sure that your initial question includes everything.
– Julius Vainora
Jan 3 at 21:10
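As one of the "more concise ways" for picking the n-th part, splitting on the dash and indexing is an option. This is only a sketch: nth_part is a hypothetical helper (not a base R function), txt is made-up input, and returning NA when a string has fewer than n parts is an assumption about the desired behaviour.

```r
# nth_part() is a hypothetical helper, not part of base R
nth_part <- function(x, n, sep = "-") {
  # fixed = TRUE treats sep as a literal string rather than a regex
  pieces <- strsplit(x, sep, fixed = TRUE)
  # Take the n-th piece of each string, or NA if it has fewer than n pieces
  vapply(pieces,
         function(p) if (length(p) >= n) p[n] else NA_character_,
         character(1))
}

txt <- c("a-b-c-d-e-f", "one-two")  # made-up input
fifth  <- nth_part(txt, 5)
second <- nth_part(txt, 2)

print(fifth)   # "e" NA
print(second)  # "b" "two"
```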
You can use regular expressions:
text1 <- "Médicos-Otros"
text2 <- "Disturbio-Escándalo"
text3 <- "Accidente-Choque"
extract1 <- gsub("\\-.*", "", text1)
extract2 <- gsub("\\-.*", "", text2)
extract3 <- gsub("\\-.*", "", text3)
This matches the dash ("-") and everything after it, and replaces the match with the empty string "".
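Since sub and gsub are vectorised over their input, the three calls can be collapsed into one. A minimal sketch, assuming the values are gathered in a vector (the names texts and extracts are illustrative). A plain "-" is used because a dash is not special outside a bracket expression, and sub is enough because only the first match needs removing.

```r
# The three strings from the question, gathered in one vector
texts <- c("Médicos-Otros", "Disturbio-Escándalo", "Accidente-Choque")

# Drop the first dash and everything after it, for every element at once
extracts <- sub("-.*", "", texts)

print(extracts)  # "Médicos" "Disturbio" "Accidente"
```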
Thank you. Now I need to extract the second part of the text. How can I do it?
– Armando González Díaz
Jan 3 at 20:43
@ArmandoGonzálezDíaz: If you're interested in both parts of each string, but want them kept separate, you're better off with Jilber's strsplit() approach. E.g.: do.call(rbind, strsplit(c(text1, text2, text3), "-"))
– AkselA
Jan 3 at 20:50
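To make the do.call(rbind, ...) suggestion concrete, here is a small sketch. The column names "before" and "after" and the variable names are illustrative, and it assumes every string contains exactly one dash (with differing numbers of pieces, rbind would recycle values and warn).

```r
texts <- c("Médicos-Otros", "Disturbio-Escándalo", "Accidente-Choque")

# strsplit() returns a list of character vectors; rbind stacks them as rows
parts <- do.call(rbind, strsplit(texts, "-", fixed = TRUE))
colnames(parts) <- c("before", "after")  # illustrative column names

print(parts[, "before"])  # "Médicos" "Disturbio" "Accidente"
print(parts[, "after"])   # "Otros" "Escándalo" "Choque"
```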
You can also use strsplit:
> sapply(strsplit(c(text1, text2, text3), "-"), "[[", 1)
[1] "Médicos" "Disturbio" "Accidente"
Consider str_extract from the stringr package as another alternative:
> library(stringr)
> str_extract(c(text1, text2, text3), "\\w+")
[1] "Médicos" "Disturbio" "Accidente"
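Note that \\w+ stops at the first non-word character, not specifically at the dash, so it would cut a multi-word value short. The sketch below illustrates this with base R's regexpr and regmatches on a made-up value (x is not from the question); the alternative pattern [^-]+ ("one or more characters that are not a dash") could equally be passed to str_extract.

```r
x <- "Robo a casa-Otros"  # made-up value that contains spaces

# \\w+ matches word characters only, so it stops at the first space
word_only <- regmatches(x, regexpr("\\w+", x))

# [^-]+ matches anything except a dash, so it stops at the first dash
up_to_dash <- regmatches(x, regexpr("[^-]+", x))

print(word_only)   # "Robo"
print(up_to_dash)  # "Robo a casa"
```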
Using regex with positive look-ahead:
sapply(c(text1, text2, text3),
  function(x)
    regmatches(x, regexpr(".*(?=-)", x, perl=TRUE))
)
#       Médicos-Otros Disturbio-Escándalo    Accidente-Choque
#           "Médicos"         "Disturbio"         "Accidente"
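regexpr and regmatches are themselves vectorised, so the sapply wrapper is optional. One thing to watch for is that regmatches silently drops elements with no match, which shortens the result. The sketch below is one way to keep the output aligned with the input; the input x, the NA fill for strings without a dash, and the anchored pattern ^[^-]*(?=-) are choices made for this illustration.

```r
# Made-up input; the second element has no dash at all
x <- c("Médicos-Otros", "sin guion", "a-b-c")

# Match from the start up to (but not including) the first dash
m <- regexpr("^[^-]*(?=-)", x, perl = TRUE)

# regexpr() returns -1 where nothing matched; fill those slots with NA
out <- rep(NA_character_, length(x))
out[m != -1] <- regmatches(x, m)

print(out)  # "Médicos" NA "a"
```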
Answer using sub: answered Jan 3 at 20:27 by Julius Vainora, edited Jan 3 at 20:45
Answer using gsub: answered Jan 3 at 20:26 by Khaynes
Answer using strsplit and str_extract: answered Jan 3 at 20:31 by Jilber Urbina
Answer using look-ahead: answered Jan 3 at 20:49 by AkselA