Removing tricky phrase pattern through Regex
I am trying to remove patterns like this:
Need for a reset 0 SHARES Share it! Share TweetBy Leandro DD Coronel
from text data like this:
Need for a reset 0 SHARES Share it! Share TweetBy Leandro DD CoronelWe Filipinos can’t solve our nation’s problems through incendiary debate, insulting one another or even threatening to physically hurt each other.We are currently a divided society. Sad to say,
The pattern is that the phrase starts with an uppercase letter, contains "TweetBy" in between, and ends with a lowercase character that is followed by an uppercase letter (the uppercase letter itself should not be removed). I am having a hard time putting this into a regex.
So far I was able to come up with:
[0-9A-Za-z].*Share TweetBy [A-Za-z].{1,50}[a-z].{1,1}[^ ][A-Z].{1,1}
But this removes the following:
Need for a reset 0 SHARES Share it! Share TweetBy Leandro DD CoronelWe Filipinos can’t solve our nation’s
I only want to remove the text up to the end of the author's name, which usually ends at a lowercase character followed by an uppercase character.
Any suggestions or ideas would help.
Thanks
regex
asked Jan 1 at 3:42
chmscrbbrfck
You need more specific rules than this to make it work. E.g. what happens if a proper name appears in the tweet, such as "Tim"? How would you know whether or not that is the actual start of the tweet?
– Tim Biegeleisen
Jan 1 at 3:44
I am not sure if you can write a regex that could recognize a proper name. But the author name usually ends at the last lowercase character immediately followed by an uppercase character (no space). So this is the rule that I need to put into the regex.
– chmscrbbrfck
Jan 1 at 4:01
In the case of my current data, it is ALWAYS the case; thus, that is the exact rule I need for my regex.
– chmscrbbrfck
Jan 1 at 4:18
It's not clear what you want. You should show two examples: what you have and what you want to get. Without it it's only guessing.
– JohnyL
Jan 1 at 8:49
1 Answer
You can use this:
[0-9A-Za-z].*?Share TweetBy.*?[a-z](?=[A-Z])
- [0-9A-Za-z] - Matches any word character except _ (that is, an ASCII letter or digit).
- .*? - Matches any character except newline, as few times as possible (lazy mode).
- Share TweetBy - Matches the literal text Share TweetBy.
- [a-z](?=[A-Z]) - [a-z] matches a lowercase letter. The positive look-ahead checks that an uppercase letter follows, without including it in the match.
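As a rough sketch of how the pattern above could be applied, here it is in Python with the sample text from the question. The question does not say which language or tool is in use, so Python's `re` module and the helper name `strip_header` are assumptions made only for illustration.

```python
import re

# Pattern from the answer: lazily match up to "Share TweetBy", then lazily up to
# the first lowercase letter that is directly followed by an uppercase letter.
HEADER = re.compile(r"[0-9A-Za-z].*?Share TweetBy.*?[a-z](?=[A-Z])")


def strip_header(text: str) -> str:
    """Remove the share/byline header (hypothetical helper, not from the post)."""
    # count=1 limits the removal to the first header found in the text.
    return HEADER.sub("", text, count=1)


sample = (
    "Need for a reset 0 SHARES Share it! Share TweetBy Leandro DD Coronel"
    "We Filipinos can’t solve our nation’s problems through incendiary debate"
)
cleaned = strip_header(sample)
print(cleaned)
# → We Filipinos can’t solve our nation’s problems through incendiary debate
```

Because the uppercase letter sits inside the look-ahead, it is checked but not consumed, so "We" stays in the output. Text without "Share TweetBy" is returned unchanged.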
edited Jan 1 at 8:50
answered Jan 1 at 7:37
Code Maniac
Still does not work :( I am having a hard time figuring out how to write a regex that stops matching the first time a lowercase character is followed by an uppercase character. In my example this is "lW" in the string "CoronelWe", so the regex should stop at "Coronel".
– chmscrbbrfck
Jan 1 at 8:30
Checked it earlier. The pattern should stop at "Coronel". The regex captures until "We Filipinos", which it should not. The rule should be: the first lowercase character immediately followed by an uppercase character (no space in between). So in the example's case, it should be the "lW" in the string "CoronelWe", and the regex I am thinking of should capture only until "Coronel". Sorry if it sounded confusing.
– chmscrbbrfck
Jan 1 at 8:36
@chmscrbbrfck Check now, I made some changes.
– Code Maniac
Jan 1 at 8:37
We're closer now. The logic of the regex is correct but how come it only captures until "Corone"? It fails to capture the "l".
– chmscrbbrfck
Jan 1 at 8:42
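The intermediate revision of the answer is not shown on this page, so the following is only a guess at what could produce the "Corone" result: if the lowercase letter is placed inside the look-ahead together with the uppercase letter, neither is consumed and the match stops one character early. A minimal Python sketch of the difference (the first pattern is hypothetical):

```python
import re

text = "Share TweetBy Leandro DD CoronelWe Filipinos"

# Hypothetical variant: both letters are inside the look-ahead, so the match
# ends just before the lowercase letter.
too_short = re.search(r"Share TweetBy.*?(?=[a-z][A-Z])", text).group()

# Variant from the answer: the lowercase letter is consumed and only the
# uppercase letter is looked ahead.
full = re.search(r"Share TweetBy.*?[a-z](?=[A-Z])", text).group()

print(too_short)  # → Share TweetBy Leandro DD Corone
print(full)       # → Share TweetBy Leandro DD Coronel
```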
Yeah I already did. It says it is recorded, just does not show because of my reputation. Thanks again Code Maniac. You saved a life today haha
– chmscrbbrfck
Jan 1 at 8:54