Removing tricky phrase pattern through Regex



























I am trying to remove patterns like this:




Need for a reset 0 SHARES Share it! Share TweetBy Leandro DD Coronel




from text data like this:




Need for a reset 0 SHARES Share it! Share TweetBy Leandro DD CoronelWe Filipinos can’t solve our nation’s problems through incendiary debate, insulting one another or even threatening to physically hurt each other.We are currently a divided society. Sad to say,




The phrase starts with an uppercase letter, contains "TweetBy" somewhere in between, and ends with a lowercase letter that is immediately followed by an uppercase letter (the uppercase letter itself should not be removed). I am having a hard time expressing this as a regex.



So far I was able to come up with:



[0-9A-Za-z].*Share TweetBy [A-Za-z].{1,50}[a-z].{1,1}[^ ][A-Z].{1,1}


But this removes the following:




Need for a reset 0 SHARES Share it! Share TweetBy Leandro DD CoronelWe Filipinos can’t solve our nation’s




I only want to remove the text up to the end of the author's name, which usually ends at a lowercase character immediately followed by an uppercase character.



Any suggestions or ideas would help.



Thanks

































  • You need more specific rules than this to make it work. E.g. what happens if a proper name appears in the tweet, such as Tim? How would you know whether or not that is the actual start of the tweet?

    – Tim Biegeleisen
    Jan 1 at 3:44













  • I am not sure a regex can recognize a proper name. But the author name usually ends at the last lowercase character immediately followed by an uppercase character (no space). So this is the rule that I need to put into the regex.

    – chmscrbbrfck
    Jan 1 at 4:01











  • In my current data this is ALWAYS the case, so that is the exact rule I need for my regex.

    – chmscrbbrfck
    Jan 1 at 4:18











  • It's not clear what you want. You should show two examples: what you have and what you want to get. Without them, we can only guess.

    – JohnyL
    Jan 1 at 8:49
















regex
















asked Jan 1 at 3:42









chmscrbbrfck


1 Answer
































You can use this.



[0-9A-Za-z].*?Share TweetBy.*?[a-z](?=[A-Z])




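  • [0-9A-Za-z] - Matches a single ASCII letter or digit (like \w, but without _).


  • .*? - Matches any characters except newline, as few as possible (lazy mode).


  • Share TweetBy - Matches the literal text Share TweetBy.


  • [a-z](?=[A-Z]) - [a-z] matches a lowercase letter. The positive look-ahead checks that the next character is an uppercase letter without consuming it, so that letter stays in the text.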
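
The question does not say which language is being used, so as a rough sketch, here is how the pattern could be applied in Python with the standard `re` module. The names `HEADER`, `strip_header` and `sample` are made up for this illustration, and the sample string is shortened from the one in the question.

```python
import re

# Pattern from the answer above: lazily match up to "Share TweetBy", then
# lazily up to the first lowercase letter that is immediately followed by
# an uppercase letter. The look-ahead leaves that uppercase letter in place.
HEADER = re.compile(r"[0-9A-Za-z].*?Share TweetBy.*?[a-z](?=[A-Z])")


def strip_header(text: str) -> str:
    """Remove the first share-widget/byline prefix, if one is present."""
    return HEADER.sub("", text, count=1)


sample = (
    "Need for a reset 0 SHARES Share it! Share TweetBy Leandro DD Coronel"
    "We Filipinos can’t solve our nation’s problems through incendiary debate."
)

print(strip_header(sample))
```

Because the look-ahead only checks for the uppercase letter, the match ends at "Coronel" and the "W" of "We" is kept. Note that `.` does not match newlines by default, so this assumes the header and the byline sit on the same line.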


































  • Still does not work :( I am having a hard time figuring out how to write a regex that stops matching the first time a lowercase character is followed by an uppercase character. In my example this is "lW" in the string "CoronelWe", so the regex should stop at "Coronel".

    – chmscrbbrfck
    Jan 1 at 8:30











  • Checked it earlier. The pattern should stop at "Coronel". The regex captures until "We Filipinos", which it should not. The rule should be: the first lowercase character immediately followed by an uppercase character (no space in between). In the example, that is the "lW" in the string "CoronelWe", so the regex I am thinking of should capture only until "Coronel". Sorry if it sounded confusing.

    – chmscrbbrfck
    Jan 1 at 8:36













  • @chmscrbbrfck Check now, I made some changes.

    – Code Maniac
    Jan 1 at 8:37











  • We're closer now. The logic of the regex is correct but how come it only captures until "Corone"? It fails to capture the "l".

    – chmscrbbrfck
    Jan 1 at 8:42











    Yeah I already did. It says it is recorded, just does not show because of my reputation. Thanks again Code Maniac. You saved a life today haha

    – chmscrbbrfck
    Jan 1 at 8:54











edited Jan 1 at 8:50

























answered Jan 1 at 7:37









Code Maniac
