Matching a simple string with regex not working?
I have a large txt-file and want to extract all strings with these patterns:
/m/meet_the_crr
/m/commune
/m/hann_2
Here is what I tried:
    import re
    with open("testfile.txt", "r") as text_file:
        contents = text_file.read().replace("\n", "")
    print(re.match(r'^\/m\/[a-zA-Z0-9_-]+$', contents))
The result I get is a simple "None". What am I doing wrong here?
python regex match
asked Dec 31 '18 at 13:49
TAN-C-F-OK
Remove .replace("\n", "") and use re.findall(r'^/m/[\w-]+$', contents, re.M)
– Wiktor Stribiżew
Dec 31 '18 at 13:53
Try putting the print statement within the with statement block.
– Infected Drake
Dec 31 '18 at 13:55
@PatrickArtner I match all 3. So it seems not to be the regex.
– TAN-C-F-OK
Dec 31 '18 at 14:01
@TAN-C-F-OK .. now use the real text you are giving the regex to work on .. after removing the \n .. your text is /m/meet_the_crr/m/commune/m/hann_2 - no newlines in it .. still matching all?
– Patrick Artner
Dec 31 '18 at 14:04
Sorry for the URL mishap: it is regex101.com - your special case is here: regex101.com/r/PyNjiE/1 .. and it uses the multiline flag.
– Patrick Artner
Dec 31 '18 at 14:08
3 Answers
You must not remove the line endings, and you need the re.MULTILINE flag so that several results are returned from the larger text:
    # write a demo file
    with open("t.txt", "w") as f:
        f.write("""/m/meet_the_crr
    /m/commune
    /m/hann_2
    # your text looks like this after .read().replace("\\n","")
    /m/meet_the_crr/m/commune/m/hann_2""")
Program:
    import re
    regex = r"^/m/[a-zA-Z0-9_-]+$"
    with open("t.txt", "r") as f:
        contents = f.read()
    # re.M lets ^ and $ match at the start and end of every line
    found_all = re.findall(regex, contents, re.M)
    print(found_all)
    print("-")
    print(contents)
Output:
    ['/m/meet_the_crr', '/m/commune', '/m/hann_2']
File content:
    /m/meet_the_crr
    /m/commune
    /m/hann_2
    # your text looks like this after .read().replace("\n","")
    /m/meet_the_crr/m/commune/m/hann_2
This is essentially what Wiktor Stribiżew told you in his comment, although he also suggested a shorter pattern: r'^/m/[\w-]+$'
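To see what the flag changes, here is a minimal sketch that uses an in-memory string instead of a file. The `text` variable is made up for illustration and stands in for the file contents. Without re.M, ^ and $ anchor only at the start and end of the whole string, so findall should come back empty; with re.M they anchor at every line.

```python
import re

# Hypothetical sample text standing in for the file contents.
text = "/m/meet_the_crr\n/m/commune\n/m/hann_2\n"
pattern = r"^/m/[a-zA-Z0-9_-]+$"

# Without re.M, ^ and $ anchor only at the start/end of the whole string.
without_flag = re.findall(pattern, text)

# With re.M, ^ and $ also anchor at the start/end of every line.
with_flag = re.findall(pattern, text, re.M)

print(without_flag)
print(with_flag)
```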
answered Dec 31 '18 at 14:20
Patrick Artner
There is nothing wrong with the pattern itself; it matches each of the inputs you describe when it is applied to a single line:
    import re
    result = re.match(r'^/m/[a-zA-Z0-9_-]+$', '/m/meet_the_crr')
    if result:
        print(result.groups())  # this line is reached, as there is a match
Since you did not specify any capture groups, you will see () printed to the console. You could capture the entire input, and then it would be available, e.g.
    result = re.match(r'(^/m/[a-zA-Z0-9_-]+$)', '/m/meet_the_crr')
    if result:
        print(result.group(1))  # prints /m/meet_the_crr
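As a side note, a capture group is not strictly required to get at the matched text: group(0) of a match object returns the whole match. A small sketch, with `line` as a made-up single line of input:

```python
import re

line = "/m/meet_the_crr"  # hypothetical single line of input
result = re.match(r"^/m/[a-zA-Z0-9_-]+$", line)

# group(0) is the entire match; groups() only lists capture groups.
whole = result.group(0) if result else None
groups = result.groups() if result else None

print(whole)
print(groups)
```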
answered Dec 31 '18 at 13:54
Tim Biegeleisen
Is something wrong with the txt-file? I only get "" now.
– TAN-C-F-OK
Dec 31 '18 at 14:00
@TAN-C-F-OK I gave my answer under the assumption that you already had the ability to read your text file line by line, and apply match to it.
– Tim Biegeleisen
Dec 31 '18 at 14:09
It works, as long as I'm not putting the text-file in there.
– TAN-C-F-OK
Dec 31 '18 at 14:12
You are reading the whole file into a variable (into memory) using .read(). With .replace("\n", ""), you remove all newlines in the string. re.match(r'^/m/[a-zA-Z0-9_-]+$', contents) then tries to match the entire string against the /m/[a-zA-Z0-9_-]+ pattern, which is impossible after the previous manipulations: the joined text contains further / characters after the first entry, and the character class does not allow them.
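The effect can be reproduced without a file. In this sketch, `raw` is a hypothetical stand-in for what .read() returns; the pattern is the one from the question, written without the escaped slashes.

```python
import re

# Hypothetical stand-in for the text returned by text_file.read().
raw = "/m/meet_the_crr\n/m/commune\n/m/hann_2\n"

# Stripping the newlines glues all entries into one long string.
joined = raw.replace("\n", "")

# The character class has no "/", so the match cannot reach $.
result = re.match(r"^/m/[a-zA-Z0-9_-]+$", joined)

print(joined)
print(result)
```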
There are at least two ways out. Either remove .replace("\n", "") (to keep the newlines) and use re.findall(r'^/m/[\w-]+$', contents, re.M) (the re.M option makes ^ and $ match at the start and end of each line rather than of the whole text), or read the file line by line and use your re.match version to check each line, adding every line that matches to the final list.
Example:
    import re
    with open("testfile.txt", "r") as text_file:
        contents = text_file.read()
    print(re.findall(r'^/m/[\w-]+$', contents, re.M))
Or
    import re
    with open("testfile.txt", "r") as text_file:
        for line in text_file:
            if re.match(r'/m/[\w-]+\s*$', line):
                print(line.rstrip())
Note that I used \w to make the pattern somewhat shorter. If you are working in Python 3 and only want to match ASCII letters and digits, also pass the re.ASCII option.
Also, / is not a special character in Python regex patterns, so there is no need to escape it.
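The line-by-line variant can also collect the matches into a list instead of printing them. The sketch below assumes Python 3 and uses an in-memory io.StringIO in place of testfile.txt; the sample lines, including the non-ASCII one, are invented to show what re.ASCII changes.

```python
import io
import re

# Hypothetical file contents; the second line contains a non-ASCII letter.
fake_file = io.StringIO("/m/commune\n/m/café\n/m/hann_2\nnot a match\n")

# re.ASCII restricts \w to [a-zA-Z0-9_]; without it, \w would also match "é".
pattern = re.compile(r"/m/[\w-]+\s*$", re.ASCII)

matches = []
for line in fake_file:
    # re.match anchors at the start of the line, so no ^ is needed.
    if pattern.match(line):
        matches.append(line.rstrip())

print(matches)
```

Compiling the pattern once is only a convenience here; calling re.match with the pattern string on every line behaves the same way.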
answered Dec 31 '18 at 14:11
Wiktor Stribiżew
A quick example.
– Wiktor Stribiżew
Dec 31 '18 at 14:18
I can't believe you actually gave an answer which involves something other than regex. New year's resolution?
– Tim Biegeleisen
Dec 31 '18 at 14:20
@TimBiegeleisen Python has been my primary programming language for almost a year.
– Wiktor Stribiżew
Dec 31 '18 at 14:25