Matching a simple string with regex not working?

I have a large txt-file and want to extract all strings with these patterns:

    /m/meet_the_crr
    /m/commune
    /m/hann_2

Here is what I tried:

    import re

    with open("testfile.txt", "r") as text_file:
        contents = text_file.read().replace("\n", "")

    print(re.match(r'^/m/[a-zA-Z0-9_-]+$', contents))

The result I get is a simple "None". What am I doing wrong here?

python regex match






asked Dec 31 '18 at 13:49 by TAN-C-F-OK

  • Remove .replace("\n", "") and use re.findall(r'^/m/[\w-]+$', contents, re.M)
    – Wiktor Stribiżew, Dec 31 '18 at 13:53

  • Try putting the print statement within the with statement block.
    – Infected Drake, Dec 31 '18 at 13:55

  • @PatrickArtner I match all 3. So it seems not to be the regex.
    – TAN-C-F-OK, Dec 31 '18 at 14:01

  • @TAN-C-F-OK Now use the real text you are giving the regex to work on. After removing the \n, your text is /m/meet_the_crr/m/commune/m/hann_2, with no newlines in it. Still matching all?
    – Patrick Artner, Dec 31 '18 at 14:04

  • Sorry for the URL mishap: it is regex101.com. Your special case is here: regex101.com/r/PyNjiE/1, and it uses the multiline flag.
    – Patrick Artner, Dec 31 '18 at 14:08

3 Answers

Do not remove the line endings, and use the re.MULTILINE flag so that re.findall returns every matching line from the larger text:

    # write a demo file
    with open("t.txt", "w") as f:
        f.write("""
    /m/meet_the_crr\n
    /m/commune\n
    /m/hann_2\n\n
    # your text looks like this after .read().replace("\\n","")\n
    /m/meet_the_crr/m/commune/m/hann_2""")

Program:

    import re

    regex = r"^/m/[a-zA-Z0-9_-]+$"

    with open("t.txt", "r") as f:
        contents = f.read()

    # re.M lets ^ and $ match at the start and end of every line
    found_all = re.findall(regex, contents, re.M)

    print(found_all)
    print("-")
    print(contents)

Output:

    ['/m/meet_the_crr', '/m/commune', '/m/hann_2']

File content:

    /m/meet_the_crr

    /m/commune

    /m/hann_2


    # your text looks like this after .read().replace("\n","")

    /m/meet_the_crr/m/commune/m/hann_2

This is essentially what Wiktor Stribiżew told you in his comment, although he also suggested a better pattern: r'^/m/[\w-]+$'
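To see the effect of the flag in isolation, here is a minimal sketch that skips the file and works on an in-memory string. The sample text and the variable names are made up for illustration.

```python
import re

# Hypothetical sample text standing in for the file contents.
text = "/m/meet_the_crr\n/m/commune\nsome other line\n/m/hann_2\n"
pattern = r"^/m/[a-zA-Z0-9_-]+$"

# Without re.M, ^ matches only at the very start of the string and $ only at
# its end (or just before a final newline), so no single line satisfies both.
without_flag = re.findall(pattern, text)

# With re.M, ^ and $ also match at the start and end of every line.
with_flag = re.findall(pattern, text, re.M)

print(without_flag)  # []
print(with_flag)     # ['/m/meet_the_crr', '/m/commune', '/m/hann_2']
```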







answered Dec 31 '18 at 14:20 by Patrick Artner


There is nothing logically wrong with your code, and in fact your pattern will match the inputs you describe:

    import re

    result = re.match(r'^/m/[a-zA-Z0-9_-]+$', '/m/meet_the_crr')
    if result:
        print(result.groups())  # this line is reached, as there is a match

Since you did not specify any capture groups, you will see () printed to the console. You could capture the entire input, and then it would be available, e.g.

    result = re.match(r'(^/m/[a-zA-Z0-9_-]+$)', '/m/meet_the_crr')
    if result:
        print(result.group(1))  # the first capture group

    /m/meet_the_crr
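Applied to a file, this approach means matching one line at a time. The following is a minimal sketch, assuming one candidate entry per line. The helper name extract_paths and the sample lines are hypothetical, and re.fullmatch (Python 3.4+) is swapped in for the ^...$ anchors.

```python
import re

PATTERN = re.compile(r"/m/[a-zA-Z0-9_-]+")

def extract_paths(lines):
    """Return every line that consists solely of a /m/... entry."""
    found = []
    for line in lines:
        candidate = line.rstrip("\n")  # drop the line ending before matching
        match = PATTERN.fullmatch(candidate)
        if match:
            # group(0) is the whole match, so no capture group is needed
            found.append(match.group(0))
    return found

sample = ["/m/meet_the_crr\n", "not a match\n", "/m/commune\n", "/m/hann_2\n"]
print(extract_paths(sample))  # ['/m/meet_the_crr', '/m/commune', '/m/hann_2']
```

An open file object can be passed in place of the list, since iterating over it yields one line at a time.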






answered Dec 31 '18 at 13:54 by Tim Biegeleisen

  • Is something wrong with the txt-file? I only get "" now.
    – TAN-C-F-OK, Dec 31 '18 at 14:00

  • @TAN-C-F-OK I gave my answer under the assumption that you already had the ability to read your text file line by line, and apply match to it.
    – Tim Biegeleisen, Dec 31 '18 at 14:09

  • It works, as long as I'm not putting the text-file in there.
    – TAN-C-F-OK, Dec 31 '18 at 14:12















You are reading the whole file into a variable (into memory) using .read(). With .replace("\n", ""), you remove all newlines from the string. re.match(r'^/m/[a-zA-Z0-9_-]+$', contents) then succeeds only if the entire string matches the /m/[a-zA-Z0-9_-]+ pattern, which is impossible after the previous manipulations: once more than one entry has been joined into a single line, the text contains further / characters, which the character class does not allow.
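The failure can be reproduced without a file. This is a small sketch, assuming the file holds just the three sample entries from the question.

```python
import re

# Assumed file contents: the three entries from the question, one per line.
raw = "/m/meet_the_crr\n/m/commune\n/m/hann_2\n"
joined = raw.replace("\n", "")

print(joined)  # /m/meet_the_crr/m/commune/m/hann_2

# The character class has no "/", so the match cannot run past the first
# entry, and $ cannot match in the middle of the string: the result is None.
result = re.match(r"^/m/[a-zA-Z0-9_-]+$", joined)
print(result)  # None
```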



There are at least two ways out. Either remove .replace("\n", "") (to keep the newlines) and use re.findall(r'^/m/[\w-]+$', contents, re.M) (the re.M option makes ^ and $ match at the start and end of each line rather than of the whole text), or read the file line by line, check each line with your re.match version, and add it to the final list if it matches.

Example:

    import re
    with open("testfile.txt", "r") as text_file:
        contents = text_file.read()
    print(re.findall(r'^/m/[\w-]+$', contents, re.M))

Or

    import re
    with open("testfile.txt", "r") as text_file:
        for line in text_file:
            if re.match(r'/m/[\w-]+\s*$', line):
                print(line.rstrip())

Note that I used \w to make the pattern somewhat shorter, but if you are working in Python 3 and only want to match ASCII letters and digits, also pass the re.ASCII option.

Also, / is not a special character in Python regex patterns, so there is no need to escape it.
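The effect of re.ASCII on \w can be checked with a short sketch. The sample entries are invented for illustration, and the second one deliberately contains a non-ASCII letter.

```python
import re

# Hypothetical entries: the second one ends in "é" (U+00E9).
text = "/m/commune\n/m/caf\u00e9\n"

# In Python 3, \w matches Unicode word characters by default.
unicode_hits = re.findall(r"^/m/[\w-]+$", text, re.M)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so the second entry is rejected.
ascii_hits = re.findall(r"^/m/[\w-]+$", text, re.M | re.ASCII)

print(unicode_hits)  # ['/m/commune', '/m/café']
print(ascii_hits)    # ['/m/commune']
```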







answered Dec 31 '18 at 14:11 by Wiktor Stribiżew

  • A quick example.
    – Wiktor Stribiżew, Dec 31 '18 at 14:18

  • I can't believe you actually gave an answer which involves something other than regex. New year's resolution?
    – Tim Biegeleisen, Dec 31 '18 at 14:20

  • @TimBiegeleisen Python has been my primary programming language for almost a year.
    – Wiktor Stribiżew, Dec 31 '18 at 14:25


















