Extract substrings separately from a string using python regex












3















I am trying to write a regular expression that returns the part of a string that follows a given substring. For example, I want the rest of the line, spaces included, that comes after "15/08/2017".



a='''S
LINC SHORT LEGAL TITLE NUMBER
0037 471 661 1720278;16;21 172 211 342

LEGAL DESCRIPTION
PLAN 1720278
BLOCK 16
LOT 21
EXCEPTING THEREOUT ALL MINES AND MINERALS

ESTATE: FEE SIMPLE
ATS REFERENCE: 4;24;54;2;SW

MUNICIPALITY: CITY OF EDMONTON

REFERENCE NUMBER: 172 023 641 +71

----------------------------------------------------------------------------
----
REGISTERED OWNER(S)
REGISTRATION DATE(DMY) DOCUMENT TYPE VALUE CONSIDERATION
---------------------------------------------------------------------------
--
---

172 211 342    15/08/2017  AFFIDAVIT OF        CASH & MTGE'''


Is there a way to get 'AFFIDAVIT OF' and 'CASH & MTGE' as separate strings?



Here is the expression I have pieced together so far:



doc = (a.split('15/08/2017', 1)[1]).strip()
# doc is now 'AFFIDAVIT OF        CASH & MTGE'


































  • I have edited with the actual input string.

    – User123
    Dec 21 '18 at 6:11











  • Okay, is there any way to do this using regex?

    – User123
    Dec 31 '18 at 4:15











  • Why do you want to do this with regex? Are you willing to accept any other solution?

    – Mad Physicist
    Dec 31 '18 at 4:29











  • Yes, if there is a better way than regex.

    – User123
    Dec 31 '18 at 4:30
















python regex python-3.x














edited Dec 26 '18 at 4:16









CodeIt











asked Dec 26 '18 at 3:54









User123

























11 Answers


















3














Not a regex-based solution, but it does the trick.



a='''S
LINC SHORT LEGAL TITLE NUMBER
0037 471 661 1720278;16;21 172 211 342

LEGAL DESCRIPTION
PLAN 1720278
BLOCK 16
LOT 21
EXCEPTING THEREOUT ALL MINES AND MINERALS

ESTATE: FEE SIMPLE
ATS REFERENCE: 4;24;54;2;SW

MUNICIPALITY: CITY OF EDMONTON

REFERENCE NUMBER: 172 023 641 +71

----------------------------------------------------------------------------
----
REGISTERED OWNER(S)
REGISTRATION DATE(DMY) DOCUMENT TYPE VALUE CONSIDERATION
---------------------------------------------------------------------------
--
---

172 211 342    15/08/2017  AFFIDAVIT OF        CASH & MTGE'''

# Keep only the text after the date.
doc = (a.split('15/08/2017', 1)[1]).strip()
# Split on two spaces instead of one, so the single spaces inside each column survive.
print(doc.split("  ")[0].strip()) # outputs AFFIDAVIT OF
print(doc.split("  ")[-1].strip()) # outputs CASH & MTGE


Hope it helps.
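The two-space split above leaves empty strings behind whenever a gap is wider than two spaces, which is why only the first and last pieces are used. A minimal sketch that filters those out and keeps every column, assuming the columns are always separated by at least two spaces (the `row` and `wide` literals and the name `columns_after` are illustrative, not part of the original answer):

```python
def columns_after(row, marker):
    """Return the non-empty columns that follow `marker` in `row`.

    Assumes columns are separated by two or more spaces and that
    `marker` occurs in `row`; both are assumptions for this sketch.
    """
    rest = row.split(marker, 1)[1]
    # Splitting on two spaces yields '' for every extra pair of spaces,
    # so drop the empty pieces and trim stray single spaces.
    return [piece.strip() for piece in rest.split("  ") if piece.strip()]


# Hypothetical rows, spaced like the report in the question.
row = "172 211 342    15/08/2017  AFFIDAVIT OF        CASH & MTGE"
print(columns_after(row, "15/08/2017"))

wide = "172 211 342    15/08/2017  TRANSFER OF LAND   $610,000        CASH & MTGE"
print(columns_after(wide, "15/08/2017"))
```

Unlike indexing with [0] and [-1], this also keeps a middle column such as the VALUE field when one is present.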






























  • See it in action here.

    – CodeIt
    Dec 26 '18 at 4:03



















3














An re-based code snippet:



import re
foo = '''S
LINC SHORT LEGAL TITLE NUMBER
0037 471 661 1720278;16;21 172 211 342

LEGAL DESCRIPTION
PLAN 1720278
BLOCK 16
LOT 21
EXCEPTING THEREOUT ALL MINES AND MINERALS

ESTATE: FEE SIMPLE
ATS REFERENCE: 4;24;54;2;SW

MUNICIPALITY: CITY OF EDMONTON

REFERENCE NUMBER: 172 023 641 +71

----------------------------------------------------------------------------
----
REGISTERED OWNER(S)
REGISTRATION DATE(DMY) DOCUMENT TYPE VALUE CONSIDERATION
---------------------------------------------------------------------------
--
---

172 211 342 15/08/2017 AFFIDAVIT OF CASH & MTGE'''

# Raw string so that \d, \s and \w reach the regex engine intact.
pattern = r'.*\d{2}/\d{2}/\d{4}\s+(\w+\s+\w+)\s+(\w+\s+.*\s+\w+)'
result = re.findall(pattern, foo, re.MULTILINE)
print("1st match:", result[0][0])
print("2nd match:", result[0][1])


Output



1st match: AFFIDAVIT OF
2nd match: CASH & MTGE
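The pattern above fixes the number of words in each column. A hedged sketch of a variant that uses the column gaps instead, assuming the date is followed by exactly two columns, that words within a column are joined by single spaces, and that columns are separated by two or more spaces (the names `ROW`, `doc_type` and `consideration` are made up for this example):

```python
import re

# Assumption: two columns after the date, two or more spaces between columns.
ROW = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{4}) {2,}"      # registration date, then a column gap
    r"(?P<doc_type>\S+(?: \S+)*) {2,}"       # words joined by single spaces
    r"(?P<consideration>\S+(?: \S+)*)"       # last column, same shape
)

# Hypothetical row, spaced like the last line of the sample input.
text = "172 211 342    15/08/2017  AFFIDAVIT OF        CASH & MTGE"
m = ROW.search(text)
if m:
    print(m.group("doc_type"))
    print(m.group("consideration"))
```

Named groups make the call site easier to read than result[0][0], and the pattern no longer cares how many words a column holds.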




































    3














    We can try using re.findall with the following pattern:



    PHASED OF (((?!\bCONDOMINIUM PLAN).)*)(?=CONDOMINIUM PLAN)


    Searching in multiline and DOTALL mode, the above pattern will match everything occurring between PHASED OF until, but not including, CONDOMINIUM PLAN.



    import re

    # \n marks the line break between the two rows of the DOCUMENT TYPE column
    input = "182 246 612    01/10/2018  PHASED OF                           CASH & MTGE\n        CONDOMINIUM PLAN"
    result = re.findall(r'PHASED OF (((?!\bCONDOMINIUM PLAN).)*)(?=CONDOMINIUM PLAN)', input, re.DOTALL|re.MULTILINE)
    output = result[0][0].strip()
    print(output)

    CASH & MTGE


    Note that I also strip off whitespace from the match. We might be able to modify the regex pattern to do this, but in a general solution, maybe you want to keep some of the whitespace, in certain cases.
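For readers who find the tempered dot hard to follow, the same span can probably be captured with a lazy quantifier. This is a sketch rather than the answer's own method; the variable names are illustrative and the sample text is the one used above:

```python
import re

text = (
    "182 246 612    01/10/2018  PHASED OF                           CASH & MTGE\n"
    "        CONDOMINIUM PLAN"
)

# Lazy version: take as little as possible up to the first CONDOMINIUM PLAN.
lazy = re.search(r"PHASED OF(.*?)CONDOMINIUM PLAN", text, re.DOTALL)

# Tempered version from the answer, for comparison.
tempered = re.search(
    r"PHASED OF (((?!\bCONDOMINIUM PLAN).)*)(?=CONDOMINIUM PLAN)", text, re.DOTALL
)

print(lazy.group(1).strip())
print(tempered.group(1).strip())
```

Both searches capture the same text on this input. Either way re.DOTALL is needed so that the dot can cross the line break.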
































    • The string below DOCUMENT TYPE may or may not span multiple lines. If it does, the extra lines should be taken into account.

      – User123
      Dec 31 '18 at 4:36











    • My answer covers a multiline situation. If you see a flaw in my answer, then state exactly what it is.

      – Tim Biegeleisen
      Dec 31 '18 at 4:40











    • I can't follow what this does: result = re.findall(r'PHASED OF (((?!\bCONDOMINIUM PLAN).)*)(?=CONDOMINIUM PLAN)', input, re.DOTALL|re.MULTILINE). Can't we give 'PHASED OF CONDOMINIUM PLAN' as a single phrase?

      – User123
      Dec 31 '18 at 4:46













    • No, we can't, hence I initially commented under your question that there is no answer. You need to match across lines.

      – Tim Biegeleisen
      Dec 31 '18 at 4:50











    • Okay, fine. What modification is needed if there is no multiline term after the date?

      – User123
      Dec 31 '18 at 4:52



















    2














    Why regular expressions?



    It looks like you know the exact delimiting string, so just str.split() on it and take the first part:



    In [1]: a='172 211 342    15/08/2017  TRANSFER OF LAND   $610,000        CASH & MTGE'

    In [2]: a.split("15/08/2017", 1)[0]
    Out[2]: '172 211 342    '
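When both sides of the delimiter are of interest, str.partition may be a convenient alternative to split. A small sketch using the same sample line (the date "01/01/1900" is a made-up value that does not occur in the line):

```python
line = "172 211 342    15/08/2017  TRANSFER OF LAND   $610,000        CASH & MTGE"

# partition returns the text before, the separator itself, and the text after.
before, sep, after = line.partition("15/08/2017")
print(repr(before))
print(repr(after.strip()))

# When the delimiter is absent, sep is '' and nothing raises,
# unlike indexing [1] on the result of split().
_, missing_sep, _ = line.partition("01/01/1900")
print(repr(missing_sep))
```

Checking whether sep is empty gives a simple way to detect a missing delimiter before using the other two parts.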





























    • It won't work for the input string, which I have now edited.

      – User123
      Dec 21 '18 at 6:16











    • @Farook in this state it won't, right. You could though adjust the solution and split it on a newline first, but in that case, regex would be able to do it in one go.

      – alecxe
      Dec 21 '18 at 6:17





















    1














    I would avoid a pure regex match here, because the only meaningful separation between the logical terms appears to be 2 or more spaces. Individual terms, including the one you want to match, may also contain spaces. So, I recommend doing a regex split on the input using \s{2,} as the pattern. This will yield a list containing all the terms. Then, we can just walk down the list once, and when we find the forward looking term, we can return the previous term in the list.



    import re
    a = "172 211 342    15/08/2017  TRANSFER OF LAND   $610,000        CASH & MTGE"
    # Split on runs of two or more whitespace characters
    parts = re.compile(r"\s{2,}").split(a)
    print(parts)

    # Print the term that precedes the date
    for i in range(1, len(parts)):
        if parts[i] == "15/08/2017":
            print(parts[i-1])

    ['172 211 342', '15/08/2017', 'TRANSFER OF LAND', '$610,000', 'CASH & MTGE']
    172 211 342
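The walk over the list can be wrapped in a small helper that returns the terms on either side of the date. This is only a sketch: the function name `neighbours` is hypothetical, and it assumes terms are separated by two or more whitespace characters, as in the answer:

```python
import re

def neighbours(row, term):
    """Return (previous term, list of following terms) around `term`.

    Assumes terms are separated by two or more whitespace characters.
    Raises ValueError if `term` is not one of the terms in `row`.
    """
    parts = re.split(r"\s{2,}", row.strip())
    i = parts.index(term)
    previous = parts[i - 1] if i > 0 else None
    return previous, parts[i + 1:]


row = "172 211 342    15/08/2017  TRANSFER OF LAND   $610,000        CASH & MTGE"
previous, following = neighbours(row, "15/08/2017")
print(previous)
print(following)
```

Using list.index instead of a manual loop also covers the case where the date is the first term, where there is no previous term to return.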




































      1














      Use a positive lookbehind assertion:



      import re
      m = re.search('(?<=15/08/2017).*', a)
      m.group(0)
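The lookbehind does not have to hard-code one date. A sketch that accepts any dd/mm/yyyy date, assuming a shortened two-line excerpt of the report (the excerpt and the name `rest` are illustrative):

```python
import re

# Hypothetical two-line excerpt of the report.
a = (
    "REGISTRATION    DATE(DMY)  DOCUMENT TYPE      CONSIDERATION\n"
    "172 211 342    15/08/2017  AFFIDAVIT OF        CASH & MTGE"
)

# The lookbehind has a fixed width (10 characters), which Python's re requires.
m = re.search(r"(?<=\d{2}/\d{2}/\d{4}).*", a)
rest = m.group(0).strip() if m else None
print(rest)
```

Because the dot does not match a newline by default, the match stops at the end of the line that contains the date. The result still holds both columns, so it needs a further split on the wide gap.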




































        0














        You have to return the right group:



        re.match("(.*?)15/08/2017",a).group(1)




































          0














          You need to use group(1):



          import re
          re.match("(.*?)15/08/2017",a).group(1)


          Output



          '172 211 342    '




































            0














            Building on your expression, this is what I believe you need:



            import re

            a='172 211 342    15/08/2017  TRANSFER OF LAND   $610,000        CASH & MTGE'
            re.match(r"(.*?)(\w+/)",a).group(1)


            Output:



            '172 211 342    '




































              0














              You can do this by using group(1)



              re.match("(.*?)15/08/2017",a).group(1)


              UPDATE



              For the updated string you can use .search instead of .match:



              re.search("(.*?)15/08/2017",a).group(1)
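The difference between the two calls shows up as soon as the date is not on the first line. A minimal sketch, assuming a shortened two-line version of the input (the string `a` below is illustrative, not the full report):

```python
import re

# Hypothetical two-line excerpt: the date sits on the second line.
a = (
    "REGISTERED OWNER(S)\n"
    "172 211 342    15/08/2017  AFFIDAVIT OF        CASH & MTGE"
)
pattern = r"(.*?)15/08/2017"

# re.match only tries at the very start of the string, and '.' cannot
# cross the newline, so the date on line two is never reached.
matched = re.match(pattern, a)
print(matched)

# re.search retries at later positions and succeeds on the second line.
found = re.search(pattern, a)
print(repr(found.group(1)))
```

Here group(1) holds only the text between the start of the date's line and the date, because the dot stops at the line break.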































              • This will give incorrect results if there are more than one term before 15/08/2017.

                – Tim Biegeleisen
                Dec 21 '18 at 5:57











              • I have edited my input string. It didn't work for the string which is edited now

                – User123
                Dec 21 '18 at 6:10











              • This will fail completely if the desired term is anything other than the first term.

                – Tim Biegeleisen
                Dec 21 '18 at 6:25



















              0














              Your problem comes from the way your string is formatted.
              The line you are looking for is



              182 246 612    01/10/2018  PHASED OF                           CASH & MTGE



              And then you are looking for whatever comes after 'PHASED OF' and some spaces.



              You want to search for




              (?<=PHASED OF)\s*(?P<your_text>.*?)\n




              in your string. This will return a match object containing the value you are looking for in the group value.



              m = re.search(r'(?<=PHASED OF)\s*(?P<your_text>.*?)\n', a)
              your_desired_text = m.group('your_text')
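The pattern above needs a newline after the value, so it finds nothing when the row is the last line of the string. A hedged variant that anchors on the end of the line instead, using `$` with re.MULTILINE (the names `pattern`, `last_line` and `two_lines` are illustrative):

```python
import re

# '$' with re.MULTILINE matches at the end of each line and at the end of the string.
pattern = re.compile(r"(?<=PHASED OF)[ \t]*(?P<your_text>.*?)[ \t]*$", re.MULTILINE)

last_line = "182 246 612    01/10/2018  PHASED OF                           CASH & MTGE"
two_lines = last_line + "\n        CONDOMINIUM PLAN"

for text in (last_line, two_lines):
    m = pattern.search(text)
    print(m.group("your_text") if m else None)
```

Using [ \t]* rather than \s* keeps the match on one line, since \s would also consume the line break.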


              Also, there are many good online regex testers for experimenting with your regexes.
              Once the regex works there, copy and paste it into Python.



              I use this one: https://regex101.com/






























              • I am not searching for whatever comes after 'PHASED OF' and some spaces. Instead I am searching for the string after the entire term below DOCUMENT TYPE, i.e. 'PHASED OF CONDOMINIUM PLAN'.

                – User123
                Dec 31 '18 at 4:39











              • "I need to get the string after the word 'PHASED OF CONDOMINIUM PLAN' which should returns 'CASH & MTGE' I have tried using the below expression". Where did I go wrong?

                – Kanjiu
                Dec 31 '18 at 4:43











answered Dec 26 '18 at 4:00 by CodeIt (the answer that splits on two spaces)













answered Dec 26 '18 at 4:19 by Sharad (the re.findall answer that matches the date with \d{2}/\d{2}/\d{4})























                      3














                      We can try using re.findall with the following pattern:



                      PHASED OF ((?!bCONDOMINIUM PLAN).)*)(?=CONDOMINIUM PLAN)


                      Searching in multiline and DOTALL mode, the above pattern will match everything occurring between PHASED OF until, but not including, CONDOMINIUM PLAN.



                      input = "182 246 612    01/10/2018  PHASED OF                           CASH & MTGEn        CONDOMINIUM PLAN"
                      result = re.findall(r'PHASED OF (((?!bCONDOMINIUM PLAN).)*)(?=CONDOMINIUM PLAN)', input, re.DOTALL|re.MULTILINE)
                      output = result[0][0].strip()
                      print(output)

                      CASH & MTGE


                      Note that I also strip off whitespace from the match. We might be able to modify the regex pattern to do this, but in a general solution, maybe you want to keep some of the whitespace, in certain cases.
































                      • The string below DOCUMENT TYPE may or may not span multiple lines. If it does, the match should include all of them.

                        – User123
                        Dec 31 '18 at 4:36











                      • My answer covers a multiline situation. If you see a flaw in my answer, then state exactly what it is.

                        – Tim Biegeleisen
                        Dec 31 '18 at 4:40











                      • I can't follow what this does: result = re.findall(r'PHASED OF (((?!\bCONDOMINIUM PLAN).)*)(?=CONDOMINIUM PLAN)', input, re.DOTALL|re.MULTILINE). Can't we give 'PHASED OF CONDOMINIUM PLAN' as a single word?

                        – User123
                        Dec 31 '18 at 4:46













                      • No, we can't, hence I initially commented under your question that there is no answer. You need to match across lines.

                        – Tim Biegeleisen
                        Dec 31 '18 at 4:50











                      • Okay, fine. What modification is needed if there is no multiline word after the date?

                        – User123
                        Dec 31 '18 at 4:52
























                      edited Dec 31 '18 at 4:34

























                      answered Dec 31 '18 at 4:29









                      Tim Biegeleisen
























                      2














                      Why regular expressions?



                      It looks like you know the exact delimiting string, so just call str.split() with it and take the first part:



                      In [1]: a='172 211 342    15/08/2017  TRANSFER OF LAND   $610,000        CASH & MTGE'

                      In [2]: a.split("15/08/2017", 1)[0]
                      Out[2]: '172 211 342    '





























                      • It won't work for the input string, which I have now edited.

                        – User123
                        Dec 21 '18 at 6:16











                      • @Farook, in its current state it won't, right. You could adjust the solution and split on newlines first, but in that case a regex would be able to do it in one go.

                        – alecxe
                        Dec 21 '18 at 6:17
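One way to adjust the split-based approach for the multi-line input, along the lines of the comment above, is to scan the text line by line first. This is a sketch only: the shortened sample string and the names `date` and `after_date` are assumptions made for illustration.

```python
# Shortened stand-in for the multi-line string from the question (assumed layout).
a = """REGISTERED OWNER(S)
REGISTRATION DATE(DMY) DOCUMENT TYPE VALUE CONSIDERATION

172 211 342 15/08/2017 AFFIDAVIT OF CASH & MTGE"""

date = "15/08/2017"
after_date = None
for line in a.splitlines():
    if date in line:
        # str.partition returns (before, separator, after); keep the part after the date.
        after_date = line.partition(date)[2].strip()
        break

print(after_date)
```

`str.partition` is used instead of `str.split` so that a line without the date never raises an `IndexError`. The part before the date is available as element `[0]` of the same tuple.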




























                      answered Dec 21 '18 at 6:05









                      alecxe


























                      1














                      I would avoid matching the term directly with a regex here, because the only meaningful separation between the logical terms appears to be 2 or more spaces. Individual terms, including the one you want to match, may also contain single spaces. So, I recommend doing a regex split on the input using \s{2,} as the pattern. This will yield a list containing all the terms. Then, we can just walk down the list once, and when we find the term that follows the one we want (here, the date), we can return the previous term in the list.



                      import re
                      a = "172 211 342    15/08/2017  TRANSFER OF LAND   $610,000        CASH & MTGE"
                      parts = re.compile(r"\s{2,}").split(a)
                      print(parts)

                      # Walk the list; when we reach the date, print the term just before it.
                      for i in range(1, len(parts)):
                          if parts[i] == "15/08/2017":
                              print(parts[i-1])

                      ['172 211 342', '15/08/2017', 'TRANSFER OF LAND', '$610,000', 'CASH & MTGE']
                      172 211 342
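The same split can also return the terms that follow the date, which is what the edited question asks for. This is a rough sketch and rests on one loud assumption: that the real report separates columns with two or more spaces, as in the row used earlier in this answer. The names `row`, `date_idx`, and `fields_after` are illustrative.

```python
import re

# Assumed spacing: columns separated by 2+ spaces, single spaces inside a term.
row = "172 211 342    15/08/2017  AFFIDAVIT OF   CASH & MTGE"

parts = re.split(r"\s{2,}", row.strip())

# Locate the first term that looks like a DD/MM/YYYY date instead of hard-coding it.
date_idx = next(
    (i for i, part in enumerate(parts) if re.fullmatch(r"\d{2}/\d{2}/\d{4}", part)),
    None,
)

# Everything after the date, kept as separate strings.
fields_after = parts[date_idx + 1:] if date_idx is not None else []
print(fields_after)
```

If the columns are separated by a single space only, this approach cannot tell where one term ends and the next begins. In that case a fixed list of known document types would be needed instead.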











































                          answered Dec 21 '18 at 5:54









                          Tim Biegeleisen























                              1














                              Use a positive lookbehind assertion:



                              import re
                              m = re.search(r'(?<=15/08/2017).*', a)
                              m.group(0)
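To see what the lookbehind returns on multi-line input, here is a small sketch. The helper name `text_after` and the shortened sample string are assumptions made for illustration; they are not part of the original answer.

```python
import re

def text_after(marker, text):
    """Return the rest of the line that follows `marker`, or None if it is absent."""
    # Python's re module requires a fixed-width lookbehind; an escaped
    # literal marker always satisfies that.
    m = re.search(r"(?<=" + re.escape(marker) + r").*", text)
    return m.group(0).strip() if m else None

# Shortened stand-in for the string from the question (assumed layout).
a = ("REGISTRATION DATE(DMY) DOCUMENT TYPE VALUE CONSIDERATION\n\n"
     "172 211 342 15/08/2017 AFFIDAVIT OF CASH & MTGE")

rest = text_after("15/08/2017", a)
print(rest)
```

Without `re.DOTALL`, `.` stops at the newline, so only the remainder of the date's own line is returned. The result is still one string; splitting it into separate terms needs an additional rule such as a column separator.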











































                                  answered Dec 26 '18 at 5:10









                                  PIG























                                      0














                                      You have to return the right group:



                                      re.match("(.*?)15/08/2017",a).group(1)











































                                          answered Dec 21 '18 at 5:53









                                          RoyaumeIX























                                              0














                                              You need to use group(1):



                                              import re
                                              re.match("(.*?)15/08/2017",a).group(1)


                                              Output



                                              '172 211 342    '











































                                                  answered Dec 21 '18 at 5:54









                                                  Rishi Bansal























                                                      0














                                                      Building on your expression, this is what I believe you need:



                                                      import re

                                                      a='172 211 342    15/08/2017  TRANSFER OF LAND   $610,000        CASH & MTGE'
                                                      re.match(r"(.*?)(\w+/)",a).group(1)


                                                      Output:



                                                      '172 211 342    '











































                                                          answered Dec 21 '18 at 6:08









                                                          silverhash























                                                              0














                                                              You can do this by using group(1)



                                                              re.match("(.*?)15/08/2017",a).group(1)


                                                              UPDATE



                                                              For the updated string, you can use re.search instead of re.match:



                                                              re.search("(.*?)15/08/2017",a).group(1)
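The difference between the two calls can be checked with a short sketch. The two-line sample string is an assumed stand-in for the edited input, and the variable names are illustrative.

```python
import re

# Assumed stand-in for the edited, multi-line input from the question.
a = ("REGISTRATION DATE(DMY) DOCUMENT TYPE\n"
     "172 211 342 15/08/2017 AFFIDAVIT OF CASH & MTGE")

pattern = r"(.*?)15/08/2017"

# re.match only tries position 0, and '.' cannot cross the newline,
# so the date on the second line is never reached.
from_match = re.match(pattern, a)

# re.search tries every start position and succeeds at the start of line 2.
from_search = re.search(pattern, a)
before_date = from_search.group(1) if from_search else None

print(from_match)
print(repr(before_date))
```

Calling `.group(1)` directly on the result of `re.match` would raise an `AttributeError` here, because the result is `None`. Checking the match object first avoids that.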































                                                              • This will give incorrect results if there is more than one term before 15/08/2017.

                                                                – Tim Biegeleisen
                                                                Dec 21 '18 at 5:57











                                                              • I have edited my input string. This didn't work for the edited string.

                                                                – User123
                                                                Dec 21 '18 at 6:10











                                                              • This will fail completely if the desired term is anything other than the first term.

                                                                – Tim Biegeleisen
                                                                Dec 21 '18 at 6:25
























                                                              edited Dec 21 '18 at 6:17

























                                                              answered Dec 21 '18 at 5:50









Muhammad Bilal






































The difficulty comes from how your string is formatted. The line you are looking for is

182 246 612 01/10/2018 PHASED OF CASH & MTGE

and you want whatever comes after 'PHASED OF' and any whitespace that follows it.

Search for

(?<=PHASED OF)\s*(?P<your_text>.*?)\n

in your string. This returns a match object that holds the value you are looking for in the named group your_text.

import re
m = re.search(r'(?<=PHASED OF)\s*(?P<your_text>.*?)\n', a)
your_desired_text = m.group('your_text')
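One caveat with the pattern above is that it requires a newline after the captured text. The sketch below illustrates this with a hypothetical stand-in string `text` whose target line is the last one and has no trailing newline. The `$` variant with `re.MULTILINE` is an assumed alternative, not part of the answer.

```python
import re

# Hypothetical stand-in input: the target line is last, with no trailing newline.
text = "DOCUMENT TYPE VALUE\n182 246 612 01/10/2018 PHASED OF   CASH & MTGE"

# Pattern from the answer: a newline must follow the captured text.
strict = re.search(r"(?<=PHASED OF)\s*(?P<your_text>.*?)\n", text)
print(strict)  # None, because nothing follows the last line

# Assumed variant: '$' with re.MULTILINE matches at the end of any line,
# including the end of the string.
relaxed = re.search(r"(?<=PHASED OF)\s*(?P<your_text>.*?)$", text, re.MULTILINE)
captured = relaxed.group("your_text")
print(captured)  # CASH & MTGE
```

The lookbehind `(?<=PHASED OF)` must be fixed-width in Python's re module, which it is here because it is a literal string.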


Also: there are many good online regex testers for experimenting with your patterns. Once the regex works there, copy and paste it into Python.

I use this one: https://regex101.com/
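For the asker's stated goal of getting the document type and the value as two separate strings, one possible approach is to split on runs of two or more spaces. This is an assumption rather than something proposed in the answers: it only works if the real data keeps wider gaps between columns than between words inside a column. The `line` below is a hypothetical example with such gaps.

```python
import re

# Hypothetical line whose columns are separated by two or more spaces
# (an assumption about the real data).
line = "172 211 342   15/08/2017   AFFIDAVIT OF      CASH & MTGE"

# Keep only what follows the date, as in the asker's own split() attempt.
tail = line.split("15/08/2017", 1)[1].strip()

# Split once on the first run of 2+ whitespace characters.
doc_type, value = re.split(r"\s{2,}", tail, maxsplit=1)

print(doc_type)  # AFFIDAVIT OF
print(value)     # CASH & MTGE
```

If the columns are separated by single spaces only, there is no reliable way to tell where one column ends from whitespace alone, and a list of known document types would be needed instead.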






























• I am not searching for whatever comes after 'PHASED OF' and some spaces. Instead I am searching for the string after the entire phrase below DOCUMENT TYPE, i.e. 'PHASED OF CONDOMINIUM PLAN'.

  – User123
  Dec 31 '18 at 4:39

• "I need to get the string after the word 'PHASED OF CONDOMINIUM PLAN' which should return 'CASH & MTGE'. I have tried using the below expression". Where did I go wrong?

                                                                – Kanjiu
                                                                Dec 31 '18 at 4:43


























                                                              answered Dec 31 '18 at 4:34









Kanjiu






























