Extract substrings separately from a string using python regex
I am trying to write a regular expression that returns the part of a string that follows a given substring. For example, I want the text, including its spaces, that comes after "15/08/2017".
a='''S
LINC SHORT LEGAL TITLE NUMBER
0037 471 661 1720278;16;21 172 211 342
LEGAL DESCRIPTION
PLAN 1720278
BLOCK 16
LOT 21
EXCEPTING THEREOUT ALL MINES AND MINERALS
ESTATE: FEE SIMPLE
ATS REFERENCE: 4;24;54;2;SW
MUNICIPALITY: CITY OF EDMONTON
REFERENCE NUMBER: 172 023 641 +71
----------------------------------------------------------------------------
----
REGISTERED OWNER(S)
REGISTRATION DATE(DMY) DOCUMENT TYPE VALUE CONSIDERATION
---------------------------------------------------------------------------
--
---
172 211 342  15/08/2017  AFFIDAVIT OF  CASH & MTGE'''
Is there a way to get 'AFFIDAVIT OF' and 'CASH & MTGE' as separate strings?
Here is the expression I have pieced together so far:
doc = (a.split('15/08/2017', 1)[1]).strip()
'AFFIDAVIT OF  CASH & MTGE'
python regex python-3.x
I have edited with the actual input string.
– User123
Dec 21 '18 at 6:11
Okay, is there any way to do this using regex?
– User123
Dec 31 '18 at 4:15
Why do you want to do this with regex? Are you willing to accept any other solution?
– Mad Physicist
Dec 31 '18 at 4:29
Yes if there is a better way other than regex
– User123
Dec 31 '18 at 4:30
edited Dec 26 '18 at 4:16 by CodeIt
asked Dec 26 '18 at 3:54 by User123
11 Answers
Not a regex-based solution, but it does the trick.
a='''S
LINC SHORT LEGAL TITLE NUMBER
0037 471 661 1720278;16;21 172 211 342
LEGAL DESCRIPTION
PLAN 1720278
BLOCK 16
LOT 21
EXCEPTING THEREOUT ALL MINES AND MINERALS
ESTATE: FEE SIMPLE
ATS REFERENCE: 4;24;54;2;SW
MUNICIPALITY: CITY OF EDMONTON
REFERENCE NUMBER: 172 023 641 +71
----------------------------------------------------------------------------
----
REGISTERED OWNER(S)
REGISTRATION DATE(DMY) DOCUMENT TYPE VALUE CONSIDERATION
---------------------------------------------------------------------------
--
---
172 211 342  15/08/2017  AFFIDAVIT OF  CASH & MTGE'''
doc = (a.split('15/08/2017', 1)[1]).strip()
# split on two spaces instead of one to get the desired result
print(doc.split("  ")[0].strip()) # outputs AFFIDAVIT OF
print(doc.split("  ")[-1].strip()) # outputs CASH & MTGE
Hope it helps.
See it in action here.
– CodeIt
Dec 26 '18 at 4:03
re based code snippet
import re
foo = '''S
LINC SHORT LEGAL TITLE NUMBER
0037 471 661 1720278;16;21 172 211 342
LEGAL DESCRIPTION
PLAN 1720278
BLOCK 16
LOT 21
EXCEPTING THEREOUT ALL MINES AND MINERALS
ESTATE: FEE SIMPLE
ATS REFERENCE: 4;24;54;2;SW
MUNICIPALITY: CITY OF EDMONTON
REFERENCE NUMBER: 172 023 641 +71
----------------------------------------------------------------------------
----
REGISTERED OWNER(S)
REGISTRATION DATE(DMY) DOCUMENT TYPE VALUE CONSIDERATION
---------------------------------------------------------------------------
--
---
172 211 342 15/08/2017 AFFIDAVIT OF CASH & MTGE'''
pattern = r'.*\d{2}/\d{2}/\d{4}\s+(\w+\s+\w+)\s+(\w+\s+.*\s+\w+)'
result = re.findall(pattern, foo, re.MULTILINE)
print("1st match:", result[0][0])
print("2nd match:", result[0][1])
Output
1st match: AFFIDAVIT OF
2nd match: CASH & MTGE
We can try using re.findall with the following pattern:
PHASED OF (((?!\bCONDOMINIUM PLAN).)*)(?=CONDOMINIUM PLAN)
Searching in multiline and DOTALL mode, the above pattern will match everything occurring after PHASED OF up to, but not including, CONDOMINIUM PLAN.
import re
input = "182 246 612 01/10/2018 PHASED OF CASH & MTGE\n CONDOMINIUM PLAN"
result = re.findall(r'PHASED OF (((?!\bCONDOMINIUM PLAN).)*)(?=CONDOMINIUM PLAN)', input, re.DOTALL|re.MULTILINE)
output = result[0][0].strip()
print(output)
CASH & MTGE
Note that I also strip off whitespace from the match. We might be able to modify the regex pattern to do this, but in a general solution, maybe you want to keep some of the whitespace, in certain cases.
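For comparison, here is a sketch of a different technique: a lazy quantifier in place of the negative lookahead used above. It is not the pattern from this answer. Under the assumption that the text between the two markers is surrounded by whitespace, putting `\s+` on both sides of the group also removes the need for `strip()`.

```python
import re

# Same sample string as above; the newline splits the DOCUMENT TYPE column.
text = "182 246 612 01/10/2018 PHASED OF CASH & MTGE\n CONDOMINIUM PLAN"

# With re.DOTALL, `.` also matches newlines. The lazy `.*?` stops at the
# first point where the rest of the pattern (\s+CONDOMINIUM PLAN) can match.
m = re.search(r"PHASED OF\s+(.*?)\s+CONDOMINIUM PLAN", text, re.DOTALL)
print(m.group(1))  # CASH & MTGE
```

The lazy form is shorter, but it relies on the end marker appearing somewhere after the start marker; if it does not, `re.search` returns `None` and `m.group(1)` raises an `AttributeError`.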
The thing is, the string below DOCUMENT TYPE may or may not span multiple lines. If it does, the match should account for that.
– User123
Dec 31 '18 at 4:36
My answer covers a multiline situation. If you see a flaw in my answer, then state exactly what it is.
– Tim Biegeleisen
Dec 31 '18 at 4:40
I can't follow what this does: result = re.findall(r'PHASED OF (((?!\bCONDOMINIUM PLAN).)*)(?=CONDOMINIUM PLAN)', input, re.DOTALL|re.MULTILINE). Can't we give 'PHASED OF CONDOMINIUM PLAN' as a single word?
– User123
Dec 31 '18 at 4:46
No, we can't, hence I initially commented under your question that there is no answer. You need to match across lines.
– Tim Biegeleisen
Dec 31 '18 at 4:50
Okay, fine. What modification is needed if there is no multiline term after the date?
– User123
Dec 31 '18 at 4:52
Why regular expressions?
It looks like you know the exact delimiting string; just str.split() by it and get the first part:
In [1]: a='172 211 342 15/08/2017 TRANSFER OF LAND $610,000 CASH & MTGE'
In [2]: a.split("15/08/2017", 1)[0]
Out[2]: '172 211 342 '
It won't work for the input string, which I have now edited.
– User123
Dec 21 '18 at 6:16
@Farook in this state it won't, right. You could though adjust the solution and split it on a newline first, but in that case, regex would be able to do it in one go.
– alecxe
Dec 21 '18 at 6:17
I would avoid trying to match this with a single regex, because the only meaningful separation between the logical terms appears to be 2 or more spaces. Individual terms, including the one you want to match, may also contain spaces. So, I recommend doing a regex split on the input using \s{2,} as the pattern. This will yield a list containing all the terms. Then, we can just walk down the list once, and when we find the forward-looking term, we can return the previous term in the list.
import re
a = "172 211 342  15/08/2017  TRANSFER OF LAND  $610,000  CASH & MTGE"
parts = re.compile(r"\s{2,}").split(a)
print(parts)
for i in range(1, len(parts)):
    if parts[i] == "15/08/2017":
        print(parts[i-1])
['172 211 342', '15/08/2017', 'TRANSFER OF LAND', '$610,000', 'CASH & MTGE']
172 211 342
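The same split can return the terms after the date rather than the one before it, which is what the current version of the question asks for. A minimal sketch: `fields_after` is a hypothetical helper name, and the sample row assumes the columns are separated by at least two spaces.

```python
import re

def fields_after(row, marker):
    """Return the fields that follow `marker` in a row whose columns
    are separated by two or more whitespace characters."""
    parts = re.split(r"\s{2,}", row.strip())
    if marker not in parts:
        return []
    return parts[parts.index(marker) + 1:]

# Assumed sample row: columns separated by at least two spaces.
row = "172 211 342  15/08/2017  AFFIDAVIT OF  CASH & MTGE"
doc_type, consideration = fields_after(row, "15/08/2017")
print(doc_type)       # AFFIDAVIT OF
print(consideration)  # CASH & MTGE
```

Returning an empty list when the marker is missing keeps the caller from hitting an index error, at the cost of having to check the length before unpacking.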
Use a positive lookbehind assertion:
import re
m = re.search(r'(?<=15/08/2017).*', a)
m.group(0)
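The lookbehind returns everything after the date on that line as one string. To get the two columns separately, one option is to split that remainder on runs of spaces. This is a sketch only; the sample line is an assumption in which the columns are separated by two or more spaces.

```python
import re

# Assumed sample line: columns separated by at least two spaces.
line = "172 211 342  15/08/2017  AFFIDAVIT OF  CASH & MTGE"

# (?<=...) asserts that the date precedes the match without consuming it.
# `.` does not match a newline by default, so the match ends with the line.
m = re.search(r"(?<=15/08/2017).*", line)
rest = m.group(0).strip()

# Split the remainder on runs of two or more whitespace characters.
parts = re.split(r"\s{2,}", rest)
print(parts)  # ['AFFIDAVIT OF', 'CASH & MTGE']
```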
You have to return the right group:
re.match("(.*?)15/08/2017",a).group(1)
You need to use group(1):
import re
re.match("(.*?)15/08/2017",a).group(1)
Output
'172 211 342 '
Building on your expression, this is what I believe you need:
import re
a='172 211 342 15/08/2017 TRANSFER OF LAND $610,000 CASH & MTGE'
re.match(r"(.*?)(\w+/)",a).group(1)
Output:
'172 211 342 '
You can do this by using group(1)
re.match("(.*?)15/08/2017",a).group(1)
UPDATE
For the updated string you can use .search instead of .match:
re.search("(.*?)15/08/2017",a).group(1)
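The difference matters for the multi-line input: `re.match` only tries the pattern at the start of the string, while `re.search` scans forward. A small sketch with a shortened, assumed two-line sample:

```python
import re

# Shortened two-line sample (an assumption for illustration).
text = "REGISTRATION DATE(DMY)\n172 211 342 15/08/2017 AFFIDAVIT OF"

# re.match only tries at position 0, and `.` does not cross the newline,
# so the date on the second line is never reached.
print(re.match(r"(.*?)15/08/2017", text))  # None

# re.search tries later starting positions and succeeds on the second line.
found = re.search(r"(.*?)15/08/2017", text)
print(repr(found.group(1)))  # '172 211 342 '
```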
This will give incorrect results if there is more than one term before 15/08/2017.
– Tim Biegeleisen
Dec 21 '18 at 5:57
I have edited my input string. It didn't work for the string which is edited now
– User123
Dec 21 '18 at 6:10
This will fail completely if the desired term is anything other than the first term.
– Tim Biegeleisen
Dec 21 '18 at 6:25
Your problem is that your string is formatted the way it is.
The line you are looking for is
182 246 612 01/10/2018 PHASED OF CASH & MTGE
And then you are looking for whatever comes after 'PHASED OF' and some spaces.
You want to search for
(?<=PHASED OF)\s*(?P<your_text>.*?)\n
in your string. This will return a match object containing the value you are looking for in the group your_text.
m = re.search(r'(?<=PHASED OF)\s*(?P<your_text>.*?)\n', a)
your_desired_text = m.group('your_text')
Also: There are many good online regex testers to fiddle around with your regexes.
And only after finishing up the regex just copy and paste it into python.
I use this one: https://regex101.com/
I am not searching for whatever comes after 'PHASED OF' and some spaces. Instead, I am searching for the string after the entire term below DOCUMENT TYPE, i.e. 'PHASED OF CONDOMINIUM PLAN'.
– User123
Dec 31 '18 at 4:39
"I need to get the string after the word 'PHASED OF CONDOMINIUM PLAN' which should returns 'CASH & MTGE' I have tried using the below expression". Where did i go wrong?
– Kanjiu
Dec 31 '18 at 4:43
answered Dec 26 '18 at 4:00 by CodeIt
answered Dec 26 '18 at 4:19 by Sharad
answered Dec 31 '18 at 4:29 by Tim Biegeleisen (edited Dec 31 '18 at 4:34)
answered Dec 21 '18 at 6:05 by alecxe
It wont work for the input string which i have edited now
– User123
Dec 21 '18 at 6:16
@Farook in this state it won't, right. You could though adjust the solution and split it on a newline first, but in that case, regex would be able to do it in one go.
– alecxe
Dec 21 '18 at 6:17
add a comment |
It wont work for the input string which i have edited now
– User123
Dec 21 '18 at 6:16
@Farook in this state it won't, right. You could though adjust the solution and split it on a newline first, but in that case, regex would be able to do it in one go.
– alecxe
Dec 21 '18 at 6:17
It wont work for the input string which i have edited now
– User123
Dec 21 '18 at 6:16
It wont work for the input string which i have edited now
– User123
Dec 21 '18 at 6:16
@Farook in this state it won't, right. You could though adjust the solution and split it on a newline first, but in that case, regex would be able to do it in one go.
– alecxe
Dec 21 '18 at 6:17
@Farook in this state it won't, right. You could though adjust the solution and split it on a newline first, but in that case, regex would be able to do it in one go.
– alecxe
Dec 21 '18 at 6:17
add a comment |
I would avoid a pure regex match here, because the only meaningful separation between the logical terms appears to be 2 or more spaces. Individual terms, including the one you want to match, may also contain spaces. So, I recommend doing a regex split on the input using \s{2,} as the pattern. This will yield a list containing all the terms. Then, we can just walk down the list once, and when we find the forward-looking term, we can return the previous term in the list.
import re
# Terms are separated by two or more spaces; single spaces belong to a term.
a = "172 211 342  15/08/2017  TRANSFER OF LAND  $610,000  CASH & MTGE"
parts = re.compile(r"\s{2,}").split(a)
print(parts)
for i in range(1, len(parts)):
    if parts[i] == "15/08/2017":
        print(parts[i-1])
['172 211 342', '15/08/2017', 'TRANSFER OF LAND', '$610,000', 'CASH & MTGE']
172 211 342
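The same split can also return the terms that follow the date, which is what the edited question asks for. This is only a sketch: the helper name `terms_after` is hypothetical, and it assumes the columns in the real input are separated by two or more spaces.

```python
import re

# Assumed layout: columns separated by two or more spaces.
row = "172 211 342  15/08/2017  AFFIDAVIT OF  CASH & MTGE"

def terms_after(row, marker):
    """Split the row on runs of 2+ whitespace and return the terms after `marker`."""
    parts = re.split(r"\s{2,}", row.strip())
    if marker not in parts:
        return []
    return parts[parts.index(marker) + 1:]

print(terms_after(row, "15/08/2017"))  # ['AFFIDAVIT OF', 'CASH & MTGE']
```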
answered Dec 21 '18 at 5:54
Tim Biegeleisen
Use a positive lookbehind assertion:
import re
m = re.search(r'(?<=15/08/2017).*', a)
m.group(0)
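A minimal sketch of how the lookbehind behaves on multi-line input follows; the sample text is a hypothetical, shortened stand-in for the string in the question. Because `.` does not match a newline by default, the match stops at the end of the line that contains the date.

```python
import re

# Hypothetical three-line sample; only the middle line contains the date.
text = "REGISTERED OWNER(S)\n172 211 342 15/08/2017 AFFIDAVIT OF CASH & MTGE\nNEXT LINE"

# (?<=...) asserts that the date precedes the match without consuming it.
m = re.search(r"(?<=15/08/2017).*", text)
rest = m.group(0).strip() if m else None
print(rest)  # AFFIDAVIT OF CASH & MTGE
```

Note that Python's `re` module only accepts fixed-width patterns inside a lookbehind, which a literal date satisfies.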
answered Dec 26 '18 at 5:10
PIG
You have to return the right group:
re.match("(.*?)15/08/2017",a).group(1)
answered Dec 21 '18 at 5:53
RoyaumeIX
You need to use group(1):
import re
re.match("(.*?)15/08/2017",a).group(1)
Output
'172 211 342 '
answered Dec 21 '18 at 5:54
Rishi Bansal
Building on your expression, this is what I believe you need:
import re
a='172 211 342 15/08/2017 TRANSFER OF LAND $610,000 CASH & MTGE'
re.match(r"(.*?)(\w+/)", a).group(1)
Output:
'172 211 342 '
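If the date is not known in advance, the same idea can be sketched with an explicit date pattern instead of a hard-coded value. The `\d{2}/\d{2}/\d{4}` pattern below is an assumption about the DD/MM/YYYY layout seen in the sample; it is not part of the original answer.

```python
import re

row = "172 211 342 15/08/2017 TRANSFER OF LAND $610,000 CASH & MTGE"

# Group 1: text before the date, group 2: the date, group 3: the rest of the row.
m = re.match(r"(.*?)\s*(\d{2}/\d{2}/\d{4})\s*(.*)", row)
before, date, after = m.groups()
print(before)  # 172 211 342
print(date)    # 15/08/2017
print(after)   # TRANSFER OF LAND $610,000 CASH & MTGE
```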
answered Dec 21 '18 at 6:08
silverhash
You can do this by using group(1)
re.match("(.*?)15/08/2017",a).group(1)
UPDATE
For the updated string you can use .search instead of .match:
re.search("(.*?)15/08/2017",a).group(1)
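The difference matters for the multi-line input because `re.match` only tries to match at the very start of the string, and `.` does not cross a newline by default. A small sketch on a hypothetical two-line sample:

```python
import re

# Hypothetical two-line sample; the date is on the second line.
text = "REGISTERED OWNER(S)\n172 211 342 15/08/2017 AFFIDAVIT OF"
pattern = r"(.*?)15/08/2017"

# match() is anchored at position 0 and cannot get past the first newline.
print(re.match(pattern, text))            # None

# search() scans forward and succeeds at the start of the second line.
print(re.search(pattern, text).group(1))  # prints: 172 211 342
```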
This will give incorrect results if there is more than one term before 15/08/2017.
– Tim Biegeleisen
Dec 21 '18 at 5:57
I have edited my input string. It didn't work for the string which is edited now
– User123
Dec 21 '18 at 6:10
This will fail completely if the desired term is anything other than the first term.
– Tim Biegeleisen
Dec 21 '18 at 6:25
edited Dec 21 '18 at 6:17
answered Dec 21 '18 at 5:50
Muhammad Bilal
Your problem is that your string is formatted the way it is.
The line you are looking for is
182 246 612 01/10/2018 PHASED OF CASH & MTGE
And then you are looking for whatever comes after 'PHASED OF' and some spaces.
You want to search for
(?<=PHASED OF)\s*(?P<your_text>.*?)\n
in your string. This will return a match object containing the value you are looking for in the group your_text.
m = re.search(r'(?<=PHASED OF)\s*(?P<your_text>.*?)\n', a)
your_desired_text = m.group('your_text')
Also: There are many good online regex testers to fiddle around with your regexes.
And only after finishing up the regex just copy and paste it into python.
I use this one: https://regex101.com/
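One caveat with the trailing `\n` in that pattern is that it finds nothing when the row is the last line and has no newline after it. A hedged variant, assuming that `$` with `re.MULTILINE` is acceptable as the end-of-line marker, is sketched below on a hypothetical sample:

```python
import re

# Hypothetical sample: the row of interest is the last line, with no trailing newline.
text = "HEADER LINE\n182 246 612 01/10/2018 PHASED OF CASH & MTGE"

# With re.MULTILINE, $ matches at the end of every line and at the end of the string.
m = re.search(r"(?<=PHASED OF)\s*(?P<your_text>.*?)$", text, re.MULTILINE)
your_desired_text = m.group("your_text") if m else None
print(your_desired_text)  # CASH & MTGE
```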
I am not searching for whatever comes after 'PHASED OF' and some spaces. Instead I am searching for the string after the entire term below DOCUMENT TYPE (i.e.) 'PHASED OF CONDOMINIUM PLAN'
– User123
Dec 31 '18 at 4:39
"I need to get the string after the word 'PHASED OF CONDOMINIUM PLAN' which should return 'CASH & MTGE'. I have tried using the below expression". Where did I go wrong?
– Kanjiu
Dec 31 '18 at 4:43
answered Dec 31 '18 at 4:34
Kanjiu
I have edited with the actual input string.
– User123
Dec 21 '18 at 6:11
Okay, is there any way to do this using regex?
– User123
Dec 31 '18 at 4:15
Why do you want to do this with regex? Are you willing to accept any other solution?
– Mad Physicist
Dec 31 '18 at 4:29
Yes, if there is a better way than regex.
– User123
Dec 31 '18 at 4:30