More efficient implementation of Textacy / spacy 'subject_verb_object_triples'

I'm trying to apply the 'extract.subject_verb_object_triples' function from textacy to my dataset. However, the code I have written is very slow and memory intensive. Is there a more efficient implementation?
import spacy
import textacy

def extract_SVO(text):
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    tuples = textacy.extract.subject_verb_object_triples(doc)
    tuples_to_list = list(tuples)
    if tuples_to_list != []:
        tuples_list.append(tuples_to_list)

tuples_list = []
sp500news['title'].apply(extract_SVO)
print(tuples_list)
Sample data (sp500news)

    date_publish         title
0   2013-05-14 17:17:05  Italy will not dismantle Montis labour reform minister
1   2014-05-09 20:15:57  Exclusive US agency FinCEN rejected veterans in bid to hire lawyers
4   2018-07-19 10:29:54  Xis campaign to draw people back to graying rural China faces uphill battle
6   2012-04-17 21:02:54  Romney begins to win over conservatives
8   2012-12-12 20:17:56  Oregon mall shooting survivor in serious condition
9   2018-11-08 10:51:49  Polands PGNiG to sign another deal for LNG supplies from US CEO
11  2013-08-25 07:13:31  Australias opposition leader pledges stronger economy if elected PM
12  2015-01-09 00:54:17  New York shifts into Code Blue to get homeless off frigid streets
python pandas nlp spacy textacy
Can you please provide some sample data?
– Vivek Kalyanarangan
Dec 27 at 13:13
Hi @VivekKalyanarangan, I've added the sample data
– W.R
Dec 27 at 13:16
Can you copy-paste it and format it as code? It's easier than viewing and typing from the image.
– Vivek Kalyanarangan
Dec 27 at 13:17
@VivekKalyanarangan -- done
– W.R
Dec 27 at 13:21
edited Dec 27 at 13:22
asked Dec 27 at 13:11
W.R
1 Answer
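This should speed it up somewhat:

import spacy
import textacy

# Load the model once, at module level, instead of on every call.
nlp = spacy.load('en_core_web_sm')

def extract_SVO(doc):
    # subject_verb_object_triples returns a generator, so materialize it once.
    triples = list(textacy.extract.subject_verb_object_triples(doc))
    # Keep only documents that produced at least one triple.
    if triples:
        tuples_list.append(triples)

tuples_list = []
# Parse every title into a spaCy Doc, then extract triples from each Doc.
sp500news['title'] = sp500news['title'].apply(nlp)
_ = sp500news['title'].apply(extract_SVO)
print(tuples_list)

Explanation

In the OP's implementation, nlp = spacy.load('en_core_web_sm') is called inside the function, so the model is reloaded for every row. I suspect this is the biggest bottleneck. Moving the call out of the function should speed things up.

Also, the triples are converted to a list only once, and that list is appended only if it is not empty. The extractor returns a generator, which is always truthy and can be consumed only once, so the emptiness check has to be made on the list rather than on the generator.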
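Beyond loading the model once, spaCy can also process many texts per call through nlp.pipe, which may help further on a large column of titles. The following is only a minimal sketch, not a benchmarked solution: extract_svo_batched, its parameters, the demo function and the batch size of 256 are hypothetical names and values chosen for illustration, and it assumes a textacy version that exposes extract.subject_verb_object_triples as in the question.

```python
from typing import Any, Callable, Iterable, List


def extract_svo_batched(
    texts: Iterable[str],
    nlp: Any,
    extractor: Callable[[Any], Iterable[Any]],
    batch_size: int = 256,
) -> List[List[Any]]:
    """Collect the triples of every text that yields at least one.

    `nlp` is expected to be a loaded spaCy pipeline and `extractor` a
    function such as textacy.extract.subject_verb_object_triples; both are
    passed in so that the caller loads the model exactly once.
    """
    results = []
    # nlp.pipe processes the texts in batches and yields Docs lazily,
    # instead of running one pipeline call per row.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        # The extractor may return a generator, so materialize it once.
        triples = list(extractor(doc))
        if triples:
            results.append(triples)
    return results


def demo() -> None:
    """Hypothetical usage; needs spaCy, textacy and en_core_web_sm installed."""
    import spacy
    import textacy.extract

    nlp = spacy.load("en_core_web_sm")  # loaded once, outside any per-row function
    titles = [
        "Romney begins to win over conservatives",
        "Italy will not dismantle Montis labour reform minister",
    ]
    print(extract_svo_batched(titles, nlp, textacy.extract.subject_verb_object_triples))
```

With the DataFrame from the question, the call would be extract_svo_batched(sp500news['title'], nlp, textacy.extract.subject_verb_object_triples). Because the Docs are consumed one at a time and not stored back into the DataFrame, this should also keep memory use lower, though how much it helps depends on the data and the batch size.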
edited Dec 27 at 13:30
answered Dec 27 at 13:22


Vivek Kalyanarangan