More efficient implementation of Textacy / spacy 'subject_verb_object_triples'
I'm trying to apply the extract.subject_verb_object_triples function from textacy to my dataset. However, the code I have written is very slow and memory-intensive. Is there a more efficient implementation?
import spacy
import textacy

def extract_SVO(text):
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    tuples = textacy.extract.subject_verb_object_triples(doc)
    tuples_to_list = list(tuples)
    if tuples_to_list != []:
        tuples_list.append(tuples_to_list)

tuples_list = []
sp500news['title'].apply(extract_SVO)
print(tuples_list)
Sample data (sp500news):

    date_publish         title
0   2013-05-14 17:17:05  Italy will not dismantle Montis labour reform minister
1   2014-05-09 20:15:57  Exclusive US agency FinCEN rejected veterans in bid to hire lawyers
4   2018-07-19 10:29:54  Xis campaign to draw people back to graying rural China faces uphill battle
6   2012-04-17 21:02:54  Romney begins to win over conservatives
8   2012-12-12 20:17:56  Oregon mall shooting survivor in serious condition
9   2018-11-08 10:51:49  Polands PGNiG to sign another deal for LNG supplies from US CEO
11  2013-08-25 07:13:31  Australias opposition leader pledges stronger economy if elected PM
12  2015-01-09 00:54:17  New York shifts into Code Blue to get homeless off frigid streets
python pandas nlp spacy textacy
Can you please provide some sample data? – Vivek Kalyanarangan, Dec 27 at 13:13
Hi @VivekKalyanarangan, I've added the sample data. – W.R, Dec 27 at 13:16
Can you copy-paste it and format it as code? It's easier than viewing and typing from the image. – Vivek Kalyanarangan, Dec 27 at 13:17
@VivekKalyanarangan, done. – W.R, Dec 27 at 13:21
asked Dec 27 at 13:11 by W.R (edited Dec 27 at 13:22)
1 Answer
This should speed it up somewhat:
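import spacy
import textacy

# Load the model once, at module level, instead of on every call
nlp = spacy.load('en_core_web_sm')

def extract_SVO(doc):
    # subject_verb_object_triples returns a generator: convert it to a list
    # once, then skip empty results
    tuples_to_list = list(textacy.extract.subject_verb_object_triples(doc))
    if tuples_to_list:
        tuples_list.append(tuples_to_list)

tuples_list = []
# Parse every title once; this replaces the strings with spaCy Doc objects
sp500news['title'] = sp500news['title'].apply(nlp)
_ = sp500news['title'].apply(extract_SVO)
print(tuples_list)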
Explanation
In the OP's implementation, nlp = spacy.load('en_core_web_sm') is called inside the function, so the model is reloaded for every title. I suspect this is the biggest bottleneck; loading the model once, outside the function, should speed things up considerably.
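Going one step further than loading the model once, spaCy can also process many texts in batches with nlp.pipe, which is usually faster than calling nlp on each title separately. The sketch below is a hedged illustration, not the answer's code: extract_all is a hypothetical helper name, and the nlp and extract arguments are passed in so the same function works with a real spaCy pipeline and textacy.extract.subject_verb_object_triples (see the commented usage) or with any stand-ins of the same shape.

```python
from typing import Any, Callable, Iterable, List


def extract_all(
    texts: Iterable[str],
    nlp: Any,
    extract: Callable[[Any], Iterable[Any]],
    batch_size: int = 1000,
) -> List[List[Any]]:
    """Hypothetical helper: run texts through nlp.pipe and collect the
    non-empty lists of triples returned by `extract` for each doc."""
    results: List[List[Any]] = []
    # With a real spaCy pipeline, nlp.pipe yields Doc objects and
    # processes the texts in batches of `batch_size`
    for doc in nlp.pipe(texts, batch_size=batch_size):
        # Materialize the extractor's generator once, keep non-empty results
        triples = list(extract(doc))
        if triples:
            results.append(triples)
    return results


# Usage with the real libraries (assumes spaCy, textacy and the
# en_core_web_sm model are installed, and sp500news is the question's DataFrame):
#
#   import spacy
#   import textacy
#   nlp = spacy.load('en_core_web_sm')  # load once, outside any loop
#   tuples_list = extract_all(
#       sp500news['title'], nlp, textacy.extract.subject_verb_object_triples
#   )
#   print(tuples_list)
```

Because empty results are skipped, tuples_list does not line up row-for-row with sp500news, which matches the behaviour of the original code; if you need that alignment, append triples unconditionally. If you don't need named entities, loading with spacy.load('en_core_web_sm', disable=['ner']) may save further time, assuming the triple extraction relies only on the tagger and parser.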
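Also, subject_verb_object_triples returns a generator, so it only needs to be converted to a list once; an empty result can then be skipped by checking that list (if tuples_to_list:).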
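One subtlety with checking for empty results: textacy's subject_verb_object_triples yields its triples lazily (it is a generator), and a generator object is always truthy and can only be consumed once. So the emptiness check has to be done on the list, not on the generator. The plain-Python sketch below uses a hypothetical stand-in, fake_triples, in place of the textacy function to show both effects.

```python
def fake_triples(n):
    """Hypothetical stand-in for textacy.extract.subject_verb_object_triples:
    lazily yields n dummy triples."""
    for i in range(n):
        yield ("subject", "verb", f"object{i}")


empty = fake_triples(0)
# A generator object is truthy even when it will yield nothing,
# so `if tuples:` cannot detect an empty result.
print(bool(empty))  # True

gen = fake_triples(2)
first = list(gen)   # consumes the generator
second = list(gen)  # already exhausted, so this is an empty list
print(len(first), len(second))  # 2 0

# Correct pattern: materialize once, then test the list.
triples = list(fake_triples(0))
if triples:
    print("found", triples)
else:
    print("no triples")  # this branch runs
```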
answered Dec 27 at 13:22 by Vivek Kalyanarangan (edited Dec 27 at 13:30)