More efficient implementation of Textacy / spacy 'subject_verb_object_triples'

I'm trying to apply the 'extract.subject_verb_object_triples' function from textacy to my dataset. However, the code I have written is very slow and memory-intensive. Is there a more efficient implementation?

import spacy
import textacy

def extract_SVO(text):
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    tuples = textacy.extract.subject_verb_object_triples(doc)
    tuples_to_list = list(tuples)
    if tuples_to_list != []:
        tuples_list.append(tuples_to_list)

tuples_list = []
sp500news['title'].apply(extract_SVO)
print(tuples_list)

Sample data (sp500news)

    date_publish         title
0   2013-05-14 17:17:05  Italy will not dismantle Montis labour reform  minister
1   2014-05-09 20:15:57  Exclusive US agency FinCEN rejected veterans in bid to hire lawyers
4   2018-07-19 10:29:54  Xis campaign to draw people back to graying rural China faces uphill battle
6   2012-04-17 21:02:54  Romney begins to win over conservatives
8   2012-12-12 20:17:56  Oregon mall shooting survivor in serious condition
9   2018-11-08 10:51:49  Polands PGNiG to sign another deal for LNG supplies from US CEO
11  2013-08-25 07:13:31  Australias opposition leader pledges stronger economy if elected PM
12  2015-01-09 00:54:17  New York shifts into Code Blue to get homeless off frigid streets

edited Dec 27 at 13:22

asked Dec 27 at 13:11

W.R

New contributor

can you please provide some sample data?
– Vivek Kalyanarangan
Dec 27 at 13:13

Hi @VivekKalyanarangan, I've added the sample data
– W.R
Dec 27 at 13:16

Can you copy, paste, and format it as code? It's easier than viewing and typing from the image
– Vivek Kalyanarangan
Dec 27 at 13:17

@VivekKalyanarangan -- done
– W.R
Dec 27 at 13:21

python pandas nlp spacy textacy

1 Answer

This should speed it up somewhat:

import spacy
import textacy

# Load the model once, at module level, instead of on every call
nlp = spacy.load('en_core_web_sm')

tuples_list = []

def extract_SVO(doc):
    # The extractor returns a generator, so build the list once and test the list
    tuples_to_list = list(textacy.extract.subject_verb_object_triples(doc))
    if tuples_to_list:
        tuples_list.append(tuples_to_list)

# Parse every title into a spaCy Doc, then extract the triples
sp500news['title'] = sp500news['title'].apply(nlp)
_ = sp500news['title'].apply(extract_SVO)
print(tuples_list)

Explanation

In the OP's implementation, nlp = spacy.load('en_core_web_sm') is called inside the function, so the model is reloaded on every call. I suspect this is the biggest bottleneck. Moving it out of the function should speed things up.
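The load-once idea can be taken a step further by also batching the parsing. The following is a minimal sketch rather than a tested benchmark: it assumes the caller passes in an already-loaded pipeline that exposes spaCy's `pipe` method, plus the extractor as a callable. The helper name `collect_triples` and the default `batch_size=256` are chosen here for illustration and are not part of spaCy or textacy.

```python
from typing import Any, Callable, Iterable, List


def collect_triples(
    texts: Iterable[str],
    nlp: Any,
    extract: Callable[[Any], Iterable[Any]],
    batch_size: int = 256,
) -> List[List[Any]]:
    """Parse texts in batches with an already-loaded pipeline.

    nlp     -- a pipeline loaded ONCE by the caller,
               e.g. spacy.load('en_core_web_sm')
    extract -- e.g. textacy.extract.subject_verb_object_triples
    Returns one list of triples per text that produced at least one triple.
    """
    results = []
    # nlp.pipe streams the texts through the pipeline in batches
    # instead of calling nlp(text) once per row.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        triples = list(extract(doc))  # materialize the generator once
        if triples:
            results.append(triples)
    return results


# Intended usage (not executed here; needs spaCy, textacy and the model):
#   nlp = spacy.load('en_core_web_sm')
#   tuples_list = collect_triples(
#       sp500news['title'], nlp,
#       textacy.extract.subject_verb_object_triples)
```

Passing the pipeline and extractor in as arguments keeps the model load outside the function and makes the helper easy to try out with stand-ins before running it on the full dataset.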

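Also, the result only needs to be appended when it is non-empty. Since subject_verb_object_triples returns a generator, which is always truthy, that check should be made on the list built from it.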
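The pitfall behind that check is easy to reproduce without spaCy. The sketch below uses a stand-in generator function (`fake_triples`, a hypothetical name, not the textacy extractor) to show that a generator object is truthy even when it yields nothing, and that it can only be consumed once.

```python
def fake_triples(words):
    """Stand-in for an extractor: yields one triple only if there are 3+ words."""
    if len(words) >= 3:
        yield tuple(words[:3])


empty_gen = fake_triples(["too", "short"])
# A generator object is always truthy, even when it will yield nothing.
gen_is_truthy = bool(empty_gen)
# Only the materialized list reflects whether anything was produced.
empty_list = list(empty_gen)

gen = fake_triples(["Romney", "wins", "conservatives"])
first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # already exhausted, so nothing is left
```

This is why the list should be built exactly once and the emptiness test applied to that list.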

edited Dec 27 at 13:30

answered Dec 27 at 13:22

Vivek Kalyanarangan
