More efficient implementation of Textacy / spacy 'subject_verb_object_triples'

I'm trying to apply textacy's 'extract.subject_verb_object_triples' function to my dataset. However, the code I have written is very slow and memory-intensive. Is there a more efficient implementation?



import spacy
import textacy

def extract_SVO(text):
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    tuples = textacy.extract.subject_verb_object_triples(doc)
    tuples_to_list = list(tuples)
    if tuples_to_list != []:
        tuples_list.append(tuples_to_list)

# Collects the non-empty lists of triples across all titles
tuples_list = []
sp500news['title'].apply(extract_SVO)
print(tuples_list)


Sample data (sp500news)

    date_publish         title
0   2013-05-14 17:17:05  Italy will not dismantle Montis labour reform minister
1   2014-05-09 20:15:57  Exclusive US agency FinCEN rejected veterans in bid to hire lawyers
4   2018-07-19 10:29:54  Xis campaign to draw people back to graying rural China faces uphill battle
6   2012-04-17 21:02:54  Romney begins to win over conservatives
8   2012-12-12 20:17:56  Oregon mall shooting survivor in serious condition
9   2018-11-08 10:51:49  Polands PGNiG to sign another deal for LNG supplies from US CEO
11  2013-08-25 07:13:31  Australias opposition leader pledges stronger economy if elected PM
12  2015-01-09 00:54:17  New York shifts into Code Blue to get homeless off frigid streets

  • can you please provide some sample data?
    – Vivek Kalyanarangan
    Dec 27 at 13:13

  • Hi @VivekKalyanarangan, I've added the sample data
    – W.R
    Dec 27 at 13:16

  • Can you copy-paste it and format it as code? It's easier than viewing and typing from the image
    – Vivek Kalyanarangan
    Dec 27 at 13:17

  • @VivekKalyanarangan -- done
    – W.R
    Dec 27 at 13:21

python pandas nlp spacy textacy

edited Dec 27 at 13:22

asked Dec 27 at 13:11
W.R
32

1 Answer

This should speed it up somewhat:



import spacy
import textacy

# Load the model once, outside the function, instead of on every call
nlp = spacy.load('en_core_web_sm')

def extract_SVO(doc):
    # The triples come back as a generator: materialize it exactly once
    tuples_to_list = list(textacy.extract.subject_verb_object_triples(doc))
    if tuples_to_list:
        tuples_list.append(tuples_to_list)

tuples_list = []
sp500news['title'] = sp500news['title'].apply(nlp)
_ = sp500news['title'].apply(extract_SVO)
print(tuples_list)


Explanation



In the OP's implementation, nlp = spacy.load('en_core_web_sm') is called inside the function, so the model is reloaded for every title. I suspect this is the biggest bottleneck. Moving the call out of the function, so the model is loaded only once, should speed it up.
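Going a step beyond loading the model once (my own suggestion, not part of the answer above): spaCy's `Language.pipe()` streams many texts through a single loaded pipeline in batches, which is usually faster than calling `nlp(text)` once per row. The sketch below assumes an already-loaded spaCy pipeline and takes the extractor as a parameter so it can be swapped or stubbed; the function name `extract_svo_batched` and the `batch_size` default are hypothetical choices.

```python
from typing import Any, Callable, Iterable, List


def extract_svo_batched(
    texts: Iterable[str],
    nlp: Any,
    extract: Callable[[Any], Iterable[Any]],
    batch_size: int = 1000,
) -> List[List[Any]]:
    """Hypothetical helper: parse all texts with one loaded pipeline.

    `nlp` is assumed to behave like a loaded spaCy pipeline, whose `pipe()`
    method yields one Doc per input text, in input order.
    `extract` is assumed to behave like
    textacy.extract.subject_verb_object_triples (takes a Doc, yields triples).
    """
    results = []
    for doc in nlp.pipe(texts, batch_size=batch_size):
        # Materialize the generator once so it is never consumed twice.
        results.append(list(extract(doc)))
    return results
```

With the real libraries it would be called roughly as `extract_svo_batched(sp500news['title'], spacy.load('en_core_web_sm'), textacy.extract.subject_verb_object_triples)`. It returns one list per title in input order, keeping empty results so positions line up with the rows; use `[t for t in results if t]` to get the same shape as `tuples_list`.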



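Also, subject_verb_object_triples returns a generator, which is always truthy and can only be consumed once. It should therefore be converted to a list exactly once, and that list (not the generator) checked for emptiness.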
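Because `subject_verb_object_triples` yields its results, testing it for emptiness needs care, and the pitfall can be reproduced without spaCy. In this sketch, `fake_triples` is a hypothetical stand-in that is also a generator; `buggy_extract` converts the same generator to a list twice and tests the generator itself for truthiness, while `fixed_extract` materializes it once and tests the list.

```python
def fake_triples(doc):
    """Hypothetical stand-in for textacy's extractor: a generator (uses yield)."""
    for triple in doc:
        yield triple


def buggy_extract(doc):
    tuples = fake_triples(doc)
    tuples_to_list = list(tuples)      # first pass consumes the generator
    if tuples:                         # a generator object is always truthy
        tuples_to_list = list(tuples)  # second pass: already exhausted, gives []
    return tuples_to_list


def fixed_extract(doc):
    # Materialize the generator once, then check the list for emptiness.
    tuples_to_list = list(fake_triples(doc))
    return tuples_to_list if tuples_to_list else None


doc = [("subj", "verb", "obj")]
print(buggy_extract(doc))   # []
print(fixed_extract(doc))   # [('subj', 'verb', 'obj')]
print(fixed_extract([]))    # None
```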

edited Dec 27 at 13:30
answered Dec 27 at 13:22
Vivek Kalyanarangan
4,7511826