In Python, how do I find elements that contain a specific attribute?



























I'm using Python 3.7 with the bs4 package. I want to find every element in my HTML page that has a "data-permalink" attribute, regardless of its value, even if the value is empty. I'm not sure how to do this. So far I have tried the following:



soup = BeautifulSoup(html)
soup.findAll("data-permalink")

soup.findAll("a")
[<a href=" ... </a>]
soup.findAll("a.data-permalink")



The attribute is normally only found in anchor tags on my page, hence my unsuccessful "a.data-permalink" attempt. I would like to get back the elements that have the attribute.




























  • 1





    This might be relevant to your question: stackoverflow.com/questions/31416858/…

    – Jason Baumgartner
    Jan 1 at 23:31











  • Thanks. Do you know if there's any way to make their example more generic? They have "soup.find_all("div", attrs={"limit":True})" and I was wondering if there is a way to substitute "True" for some kind of expression that means match anything.

    – Dave
    Jan 1 at 23:55






  • 2





    True will match anything. If you prefer css-selectors: soup.select('a[data-permalink]').

    – t.m.adam
    Jan 2 at 0:39

























python python-3.x beautifulsoup html-parsing
















asked Jan 1 at 23:26









Dave






















1 Answer














Your selector is invalid:



soup.findAll("a.data-permalink")


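A CSS selector like this belongs in .select(), not findAll(). Even with .select() it would be wrong, because a.data-permalink selects <a> elements with the class data-permalink, not elements with the data-permalink attribute.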
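To make the difference between the class selector and the attribute selector concrete, here is a minimal sketch. The sample markup is made up for illustration and is not from the question's page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first link carries the class,
# the second carries the attribute (with an empty value).
html = (
    '<a class="data-permalink" href="/x">by class</a>'
    '<a data-permalink="" href="/y">by attribute</a>'
)
soup = BeautifulSoup(html, "html.parser")

# "a.data-permalink" means: <a> elements whose class is data-permalink.
by_class = soup.select("a.data-permalink")

# "a[data-permalink]" means: <a> elements that have the attribute at all.
by_attribute = soup.select("a[data-permalink]")

print([a.get_text() for a in by_class])      # ['by class']
print([a.get_text() for a in by_attribute])  # ['by attribute']
```

Only the bracket form looks at attributes, which is why the dotted form returns nothing on a page where no link has that class.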



To match any tag, use * with select():



.select('*[data-permalink]')


or pass True as the tag name if you are using findAll():



.findAll(True, attrs={'data-permalink' : True})


Example:



from bs4 import BeautifulSoup

html = '''<a data-permalink="a">link</a>
<b>bold</b>
<i data-permalink="i">italic</i>'''

soup = BeautifulSoup(html, 'html.parser')
permalink = soup.select('*[data-permalink]')
# or
# permalink = soup.findAll(True, attrs={'data-permalink' : True})
print(permalink)


Result (the <b> element is skipped because it has no data-permalink attribute):



[<a data-permalink="a">link</a>, <i data-permalink="i">italic</i>]
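The question also asks about empty values and mostly cares about anchor tags. The sketch below covers both, assuming made-up sample markup. It uses find_all, the current bs4 spelling of findAll, and passes "a" as the tag name to restrict the search to links.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second link has an empty data-permalink value,
# the third has no such attribute, and the <span> is not a link.
html = '''<a data-permalink="/post/1" href="#">first</a>
<a data-permalink="" href="#">second</a>
<a href="#">no attribute</a>
<span data-permalink="/post/2">not a link</span>'''

soup = BeautifulSoup(html, "html.parser")

# True as the attribute value matches any value, including the empty string.
links = soup.find_all("a", attrs={"data-permalink": True})

# Read the attribute back from each matched element.
values = [a["data-permalink"] for a in links]
print(values)  # ['/post/1', '']
```

Replacing "a" with True in the call would drop the tag restriction, so the <span> would be returned as well.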






































answered Jan 2 at 6:34

ewwink































