Request Returns Response 447



























I'm trying to scrape a website using requests and BeautifulSoup. When I run the code to obtain the tags of the webpage, the soup object is blank. I printed out the response object to see whether the request was successful, and it was not. The printed result shows response 447. I can't find what 447 means as an HTTP status code. Does anyone know how I can successfully connect and scrape the site?



Code:



import requests
from bs4 import BeautifulSoup

r = requests.get('https://foobar')  # placeholder URL
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.get_text())

Output:
''


When I print the response object:



print(r)

Output:
<Response [447]>









python-3.x http web-scraping beautifulsoup request






asked Dec 31 '18 at 3:15

ElectroMotiveHorse













  • I would search the particular website to see if there's documentation somewhere that lists their response codes. Also, it's tough to answer how to successfully scrape a site that you haven't identified (what's the site?). All anyone can do is describe how you'd generally do it, and under normal circumstances your code should work.

    – chitown88
    Dec 31 '18 at 13:03
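
Since the empty soup only made sense after printing the response, one general habit is to check the status before parsing. A minimal sketch, assuming the requests library is installed; `text_if_ok` is a hypothetical helper name, and the `Response` object is built by hand as an offline stand-in rather than fetched from the real site:

```python
import requests


def text_if_ok(response: requests.Response) -> str:
    """Return the body text, or raise if the status code signals an error."""
    # raise_for_status() raises requests.HTTPError for any 4xx or 5xx code,
    # including non-standard ones such as 447.
    response.raise_for_status()
    return response.text


# Offline stand-in for a blocked reply; a real one would come from requests.get(...).
blocked = requests.Response()
blocked.status_code = 447

try:
    text_if_ok(blocked)
    outcome = "parsed"
except requests.HTTPError as exc:
    outcome = f"blocked: {exc.response.status_code}"

print(outcome)
```

With a check like this, a blocked request fails loudly instead of quietly producing an empty soup object.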



















2 Answers
Most likely the site has detected your activity as automated and is blocking your access. You can usually fix this by including browser-like headers in your request to the site.



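import bs4
import requests

url = 'https://foobar'  # placeholder; substitute the page you are scraping
session = requests.Session()
# Browser-like headers; the library's default User-Agent is easy for a site to flag.
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}
req = session.get(url, headers=headers)
soup = bs4.BeautifulSoup(req.text, 'html.parser')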






answered Dec 31 '18 at 21:00

timmy













    • timmy, your solution worked like a charm. Can you further explain how headers allow someone to bypass the block?

      – ElectroMotiveHorse
      Jan 1 at 5:46








    • Whenever you open a site, whether from Python or a normal browser, certain headers are sent to the site; the most common are "User-Agent", "Connection", and "Accept". When you fetch the page from Python, the User-Agent names the library (for example "python-requests/<version>" for requests, or "Python-urllib/3.4" for urllib, depending on the version you have), which makes it easy for the site to detect bots or spiders and block them or their IP address. In normal browsing your User-Agent looks like a real browser, such as Firefox or Chrome. The requests library lets you change your headers.

      – timmy
      Jan 1 at 10:44











    • timmy, that makes sense. I will include headers in future scraping. Thanks for your help.

      – ElectroMotiveHorse
      Jan 1 at 18:32
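
To see what timmy's comment describes without touching the network, here is a minimal sketch using only the standard library's urllib (not requests, which the question uses and which sends its own "python-requests/<version>" default). The URL is a placeholder and `BROWSER_HEADERS` is a name made up for illustration; the header values are copied from the answer above.

```python
import urllib.request

# Default headers that urllib attaches when the caller supplies none.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)
print(default_headers)  # the 'User-agent' value starts with 'Python-urllib/'

# Hypothetical constant holding browser-like headers.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}

# Building a Request sends nothing; it only records what would be sent.
req = urllib.request.Request("https://example.com/", headers=BROWSER_HEADERS)

# urllib normalises header names with str.capitalize(), hence 'User-agent'.
custom_user_agent = req.get_header("User-agent")
print(custom_user_agent)
```

The first print shows the value a server can trivially match against; the second shows the browser-like string that replaces it once headers are supplied.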



















Sounds like they have browser-detection software and they don't like your browser (meaning they don't like your lack of a browser).



While 447 is not a standard HTTP status code, it is occasionally used in SMTP to signal too many requests.



Without knowing which website you are looking at, it's unlikely anyone will be able to give you more information. Chances are you just need to add headers.







edited Dec 31 '18 at 20:45

answered Dec 31 '18 at 19:37

B.Adler
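
The point that 447 is not a standard HTTP status can be checked against Python's own table of registered codes. A minimal sketch using only the standard library; `describe_status` is a hypothetical helper name, and the fallback classification by leading digit is a general HTTP convention rather than anything specific to the site in question:

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the standard reason phrase, or note that the code is non-standard."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        # Not in Python's table of registered codes; classify by leading digit.
        family = {
            1: "informational",
            2: "success",
            3: "redirection",
            4: "client error",
            5: "server error",
        }.get(code // 100, "unknown")
        return f"non-standard ({family} class)"


print(describe_status(429))  # Too Many Requests
print(describe_status(447))  # non-standard (client error class)
```

A client that does not recognise a 4xx code can still treat it as a generic client error, which is consistent with the request being refused rather than the server failing.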





























