Request Returns Response 447

I'm trying to scrape a website using requests and BeautifulSoup. When I run the code to get the tags of the webpage, the soup object is blank. I printed the response object to see whether the request was successful, and it was not: the printed result shows response 447. I can't find what 447 means as an HTTP status code. Does anyone know how I can successfully connect to and scrape the site?

Code:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://foobar')

soup = BeautifulSoup(r.text, 'html.parser')

print(soup.get_text())

Output:

''

When I print the response object:

print(r)



Output:

<Response [447]>

asked Dec 31 '18 at 3:15

ElectroMotiveHorse

637

I would search the particular website to see if there's documentation somewhere that lists their response codes. It's also hard to answer how to successfully scrape a site you haven't named (what's the site?). All anyone can do is explain how you'd generally do it, and under normal circumstances your code should work.

– chitown88
Dec 31 '18 at 13:03

2 Answers

Most likely the site has detected your automated requests and is blocking your access. You can fix this by including browser-like headers in your request to the site.

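import bs4
import requests

url = 'https://foobar'  # replace with the site you are scraping

# A Session reuses the connection and keeps cookies between requests
session = requests.Session()

# Browser-like headers instead of the default "python-requests/<version>" User-Agent
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}

req = session.get(url, headers=headers)

# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = bs4.BeautifulSoup(req.text, 'html.parser')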

answered Dec 31 '18 at 21:00

timmy

1148

timmy, your solution worked like a charm. Can you explain further how headers allow someone to bypass the block?

– ElectroMotiveHorse
Jan 1 at 5:46

Whenever you open a site, whether from Python or through normal browsing, certain headers are sent to the site. The most common are "User-Agent", "Connection" and "Accept". When you fetch the page source from Python, your User-Agent names the library: "Python-urllib/3.4" (or whatever urllib version you have) with urllib, or "python-requests/<version>" with requests. This makes it easy for the site to detect bots or spiders and ban them or their IP address. When you browse normally, your User-Agent looks more human, such as Mozilla or Chrome. The requests library lets you change your headers.

– timmy
Jan 1 at 10:44

timmy, makes sense. I will include headers in future scraping. Thanks for your help.

– ElectroMotiveHorse
Jan 1 at 18:32
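timmy's point about default User-Agent strings can be checked locally with the standard library's urllib, without contacting any site. The sketch below, which assumes Python 3, prints the header urllib's default opener adds and shows that an explicit User-Agent on a Request takes its place. The browser string is the one from timmy's answer, and "https://foobar" is the placeholder from the question; building a Request sends nothing over the network. For requests, `requests.utils.default_headers()` shows the equivalent "python-requests/<version>" default.

```python
import urllib.request

# urllib's default opener adds this header to every request it sends.
opener = urllib.request.build_opener()
print(opener.addheaders)  # e.g. [('User-agent', 'Python-urllib/3.12')]

# Browser-like string taken from timmy's answer.
browser_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"

# Building a Request does not contact the site; it only records the headers.
# urllib stores header names via str.capitalize(), hence the "User-agent" key.
req = urllib.request.Request("https://foobar", headers={"User-Agent": browser_ua})
print(req.get_header("User-agent"))

# When the request is opened, the opener only adds its default User-agent
# if the request does not already carry one, so browser_ua would be sent.
```

The same idea applies to requests: passing `headers=` replaces the library's default User-Agent for that call.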

It sounds like they have browser-detection software and they don't like your browser (meaning they don't like your lack of a browser).

While 447 is not a standard HTTP status code, it is occasionally used in SMTP to mean "too many requests".

Without knowing which website you are looking at, it's unlikely anyone can give you more information. Chances are you just need to add headers.

edited Dec 31 '18 at 20:45

answered Dec 31 '18 at 19:37

B.Adler

925916
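The point that 447 is not a standard HTTP status can be checked against Python's standard-library `http.HTTPStatus` enum. This is a minimal sketch; the helper name `describe_status` is hypothetical. It also shows that HTTP's own "Too Many Requests" code is 429. Note that `HTTPStatus` reflects the codes the installed Python version knows about, which is close to, but not guaranteed to match, the full IANA registry.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the reason phrase for an HTTP status code, if Python knows one."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Not in the standard library's table of HTTP status codes
        return f"{code}: non-standard status"
    return f"{code}: {status.phrase}"


for code in (200, 429, 447):
    print(describe_status(code))
# 200: OK
# 429: Too Many Requests
# 447: non-standard status
```

With the question's code, `describe_status(r.status_code)` would report the 447 as non-standard, which suggests the site is using its own code to signal the block.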