Request Returns Response 447
I'm trying to scrape a website using requests and BeautifulSoup. When I run the code to get the tags of the webpage, the soup object is blank. I printed the response object to see whether the request was successful, and it was not: the printed result shows response 447. I can't find what 447 means as an HTTP status code. Does anyone know how I can successfully connect to and scrape the site?
Code:
import requests
from bs4 import BeautifulSoup
r = requests.get('https://foobar')
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.get_text())
Output:
''
When I print the response object:
print(r)
Output:
<Response [447]>
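requests does not raise an exception on a 4xx response by default, so the code above quietly parses the (empty) error body. Below is a minimal sketch of a guard that checks the status before parsing. It uses only `Response.ok`, `status_code` and `reason` from requests. The helper name `parse_if_ok` is hypothetical.

```python
import requests
from bs4 import BeautifulSoup


def parse_if_ok(response: requests.Response) -> BeautifulSoup:
    """Parse the body only when the server returned a non-error status.

    parse_if_ok is a hypothetical helper, not part of requests or bs4.
    """
    # Response.ok is False for any 4xx/5xx code, including non-standard ones like 447
    if not response.ok:
        raise RuntimeError(
            f"request failed with status {response.status_code} "
            f"(reason: {response.reason!r}); not parsing the body"
        )
    return BeautifulSoup(response.text, "html.parser")
```

Usage: `soup = parse_if_ok(requests.get('https://foobar'))` raises with the status code instead of printing `''`.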
python-3.x http web-scraping beautifulsoup request
asked Dec 31 '18 at 3:15
ElectroMotiveHorse
I would search the particular website to see if there's documentation somewhere that lists their response codes. Also, it's hard to answer how to successfully scrape a site you haven't named (what's the site?). All anyone can do is explain how you'd generally do it, and your code should work under normal circumstances.
– chitown88
Dec 31 '18 at 13:03
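One way to confirm that 447 is site-specific rather than a standard code is to look it up in Python's standard-library registry of HTTP statuses, `http.HTTPStatus`. Below is a minimal sketch. `describe_status` is a hypothetical helper name.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a readable description of an HTTP status code.

    describe_status is a hypothetical helper; HTTPStatus only knows the
    status codes built into Python's standard library.
    """
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Not in the registry: the meaning is defined by the site (or a proxy in front of it)
        return f"{code}: non-standard, check the site's own documentation"
    return f"{status.value}: {status.phrase}"


print(describe_status(429))
print(describe_status(447))
```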
2 Answers
Most likely the site has detected your activity as automated and is blocking your access. You can fix this by including browser-like headers in your request to the site.
import bs4
import requests
url = 'https://foobar'  # the site you are scraping (placeholder from the question)
session = requests.Session()
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}
req = session.get(url, headers=headers)
soup = bs4.BeautifulSoup(req.text, 'html.parser')  # name the parser explicitly
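If you make several requests to the same site, you can set the headers once on the `Session` so every request reuses them. `Session.prepare_request` then lets you check what would be sent without touching the network. The sketch below reuses the answer's browser string and the question's placeholder URL. The name `BROWSER_HEADERS` is just an illustrative choice.

```python
import requests

BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}

session = requests.Session()
# Session-level headers are merged into every request made through this session
session.headers.update(BROWSER_HEADERS)

# Build (but do not send) a request to inspect the final headers
prepared = session.prepare_request(requests.Request("GET", "https://foobar"))
print(prepared.headers["User-Agent"])

# To actually fetch the page: response = session.send(prepared)
```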
answered Dec 31 '18 at 21:00
timmy
timmy, your solution worked like a charm. Can you explain further how headers let someone bypass the block?
– ElectroMotiveHorse
Jan 1 at 5:46
Whenever you open a site, whether from Python or a normal browser, certain headers are sent to it; the most common are "User-Agent", "Connection" and "Accept". When you fetch a page from Python, your User-Agent names the client library: "python-requests/<version>" with requests, or "Python-urllib/3.4" (or whatever version of urllib you have) with urllib. This makes it easy for the site to detect bots or spiders and ban them or their IP address. When you browse normally, your User-Agent looks more human, naming Mozilla, Chrome or similar. The requests library lets you change your headers.
– timmy
Jan 1 at 10:44
timmy, makes sense. I will include headers in future scraping. Thanks for your help.
– ElectroMotiveHorse
Jan 1 at 18:32
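To see the headers timmy describes, you can print what requests sends when you don't override anything. `requests.utils.default_headers()` returns them. The exact version in the User-Agent depends on your installed requests.

```python
import requests

# Headers requests attaches to a request when you don't override them
defaults = requests.utils.default_headers()
for name, value in defaults.items():
    print(f"{name}: {value}")
# The User-Agent looks like "python-requests/<version>", which is easy for a
# site to spot and block; a browser sends something like "Mozilla/5.0 (...)".
```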
It sounds like the site has browser-detection software and doesn't like your browser (or rather, your lack of one).
While 447 is not a standard HTTP status code, it is occasionally used in SMTP to mean "too many requests".
Without knowing which website you are looking at, it's unlikely anyone can give you more information. Chances are you just need to add headers.
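If 447 really does mean "too many requests" on this site, headers alone may not be enough, and slowing down can help. That reading is an assumption; the standard HTTP code for too many requests is 429. Below is a minimal sketch using urllib3's `Retry` through requests' `HTTPAdapter`. It retries on 429 and, by assumption, 447, with exponential backoff between attempts.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Treating 447 like 429 is an assumption about this particular site
retry = Retry(
    total=3,                      # give up after three retries
    backoff_factor=1,             # scales the exponential sleep between attempts
    status_forcelist=[429, 447],  # status codes that trigger a retry
)

session = requests.Session()
adapter = HTTPAdapter(max_retries=retry)
session.mount("https://", adapter)
session.mount("http://", adapter)

# Usage: response = session.get("https://foobar")
# If every attempt returns 429/447, requests raises requests.exceptions.RetryError.
```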
edited Dec 31 '18 at 20:45
answered Dec 31 '18 at 19:37
B.Adler