Scraping Dropdown prompts
I'm having trouble getting data from a dropdown button, and none of the answers on this site (or at least the ones I found) have helped.
The website I'm trying to scrape is Amazon, searching for, say, 'Nike Shoes'.
When I open a product from the 'Nike Shoes' results, I may get a page like this:
https://www.amazon.com/NIKE-Flex-2017-Running-Shoes/dp/B072LGTJKQ/ref=sr_1_1_sspa?ie=UTF8&qid=1546518735&sr=8-1-spons&keywords=nike+shoes&psc=1
Here the size and the color come with the page, so scraping is simple.
The problem comes with this type of product:
https://www.amazon.com/NIKE-Lebron-Soldier-Mid-Top-Basketball/dp/B07KJJ52S4/ref=sr_1_3?ie=UTF8&qid=1546518445&sr=8-3&keywords=nike+shoes
Here I have to select a size, and maybe a color, and the price changes depending on the size selected.
My question: is there a way to access every shoe size so I can at least check the price for each one?
If the page source contained some sort of list of sizes it wouldn't be that hard, but the page changes when I select a size and no list of shoe sizes appears in the source (the URL doesn't change either).
python-3.x xpath web-scraping scrapy
asked Jan 3 at 12:38
Manuel
sounds like something you'd do with Selenium
– chitown88
Jan 3 at 13:08
1 Answer
Most e-commerce websites handle variants by embedding JSON in the HTML and loading the appropriate selection with JavaScript. So once you have scraped the HTML, you most likely have all of the variant data.
In your case the shoe sizes, their prices, etc. would be embedded in the HTML body. If you search the page source for a sufficiently unique variant name, you can see some JSON in the body.
Now you need to:
1. Identify where the JSON is. It is usually somewhere in a <script> tag or in a data-<something> attribute of some tag.
2. Extract the JSON. If it's embedded directly in JavaScript, you can pull it out with a regex:
    import re
    # this takes the first <script>; narrow the XPath to the one that holds the data
    script = response.xpath('//script/text()').extract_first()
    # capture everything between the first { and the last }
    data = re.findall(r'(\{.+\})', script, re.DOTALL)
3. Load the JSON as a dict and walk the tree, e.g.:
    import json
    d = json.loads(data[0])
    d['products'][0]
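A regex over braces can be fragile when the object is nested or followed by more JavaScript. As a rough sketch of an alternative, assuming the page has a script that assigns an object literal to a variable (the `dataToReturn` name, the `initPage` call, and the sample HTML are made up for illustration), `json.JSONDecoder.raw_decode` parses one complete JSON value starting at a given offset and ignores whatever follows it. This only works when the embedded object is strict JSON.

```python
import json
import re

# Hypothetical page source; in a Scrapy callback this would be response.text.
SAMPLE_HTML = """
<html><body>
<script>
var dataToReturn = {"products": [{"size": "8", "price": {"amount": 99.5}},
                                 {"size": "9", "price": {"amount": 104.0}}]};
initPage(dataToReturn);
</script>
</body></html>
"""


def extract_json_after(source, marker_pattern):
    """Return the JSON value that starts right after the first regex match, or None."""
    match = re.search(marker_pattern, source)
    if match is None:
        return None
    decoder = json.JSONDecoder()
    try:
        # raw_decode parses one JSON value starting at the given index and
        # returns (value, end_index); trailing JavaScript is ignored.
        value, _end = decoder.raw_decode(source, match.end())
    except json.JSONDecodeError:
        return None
    return value


# The trailing \s* matters: raw_decode does not skip leading whitespace.
data = extract_json_after(SAMPLE_HTML, r"var\s+dataToReturn\s*=\s*")
prices_by_size = {p["size"]: p["price"]["amount"] for p in data["products"]}
print(prices_by_size)
```

The marker pattern is the part to adapt: search the page source for a variant name, look at what precedes the opening brace, and anchor on that.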
edited Jan 3 at 13:55
answered Jan 3 at 13:46
Granitosaurus
Thanks for the help. I found variationValues, which also contains the color variants. Below it I found asinVariationValues, which seems to give the product ID for a particular size and color from the variationValues dictionary (probably not a dictionary, but something similar). Can that be scraped the same way? Then I could associate the numbers in asinVariationValues with the entries in variationValues and get each variant of the product.
– Manuel
Jan 3 at 14:32
Page source is your playground! If you Ctrl+F with JS disabled and it's there, you can scrape it. Usually all the data is in a single JSON object, but Amazon is pretty huge, so they might separate the ASIN codes from the actual product data. I'd advise copying all of that JSON into a visual JSON viewer such as jsoneditoronline.org and digging around.
– Granitosaurus
Jan 3 at 14:41
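The association asked about in the comments could look something like the sketch below. It assumes that variationValues maps each dimension name to a list of labels and that asinVariationValues maps each ASIN to index strings pointing into those lists. Both the layout and the sample ASINs are invented for illustration, so check the real structure in the page source first.

```python
# Invented sample data; the real structure should be checked in the page source.
variation_values = {
    "size_name": ["8", "9", "10"],
    "color_name": ["Black", "White"],
}
asin_variation_values = {
    "B000000001": {"size_name": "0", "color_name": "1", "ASIN": "B000000001"},
    "B000000002": {"size_name": "2", "color_name": "0", "ASIN": "B000000002"},
}


def resolve_variants(values, asin_values):
    """Map each ASIN to readable labels by resolving its index strings."""
    resolved = {}
    for asin, dims in asin_values.items():
        resolved[asin] = {
            dim: values[dim][int(index)]
            for dim, index in dims.items()
            if dim in values  # skip non-dimension keys such as "ASIN"
        }
    return resolved


variants = resolve_variants(variation_values, asin_variation_values)
for asin, labels in variants.items():
    print(asin, labels)
```

With the ASIN for each size and color combination in hand, each variant's own product page can then be requested to read its price.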