Scraping Dropdown prompts

I'm having trouble getting data from a dropdown button, and none of the answers on this site (or at least the ones I found) have helped.

The website I'm trying to scrape is Amazon; for example, a search for 'Nike Shoes'.

When I open a product from the 'Nike Shoes' results, I may get a product like this:

https://www.amazon.com/NIKE-Flex-2017-Running-Shoes/dp/B072LGTJKQ/ref=sr_1_1_sspa?ie=UTF8&qid=1546518735&sr=8-1-spons&keywords=nike+shoes&psc=1

Here the size and the color come with the page, so scraping is simple.

The problem comes with products like this one:

https://www.amazon.com/NIKE-Lebron-Soldier-Mid-Top-Basketball/dp/B07KJJ52S4/ref=sr_1_3?ie=UTF8&qid=1546518445&sr=8-3&keywords=nike+shoes

Here I have to select a size, and maybe a color, and the price changes when I select a different size.

My question is: is there a way to access every shoe size, for example, so that I can at least check the price for each size?

If the page source contained some sort of list of the sizes it wouldn't be that hard, but the page changes when I select a size and no list of shoe sizes appears in the source (the URL doesn't change either).

  • Sounds like something you'd do with Selenium.

    – chitown88
    Jan 3 at 13:08

python-3.x xpath web-scraping scrapy

asked Jan 3 at 12:38 by Manuel

1 Answer

Most e-commerce websites handle variants by embedding JSON in the HTML and using JavaScript to display the selected variant. So once you have fetched the HTML, you most likely already have all of the variant data.

In your case the shoe sizes, their prices, etc. would be embedded in the HTML body. If you search the page source for a sufficiently unique variant name, you will find some JSON in the body.

Now you need to:

  1. Identify where the JSON is:

    It is usually somewhere in <script> tags or in a data-<something> attribute of some tag.

  2. Extract the JSON:

    If it is embedded directly in JavaScript, you can extract it with a regex:

    import re
    # Text of the first <script>; narrow the XPath to the script that holds the data
    script = response.xpath('//script/text()').extract_first()
    # Capture everything from the first { to the last } (DOTALL lets . match newlines)
    data = re.findall(r'(\{.+\})', script, re.DOTALL)

  3. Load the JSON as a dict and walk the tree, e.g.:

    import json
    d = json.loads(data[0])
    d['products'][0]
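
The steps above can be sketched without Scrapy on a made-up page fragment. This is only an illustration under assumptions: the `productData` variable, the sample HTML, and the helper `extract_json_after` are hypothetical, and instead of a brace-matching regex it uses the standard library's `json.JSONDecoder.raw_decode`, which parses one JSON value and ignores the JavaScript that follows it.

```python
import json
import re

# Hypothetical page fragment: the variable name `productData` and the keys
# below are made up for illustration, not taken from a real page.
SAMPLE_HTML = """
<html><body>
<script>
var productData = {"products": [{"size": "9", "price": 89.99},
                                {"size": "10", "price": 94.5}]};
initPage(productData);
</script>
</body></html>
"""


def extract_json_after(text, marker):
    """Return the first JSON object assigned to `marker` in `text`."""
    match = re.search(re.escape(marker) + r"\s*=\s*", text)
    if match is None:
        raise ValueError("marker %r not found" % marker)
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever comes after it (here: the trailing JavaScript).
    obj, _end = json.JSONDecoder().raw_decode(text, match.end())
    return obj


data = extract_json_after(SAMPLE_HTML, "var productData")
# Build a size -> price lookup from the parsed tree.
prices_by_size = {p["size"]: p["price"] for p in data["products"]}
print(prices_by_size)
```

In a Scrapy callback you would pass `response.text` (or the text of the matching script) instead of `SAMPLE_HTML`; the marker has to be whatever actually precedes the JSON on the page you are scraping.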







  • Thanks for the help. I found variationValues, which also contains the color variants. Below it I found asinVariationValues, which seems to map each product ID to a particular size and color from the variationValues dictionary (probably not a dictionary, but something similar). Can that be scraped the same way? Then I could match the numbers in asinVariationValues against the entries in variationValues and get every variant of the product.

    – Manuel
    Jan 3 at 14:32

  • Page source is your playground! If you Ctrl+F with JS disabled and it's there, you can scrape it. Usually all the data is in a single JSON object, but Amazon is pretty huge, so they might keep the ASIN codes separate from the actual product data. I'd advise copying all of that JSON into a visual JSON viewer such as jsoneditoronline.org and digging around.

    – Granitosaurus
    Jan 3 at 14:41
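
The matching described in the comments could look roughly like the sketch below. The data is invented: the key names come from the comment above, but the layout (one list of labels per dimension, with asinVariationValues holding string indices into those lists) and the ASIN values are assumptions, so check them against the JSON you actually extract. `resolve_variants` is a hypothetical helper.

```python
# Hypothetical data: the key names follow the comment above, but the exact
# layout (dimension names, string indices) is assumed for illustration.
variation_values = {
    "size_name": ["8", "9", "10"],
    "color_name": ["Black", "White"],
}
asin_variation_values = {
    "ASIN0000001": {"size_name": "0", "color_name": "1"},
    "ASIN0000002": {"size_name": "2", "color_name": "0"},
}


def resolve_variants(values, asin_map):
    """Replace each index in `asin_map` with the label it points to."""
    resolved = {}
    for asin, indices in asin_map.items():
        resolved[asin] = {
            dimension: values[dimension][int(index)]
            for dimension, index in indices.items()
        }
    return resolved


variants = resolve_variants(variation_values, asin_variation_values)
for asin, labels in variants.items():
    print(asin, labels)
```

With each product ID resolved to a size and color, you can request or look up the price per variant.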












answered Jan 3 at 13:46 by Granitosaurus (edited Jan 3 at 13:55)