Solved: Using a list populated from a urllib query to a website; cannot figure out how to increment the list index by 1



























Found the solution; please see the revised code below, which works. I have reviewed other similar questions, but I think mine is different in that I first go to the known website, grab the links to the individual files, and save them to a list, then retrieve the files one after another. I am building code that takes the URL of a website and scans that site for the links I need, so I can mass-download the files for a recurring end-of-month task instead of clicking each link and downloading the files one by one. Those links are saved into a list, univ_list, and I am using a while loop.



There are 164 links that need to be downloaded this month. I join the base URL, http://www.linksneeded.com/, with the file piece (each file has its own piece of the puzzle, for example This is the first file.xls), so the full URL looks like http://www.linksneeded.com/This is the first file.xls. The beginning of each file's full location stays the same, while the individual file names, which I have saved in univ_list, change. I need the first pass to use univ_list[0], the second to use univ_list[1], and so on through the last element, univ_list[163] (and that count changes per site and per month).
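For what it is worth, file names that contain spaces, like the example above, usually need to be percent-encoded before they are requested. A minimal sketch using only the standard library (the base URL and file name here are just the examples from this paragraph):

from urllib.parse import urljoin, quote

base = 'http://www.linksneeded.com/'
file_piece = 'This is the first file.xls'      # example file name from above
full_url = urljoin(base, quote(file_piece))    # spaces become %20
print(full_url)   # http://www.linksneeded.com/This%20is%20the%20first%20file.xls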



I have tried setting another variable and performing a += 1 on each loop, but I get invalid syntax for this attempt: print(page_retrieve + univ_list += 1). I have it set to print so I can validate the output.
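The invalid syntax comes from the fact that += is a statement in Python, so it cannot sit inside a print(...) call, and univ_list += 1 would try to add a number to the list itself rather than move to the next item. A minimal sketch of stepping through the list one item per pass, assuming univ_list already holds the link pieces and page_retrieve holds the base URL:

for index, href in enumerate(univ_list):
    full_url = page_retrieve + href              # base URL + file piece
    print('Pass', index + 1, 'of', len(univ_list), ':', full_url)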



import urllib
from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests

page_retrieve = input('Enter page URL to retrieve: ')

page = requests.get(page_retrieve)

print()
print('Successful page access:', page.status_code == requests.codes.ok)
print()

if page.status_code != 200:
    print('404 Client Error: URL Not Found for:', page_retrieve)
    print('Data file non existent. Please review file URL for accuracy.')

if page.status_code == 200:
    univ_list = []                                   # holds every href found on the page
    html = urlopen(page_retrieve)
    bsObj = BeautifulSoup(html, "html.parser")
    for link in bsObj.findAll("a"):
        if 'href' in link.attrs:
            print(link.attrs['href'], '\n')
            univ_list.append(link.attrs['href'])

    links = len(univ_list)                           # number of files still to download

    # Actual download of files: always take the last entry, then pop it off
    while links > 0:
        print('Number of Links left to process:', links)
        print()
        file_url = page_retrieve + univ_list[-1]     # base URL + file piece
        urllib.request.urlretrieve(file_url, univ_list[-1])
        print('Link to download:', univ_list[-1])
        print()
        removed = univ_list.pop()                    # remove the link just downloaded
        print('Link Removed:', removed)
        print()
        links = links - 1

print()
print('Code complete')


I get the full path to the file for the second file, just to show my code works to that point, but I cannot figure out how to step through the univ_list values so that every pulled link is processed. Once I can move through univ_list I am going to add the retrieve piece, using a modified open, write and close method that I know works in another piece of code I have developed. All of this is an effort to automate downloading files from several websites for end-of-month data pulls, which I then use for reporting compliance progress against my company policies. I have tried saving the links to a text file, but I get only one line with all 164 links, so I looked at lists and am now having this problem.
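On the text-file attempt mentioned above, where all 164 links ended up on one line: writing a newline after each link keeps one link per line. A small sketch, assuming univ_list is already populated and 'links.txt' is just a placeholder file name:

with open('links.txt', 'w') as out_file:     # 'links.txt' is a placeholder name
    for href in univ_list:
        out_file.write(href + '\n')          # one link per line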










list python-requests urllib






asked Jan 1 at 8:51 by Shawn Shenton
edited Jan 3 at 4:38 by Shawn Shenton













  • Actually found the solution by reading a book. Not saying it is pretty, but it does work. I will load the finished code into the question above. Hopefully someone finds it useful.

    – Shawn Shenton, Jan 3 at 4:31