[Solved] Using a list populated from a urllib query to a website. Cannot figure out how to increment through the list one item at a time...
Found the solution; please see the revised code below, which works. I have reviewed other similar questions, but I think mine is different: I first go to a known website, grab the links to the individual files, save them to a list, and then retrieve the files one after another. I am building code that takes the URL of a website and scans that site for the links I need, so I can mass-download the files instead of clicking and downloading each one by hand for a recurring end-of-month task. Those links are saved into a list, univ_list, and I am using a while loop.

There are 164 links that need to be downloaded this month. I join the base URL, http://www.linksneeded.com/, with each file's own piece (for example, This is the first file.xls), so a full URL looks like http://www.linksneeded.com/This is the first file.xls. The base of each file's location stays the same while the individual file names, which I have saved in univ_list, change. I need the first pass to use univ_list[1], the second to use univ_list[2], and so on all the way up to univ_list[164] (and that count changes per site and per month).
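One thing worth noting: a file name like "This is the first file.xls" contains spaces, and some servers reject request lines with raw spaces, so it is usually safer to percent-encode the name before downloading. A minimal sketch using only the standard library (the base URL and file name are just the examples above):

from urllib.parse import urljoin, quote

base_url = 'http://www.linksneeded.com/'   # example base URL from above
file_name = 'This is the first file.xls'   # example file name with spaces

# quote() percent-encodes the spaces; urljoin() glues the result onto the base URL
full_url = urljoin(base_url, quote(file_name))
print(full_url)   # http://www.linksneeded.com/This%20is%20the%20first%20file.xls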
I have tried setting another variable and performing a += 1 on each pass of the loop, but I get invalid syntax for this attempt: print(page_retrieve + univ_list += 1). I have it print so I can validate the output.
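For what it is worth, Python can step through a list directly with a for loop, so no manual counter increment is needed (and list indexes start at 0, so the first element is univ_list[0], not univ_list[1]). A minimal sketch, assuming univ_list is already filled and page_retrieve holds the base URL; this is not the exact code I ended up with, just the idea:

# enumerate() supplies the running count; the loop visits every entry in order
for count, file_piece in enumerate(univ_list, start=1):
    full_url = page_retrieve + file_piece
    print('Link', count, 'of', len(univ_list), ':', full_url)

The revised code below takes a different route (working from the end of the list with pop()), but the for loop above is the simpler pattern.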
import urllib.request
from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests

page_retrieve = input('Enter page URL to retrieve: ')
page = requests.get(page_retrieve)

print()
print('Successful page access:', page.status_code == requests.codes.ok)
print()

if page.status_code != 200:
    print('404 Client Error: URL Not Found for:', page_retrieve)
    print('Data file non existent. Please review file URL for accuracy.')

if page.status_code == 200:
    univ_list = []                        # every href found on the page
    html = urlopen(page_retrieve)
    bsObj = BeautifulSoup(html, "html.parser")
    for link in bsObj.findAll("a"):
        if 'href' in link.attrs:
            print(link.attrs['href'], '\n')
            univ_list.append(link.attrs['href'])

    # Count up to the number of links collected (assumes at least one link was found)
    links = 1
    while links < len(univ_list):
        links = links + 1

    # Actual download of files: take the last entry, download it, then drop it
    while links > 0:
        print('Number of Links left to process:', links)
        print()
        file_url = page_retrieve + univ_list[-1]
        urllib.request.urlretrieve(file_url, univ_list[-1])
        print('Link to download:', univ_list[-1])
        print()
        removed = univ_list.pop()
        print('Link Removed:', removed)
        print()
        links = links - 1

    print()
    print('Code complete')
I can get the full path to the second file, just to show my code works to that point, but I cannot figure out how to increment through the univ_list values so it goes through each link pulled. Once I can get the univ_list values to increment, I am going to add the retrieval piece using a modified open, write and close method that I know works in another piece of code I have developed. This is all an effort to automate downloading files from several websites for end-of-month data pulls that I then use to report compliance progress against my company's policies. I have also tried saving the links to a text file, but I get only one line containing all 164 links, so I switched to lists and am now having this problem.
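On the text-file attempt: the single-line result usually just means no newline was written after each link. A minimal sketch, assuming univ_list already holds the links (links.txt is only a placeholder name):

# Write one link per line; the '\n' keeps them from running together
with open('links.txt', 'w') as out_file:
    for link in univ_list:
        out_file.write(link + '\n')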
list python-requests urllib
asked Jan 1 at 8:51, edited Jan 3 at 4:38
Shawn Shenton
Actually found the solution by reading a book. Not saying it is pretty, but it does work. Will load the finished code into the question above. Hopefully someone finds it useful.
– Shawn Shenton
Jan 3 at 4:31