[Solved] Using a list populated from a urllib query to a website. Cannot figure out how to increment through the list one item at a time...
Found the solution; please see the revised code below, which works. I have reviewed other similar questions, but I think mine is different: I first go to a known website, grab the links to the individual files, save them to a list, and then retrieve the files one after another. I am building code that takes the URL of a website and scans that site for the links I need, so I can mass-download the files instead of clicking and downloading each one by hand for a recurring end-of-month task. Those links are saved into a list, univ_list, and I am using a while loop.

There are 164 links that need to be downloaded this month. I join the base URL, http://www.linksneeded.com/, with each file's own piece (for example, This is the first file.xls), so a full URL looks like http://www.linksneeded.com/This is the first file.xls. The base of each file's location stays the same while the individual file names, which I have saved in univ_list, change. I need the first pass to use univ_list[1], the second to use univ_list[2], and so on all the way up to univ_list[164] (and that count changes per site and per month).
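One thing worth noting: a file name like "This is the first file.xls" contains spaces, and some servers reject request lines with raw spaces, so it is usually safer to percent-encode the name before downloading. A minimal sketch using only the standard library (the base URL and file name are just the examples above):

from urllib.parse import urljoin, quote

base_url = 'http://www.linksneeded.com/'   # example base URL from above
file_name = 'This is the first file.xls'   # example file name with spaces

# quote() percent-encodes the spaces; urljoin() glues the result onto the base URL
full_url = urljoin(base_url, quote(file_name))
print(full_url)   # http://www.linksneeded.com/This%20is%20the%20first%20file.xls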
I have tried setting another variable and performing a += 1 on each pass of the loop, but I get invalid syntax for this attempt: print(page_retrieve + univ_list += 1). I have it print so I can validate the output.
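For what it is worth, Python can step through a list directly with a for loop, so no manual counter increment is needed (and list indexes start at 0, so the first element is univ_list[0], not univ_list[1]). A minimal sketch, assuming univ_list is already filled and page_retrieve holds the base URL; this is not the exact code I ended up with, just the idea:

# enumerate() supplies the running count; the loop visits every entry in order
for count, file_piece in enumerate(univ_list, start=1):
    full_url = page_retrieve + file_piece
    print('Link', count, 'of', len(univ_list), ':', full_url)

The revised code below takes a different route (working from the end of the list with pop()), but the for loop above is the simpler pattern.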
import urllib.request
from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests

page_retrieve = input('Enter page URL to retrieve: ')
page = requests.get(page_retrieve)

print()
print('Successful page access:', page.status_code == requests.codes.ok)
print()

if page.status_code != 200:
    print('404 Client Error: URL Not Found for:', page_retrieve)
    print('Data file non existent. Please review file URL for accuracy.')

if page.status_code == 200:
    univ_list = []                        # every href found on the page
    html = urlopen(page_retrieve)
    bsObj = BeautifulSoup(html, "html.parser")
    for link in bsObj.findAll("a"):
        if 'href' in link.attrs:
            print(link.attrs['href'], '\n')
            univ_list.append(link.attrs['href'])

    # Count up to the number of links collected (assumes at least one link was found)
    links = 1
    while links < len(univ_list):
        links = links + 1

    # Actual download of files: take the last entry, download it, then drop it
    while links > 0:
        print('Number of Links left to process:', links)
        print()
        file_url = page_retrieve + univ_list[-1]
        urllib.request.urlretrieve(file_url, univ_list[-1])
        print('Link to download:', univ_list[-1])
        print()
        removed = univ_list.pop()
        print('Link Removed:', removed)
        print()
        links = links - 1

    print()
    print('Code complete')
I can get the full path to the second file, just to show my code works to that point, but I cannot figure out how to increment through the univ_list values so it goes through each link pulled. Once I can get the univ_list values to increment, I am going to add the retrieval piece using a modified open, write and close method that I know works in another piece of code I have developed. This is all an effort to automate downloading files from several websites for end-of-month data pulls that I then use to report compliance progress against my company's policies. I have also tried saving the links to a text file, but I get only one line containing all 164 links, so I switched to lists and am now having this problem.
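On the text-file attempt: the single-line result usually just means no newline was written after each link. A minimal sketch, assuming univ_list already holds the links (links.txt is only a placeholder name):

# Write one link per line; the '\n' keeps them from running together
with open('links.txt', 'w') as out_file:
    for link in univ_list:
        out_file.write(link + '\n')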
list python-requests urllib
asked Jan 1 at 8:51, edited Jan 3 at 4:38
Shawn Shenton
Actually found the solution by reading a book. Not saying it is pretty, but it does work. Will load the finished code into the question above. Hopefully someone finds it useful.
– Shawn Shenton
Jan 3 at 4:31