MobileRead Forums - View Single Post

Starson17 · 08-10-2011, 05:22 PM

Quote:

Originally Posted by yoss15

I would really appreciate it. For starters what does the for section in soup.findAll line do?

The job of parse_index is to look at a page and find the links on that page to articles. The "for section in soup.findAll" line is the beginning of that process: it "finds all" the tags on the page that match what you give it, and loops over them one at a time. Do you know what a <div> tag is? In your recipe, that line finds every part of the page that is tagged <div class="content">.

I'll be nice and look at your page - hold on ....

There aren't any div tags like that.

You should probably be doing something like this:

Code:

for section in soup.findAll('li'):

Then something like:

Code:

for post in section.findAll('a', href=True):

The first loop finds every <li> tag on the page. The second loop then finds the <a> tags inside each <li> that have an href.
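If you want to see which links those two nested loops would pick up without running a whole recipe, here is a rough stand-alone sketch. It does not use BeautifulSoup at all: it mimics the same selection with Python's standard-library html.parser so it runs anywhere. The sample HTML, the class name LiLinkCollector and the URLs are all made up for illustration, so treat it as a sketch of the idea rather than recipe code.

```python
from html.parser import HTMLParser

# Made-up sample page: two article links inside <li> tags, one <a> with no
# href, and one link that is not inside any <li>.
SAMPLE_HTML = """
<ul>
  <li><a href="/articles/1">First article</a></li>
  <li><a name="anchor">No href here</a></li>
  <li><a href="/articles/2">Second article</a></li>
</ul>
<p><a href="/about">About</a></p>
"""


class LiLinkCollector(HTMLParser):
    """Collect (title, url) for every <a> with an href nested inside an <li>."""

    def __init__(self):
        super().__init__()
        self.li_depth = 0          # how many <li> tags we are currently inside
        self.current_href = None   # href of the <a> being read, if any
        self.current_text = []     # text pieces seen inside that <a>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self.li_depth += 1
        elif tag == 'a' and self.li_depth > 0:
            href = dict(attrs).get('href')
            # Similar in spirit to href=True: skip <a> tags without an href.
            if href is not None:
                self.current_href = href
                self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == 'li' and self.li_depth > 0:
            self.li_depth -= 1
        elif tag == 'a' and self.current_href is not None:
            title = ''.join(self.current_text).strip()
            self.links.append((title, self.current_href))
            self.current_href = None


collector = LiLinkCollector()
collector.feed(SAMPLE_HTML)
articles = collector.links
print(articles)
```

The link in the <p> is skipped because it is not inside an <li>, and the <a name="anchor"> is skipped because it has no href. In a real recipe the two findAll loops do this filtering for you, and you would collect the title and url from each post inside the inner loop.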