Quote:
Originally Posted by yoss15
I would really appreciate it. For starters what does the for section in soup.findAll line do?
The job of parse_index is to look at a page and find the links on that page to articles. The for section in soup.findAll line is the beginning of that process: it "finds all" the parts of the page that are expected to contain article links. Do you know what a <div> tag is? As written, that line finds every part of the page that is tagged <div class="content">.
I'll be nice and look at your page - hold on ....
There aren't any div tags like that on your page, so that loop never finds anything to work on.
You should probably be doing something like this:
Code:
for section in soup.findAll('li'):
Then, inside that loop, something like:
Code:
for post in section.findAll('a', href=True):
The first loop finds every <li> tag on the page. The second loop then looks inside each <li> and finds the <a> tags that have an href.
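Here's a rough sketch of how the two loops fit together, in case it helps. It runs against a made-up snippet of HTML (not your actual page), uses the standalone bs4 package rather than a calibre recipe, and spells findAll as find_all, which is the newer name for the same method. The articles list and its title/url keys are just illustrative names, so adjust them to whatever your recipe needs.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration; the real page will differ.
html = """
<ul>
  <li><a href="/articles/one.html">First article</a></li>
  <li>No link in this item</li>
  <li><a name="top">Anchor only</a> <a href="/articles/two.html">Second article</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# The original search: there is no <div class="content"> here,
# so the result is empty and a for loop over it would never run.
content_divs = soup.find_all('div', attrs={'class': 'content'})

# The suggested search: every <li>, then every <a> inside it that has an href.
articles = []
for section in soup.find_all('li'):
    for post in section.find_all('a', href=True):
        articles.append({
            'title': post.get_text(strip=True),  # link text
            'url': post['href'],                 # link target
        })

for article in articles:
    print(article['title'], article['url'])
```

Note that the <li> without a link and the <a> without an href are both skipped, which is the point of passing href=True.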