MobileRead Forums - View Single Post - Request: Folha de Sao Paulo (Brazil) from UOL portal

Starson17 · 08-22-2011, 11:11 AM

Quote:

Originally Posted by luis.nando

I guess I have to change something on the soup.findAll() function.

Yes, you have to make soup.findAll() find the links to your articles. The code you posted is looking for tags whose class attribute is 'section-headline', 'story' or 'story headline'.
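
To make that concrete, here is a rough sketch of what such a findAll() call matches. The HTML is made up for illustration and is not taken from the real site, and the sketch assumes the standalone bs4 package rather than the BeautifulSoup copy that calibre provides to recipes:

```python
from bs4 import BeautifulSoup  # assumption: bs4 is installed outside calibre

# Hypothetical markup, invented for this example only.
SAMPLE_HTML = """
<div class="section-headline"><a href="/a1.html">First</a></div>
<h2 class="story"><a href="/a2.html">Second</a></h2>
<p class="teaser"><a href="/skip.html">Ignored</a></p>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Match any tag whose class attribute is one of these values.
# (bs4 also spells this method find_all; findAll is the older name.)
wanted = ['section-headline', 'story', 'story headline']
matches = soup.findAll(attrs={'class': wanted})

# Pull the href out of the first <a> inside each matching tag.
hrefs = [tag.a['href'] for tag in matches if tag.a is not None]
print(hrefs)
```

The "teaser" paragraph is skipped because its class is not in the list. If your page uses different class names, this list is the part you would change.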

1) Find the links on your pages.
2) Figure out how to identify them with BeautifulSoup.
3) Use Python string handling to build the article links.
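
As a sketch of step 3 only: assume step 2 has already given you (title, href) pairs. The base URL, the feed title and the helper name build_articles are all hypothetical, and urljoin is just one way to do the string handling. The result has the shape that parse_index is expected to return, a list of (feed title, list of article dicts) tuples:

```python
from urllib.parse import urljoin

BASE_URL = 'http://example.com/news/'  # hypothetical, not the real site


def build_articles(found_links, base_url=BASE_URL):
    """Turn (title, href) pairs into article dicts with absolute URLs."""
    articles = []
    for title, href in found_links:
        articles.append({
            'title': title.strip(),
            'url': urljoin(base_url, href),  # resolves relative hrefs
            'description': '',
            'date': '',
        })
    return articles


# Made-up results standing in for what step 2 would find.
found = [
    ('  First story ', '/2011/08/first.html'),
    ('Second story', 'second.html'),
]

# One feed named 'Front page' (hypothetical) holding both articles.
feeds = [('Front page', build_articles(found))]
```

An href that starts with "/" is resolved against the site root, while a bare file name is resolved against the directory of the base URL, so check which form your links actually use.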

If you have problems, ask questions here, but start by finding each of the article links on your page that you want parse_index to identify, and figure out how to locate them all by tag name, class attribute, etc. If you explain it in words, we can help you write the code.