Thread: web2lrf
Old 02-23-2008, 01:47 AM   #189
ddavtian
Addict
ddavtian has a complete set of Star Wars action figures.
 
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
I used your profile and changed only the parsing part, replacing it with this new parse_feeds.
For the issue date I'm using the time the profile is run, since I couldn't get the correct date from the page. The size is about half that of the main WSJ profile (around 2 MB). Feel free to improve it and post it to libprc.

Here is the method (I don't know how to post correctly, all indentation is gone):

Code:
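	def parse_feeds(self):
		# Fetch the page that lists the articles and parse it
		src = self.browser.open('http://online.wsj.com/page/2_0133.html').read()
		soup = BeautifulSoup(src)
		# The date could not be read from the page, so use the current run time
		issue_date = time.ctime()

		articles = []
		# Article links are the anchors with class 'bold80'
		for item in soup.findAll('a', attrs={'class':'bold80'}):
			# Prefix the host and switch to the print version of the article
			url = item['href']
			url = 'http://online.wsj.com'+url.replace('/article', '/article_print')
			title = self.tag_to_string(item)
			articles.append({
				'title':title, 'url':url, 'description':'', 'date':issue_date
				})

		return {'Todays Paper' : articles }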
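
If anyone wants to try the link-extraction logic without the profile or a WSJ login, here is a rough standalone sketch of the same idea. It is not part of the profile: it uses Python 3's standard html.parser instead of BeautifulSoup, the names BoldLinkParser, to_print_url, build_articles and SAMPLE are made up for this example, and the sample markup is invented, not copied from the WSJ page. It also matches 'bold80' as one class among several, which may be slightly looser than the findAll call above.

```python
from html.parser import HTMLParser
import time

BASE_URL = 'http://online.wsj.com'  # host taken from the method above


class BoldLinkParser(HTMLParser):
    """Collect (href, text) pairs for <a class="bold80"> links (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, title) pairs
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'a' and 'bold80' in classes and attrs.get('href'):
            self._href = attrs['href']
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching link
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._href = None


def to_print_url(href):
    # Same rewrite as in parse_feeds: prefix the host, switch to the print view
    return BASE_URL + href.replace('/article', '/article_print')


def build_articles(html, issue_date=None):
    parser = BoldLinkParser()
    parser.feed(html)
    parser.close()
    # Fall back to the run time when no date is supplied, as in the post
    issue_date = issue_date or time.ctime()
    return [
        {'title': title, 'url': to_print_url(href),
         'description': '', 'date': issue_date}
        for href, title in parser.links
    ]


# Invented sample markup, only for illustration
SAMPLE = '''
<a class="bold80" href="/article/SB0001.html">First <b>story</b></a>
<a class="other" href="/article/SB0002.html">Skipped</a>
'''

articles = build_articles(SAMPLE, issue_date='Sat Feb 23 01:47:00 2008')
for article in articles:
    print(article['title'], article['url'])
```

Passing issue_date explicitly just makes the result repeatable; leaving it out falls back to time.ctime(), the same as in the method.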