MobileRead Forums - View Single Post

ddavtian · 02-23-2008, 01:47 AM

I've used your profile and only changed the parsing part to this new parse_feeds.
I'm using the run time as the issue date because I couldn't get the correct date from the page. The size is half that of the main WSJ profile (around 2 MB). Feel free to improve it and post it to libprc.

Here is the method (I don't know how to post correctly, all indentation is gone):

Code:

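	# Assumes `time` and `BeautifulSoup` are already imported at the top of the profile module.
	def parse_feeds(self):
		# Fetch the page that lists the articles
		src = self.browser.open('http://online.wsj.com/page/2_0133.html').read()
		soup = BeautifulSoup(src)
		# The page date could not be parsed, so use the run time instead
		issue_date = time.ctime()

		articles = []
		# Article links are the <a> tags with class "bold80"
		for item in soup.findAll('a', attrs={'class':'bold80'}):
			url = item['href']
			# Point at the print version of each article
			url = 'http://online.wsj.com'+url.replace('/article', '/article_print')
			title = self.tag_to_string(item)
			articles.append({
				'title':title, 'url':url, 'description':'', 'date':issue_date
				})

		# One feed, keyed by its title
		return {'Todays Paper' : articles }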
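
For anyone who wants to try the link extraction outside the profile, here is a rough standalone sketch of the same idea. It is not the profile code: it uses the standard library's html.parser instead of BeautifulSoup, and the names BoldLinkParser, to_print_url and extract_articles, as well as the sample markup in the usage line, are made up for illustration. It also rewrites only the first '/article' in each link, which is an assumption about the URL layout.

```python
from html.parser import HTMLParser

BASE_URL = 'http://online.wsj.com'


class BoldLinkParser(HTMLParser):
    """Collects (href, text) pairs for every <a class="bold80"> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and attrs.get('class') == 'bold80' and attrs.get('href'):
            self._href = attrs['href']
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._href = None


def to_print_url(href):
    # Assumption: only the first '/article' path segment should be rewritten.
    return BASE_URL + href.replace('/article', '/article_print', 1)


def extract_articles(html, issue_date):
    """Return article dicts in the same shape the parse_feeds method builds."""
    parser = BoldLinkParser()
    parser.feed(html)
    parser.close()
    return [
        {'title': title, 'url': to_print_url(href),
         'description': '', 'date': issue_date}
        for href, title in parser.links
    ]
```

Usage would look something like extract_articles(page_source, time.ctime()), with page_source being the HTML of the listing page as a string.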
