Thread: web2lrf
Old 02-22-2008, 12:48 PM   #182
ddavtian
Hi Kovid and all.

I looked at the Atlantic and other profiles, and parsing the WSJ page seemed straightforward. But knowing nothing about Python doesn't help.

I have now gotten to the point where it finds the links and downloads them (I think it downloads), but then I get this error:

Traceback (most recent call last):
  File "convert_from.py", line 192, in <module>
  File "convert_from.py", line 186, in main
  File "convert_from.py", line 125, in process_profile
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 100, in __init__
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 136, in build_index
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 115, in build_sub_index
KeyError: u'date'


Here is the part that I changed:
def parse_feeds(self):
    src = self.browser.open('http://online.wsj.com/page/2_0133.html').read()
    soup = BeautifulSoup(src)

    articles = []
    for item in soup.findAll('a', attrs={'class':'bold80'}):
        url = item['href']
        url = 'http://online.wsj.com'+url.replace('/article', '/article_print')
        title = self.tag_to_string(item)
        articles.append({
            'title':title, 'url':url, 'description':''
        })

    return {'Todays Paper' : articles }


I didn't change get_browser or preprocess_regexps; they work fine in the existing profile.

Do you see anything obviously wrong in my lines? I know there isn't much info here to troubleshoot.
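
One guess, going only by the traceback: the KeyError: u'date' raised in build_sub_index suggests that each article entry is expected to carry a 'date' key, and my dicts only have 'title', 'url' and 'description'. Below is a minimal standalone sketch of what an entry with that key might look like. The helper name make_article, the sample headline and href, and especially the date format string are my own assumptions; I don't know what format the profile code actually expects.

```python
import time

def make_article(title, href, base='http://online.wsj.com'):
    """Build one article entry (hypothetical helper, not part of libprs500).

    Adds the 'date' key that the KeyError in the traceback points at.
    """
    # Same URL rewrite as in parse_feeds: switch to the printer-friendly page.
    url = base + href.replace('/article', '/article_print')
    return {
        'title': title,
        # Assumed format; the format the profile code really expects is unknown.
        'date': time.strftime('%a, %d %b %Y %H:%M:%S', time.gmtime()),
        'url': url,
        'description': '',
    }

# Made-up sample link, just to show the shape of the result.
articles = [make_article('Sample headline', '/article/SB123.html')]
feeds = {'Todays Paper': articles}
```

If that guess is right, adding a 'date' entry to the dict inside the loop in parse_feeds should be enough to get past this particular error.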

I usually get only one shot at running it every 2-3 hours. Because web2lrf doesn't log off from their site, the next run cannot log in for some time. How do you guys develop your profiles? Not much fun :-(

Kovid, if you have nothing better to do and have the time/desire to help me here, you have my login/password in your PM box, from 2-3 weeks ago. Just add "5" at the end of the password; I had to change it at some point.

Thanks in advance,
David