MobileRead Forums - View Single Post

ddavtian · 02-22-2008, 12:48 PM

Hi Kovid and all.

I looked at the Atlantic and other profiles, and parsing the WSJ page seemed straightforward. But knowing nothing about Python doesn't help.

Now I get to the point where it finds the links and downloads them (I think it downloads), but then I get this error:

Traceback (most recent call last):
  File "convert_from.py", line 192, in <module>
  File "convert_from.py", line 186, in main
  File "convert_from.py", line 125, in process_profile
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 100, in __init__
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 136, in build_index
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 115, in build_sub_index
KeyError: u'date'

Here is the part that I changed:
def parse_feeds(self):
    src = self.browser.open('http://online.wsj.com/page/2_0133.html').read()
    soup = BeautifulSoup(src)

    articles = []
    for item in soup.findAll('a', attrs={'class':'bold80'}):
        url = item['href']
        url = 'http://online.wsj.com'+url.replace('/article', '/article_print')
        title = self.tag_to_string(item)
        articles.append({
            'title':title, 'url':url, 'description':''
        })

    return {'Todays Paper' : articles }

I didn't change get_browser or preprocess_regexps; they work fine in the existing profile.

Do you see anything obviously wrong in my lines? I know there isn't much info here to troubleshoot.
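My only guess from the traceback: the KeyError is raised while the index is being built, and the dictionaries I append have no 'date' key, so maybe the index code expects one on every article. Here is a minimal standalone sketch of what I mean. The helper name make_article, the sample URL, and the date format are my own assumptions, not something I found in the existing profiles.

```python
import time

def make_article(title, url, description=''):
    """Build one article entry, including a 'date' key.

    Assumption: the index builder looks up article['date'].
    The date string format below is a guess, not a documented requirement.
    """
    return {
        'title': title,
        'url': url,
        'description': description,
        'date': time.strftime('%a, %d %b %Y'),
    }

# Hypothetical entry, standing in for one link found on the WSJ page
articles = [
    make_article('Example headline',
                 'http://online.wsj.com/article_print/example.html'),
]
feeds = {'Todays Paper': articles}
```

If that guess is right, adding a 'date' entry inside my articles.append call should be enough.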

I usually get only one shot at running it every 2-3 hours: because web2lrf doesn't log off from their site, the next run cannot log in for some time. How do you guys develop your profiles? Not much fun :-(
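One thing I may try so I don't burn a login on every test: save the index page to disk once and check the link extraction offline. Below is a rough sketch using only the Python standard library instead of BeautifulSoup. The class and function names and the sample HTML are made up for illustration, and it only covers the link-finding step, not the download.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for <a class="bold80"> links."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and attrs.get('class') == 'bold80' and 'href' in attrs:
            self._href = attrs['href']
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._href = None

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

# Made-up sample standing in for a page saved to disk once
sample = ('<p><a class="bold80" href="/article/SB1.html">First story</a> '
          '<a href="/other">skip me</a></p>')
links = extract_links(sample)
```

With a real saved copy I would read the file and pass its contents to extract_links in place of the sample string.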

Kovid, if you have nothing better to do and have the time/desire to help me here, you have my login/password in your PM box, from 2-3 weeks ago. Just add "5" at the end of the password; I had to change it at some point.

Thanks in advance,
David

02-22-2008, 12:48 PM	#182