Hi Kovid and all.
I looked at the Atlantic and other profiles, and it seemed straightforward to parse the WSJ page. But knowing nothing about Python doesn't help.
Now it gets to the point where it finds the links and (I think) downloads them, and then I get this error:
Traceback (most recent call last):
  File "convert_from.py", line 192, in <module>
  File "convert_from.py", line 186, in main
  File "convert_from.py", line 125, in process_profile
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 100, in __init__
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 136, in build_index
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 115, in build_sub_index
KeyError: u'date'
Here is the part that I changed:
def parse_feeds(self):
    src = self.browser.open('http://online.wsj.com/page/2_0133.html').read()
    soup = BeautifulSoup(src)
    articles = []
    for item in soup.findAll('a', attrs={'class':'bold80'}):
        url = item['href']
        url = 'http://online.wsj.com'+url.replace('/article', '/article_print')
        title = self.tag_to_string(item)
        articles.append({
            'title':title, 'url':url, 'description':''
        })
    return {'Todays Paper' : articles }
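The traceback ends in KeyError: u'date' inside build_sub_index. That suggests the profile base class reads a 'date' key from every article dict, while the dicts built above only carry 'title', 'url' and 'description'. Below is a minimal sketch of the same loop with a 'date' entry filled in. It is written as a standalone function so it can be tried without logging in to the site. The function name build_articles and its links argument (a list of (title, href) pairs standing in for the bold80 anchors) are hypothetical. The date format is an assumption too: it's just an illustrative time.strftime string, not necessarily what the base class produces for RSS-based profiles.

```python
import time

def build_articles(links, timefmt='%a, %d %b %Y'):
    """Hypothetical stand-in for parse_feeds above.

    links: list of (title, href) pairs, standing in for the
    <a class="bold80"> tags that BeautifulSoup finds on the page.
    """
    # Assumption: build_sub_index only needs 'date' to be a string;
    # the format here is illustrative.
    today = time.strftime(timefmt, time.localtime())
    articles = []
    for title, href in links:
        # Same rewrite as in parse_feeds: use the printer-friendly page.
        url = 'http://online.wsj.com' + href.replace('/article', '/article_print')
        articles.append({
            'title': title,
            'url': url,
            'date': today,        # the key the traceback says is missing
            'description': '',
        })
    return {'Todays Paper': articles}
```

In the actual profile, the corresponding change would simply be adding a 'date' entry to the dict passed to articles.append.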
I didn't change get_browser or preprocess_regexps; they work fine in the existing profile.
Do you see anything obvious in my code? I know this isn't much information to troubleshoot with.
I usually get only one run every 2-3 hours: because web2lrf doesn't log out of the site, the next run can't log in for a while. How do you guys develop your profiles? Not much fun :-(
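One way to cut down on logins while iterating on parse_feeds is to save the index page to disk the first time it is fetched and parse the local copy on later runs. This only helps if the index page is the main thing being re-downloaded. The sketch below takes any zero-argument fetch callable. In a profile that might be something like lambda: self.browser.open(url).read(). The name cached_fetch and the cache path are hypothetical, and this does nothing about the logout problem itself.

```python
import os

def cached_fetch(path, fetch):
    """Return page source from `path` if it was saved earlier;
    otherwise call fetch(), save the result to `path`, and return it."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return f.read()
    data = fetch()  # only hits the site (and logs in) on the first run
    with open(path, 'wb') as f:
        f.write(data)
    return data
```

Delete the cached file whenever a fresh copy of the page is needed.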
Kovid, if you have nothing better to do and have the time and desire to help me here, my login/password is in your PM inbox (sent 2-3 weeks ago). Just add "5" to the end of the password; I had to change it at some point.
Thanks in advance,
David