Old 02-21-2008, 06:35 PM   #181
kovidgoyal
creator of calibre
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Email notifications
https://libprs500.kovidgoyal.net/bro...es/atlantic.py
Old 02-22-2008, 12:48 PM   #182
ddavtian
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
Hi Kovid and all.

I looked at the Atlantic and other profiles, and parsing the WSJ page seemed straightforward. But knowing nothing about Python doesn't help.

Now I get to the point where it finds the links and downloads (I think it downloads), then I get this error:

Traceback (most recent call last):
  File "convert_from.py", line 192, in <module>
  File "convert_from.py", line 186, in main
  File "convert_from.py", line 125, in process_profile
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 100, in __init__
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 136, in build_index
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 115, in build_sub_index
KeyError: u'date'


Here is the part that I changed:
def parse_feeds(self):
    src = self.browser.open('http://online.wsj.com/page/2_0133.html').read()
    soup = BeautifulSoup(src)

    articles = []
    for item in soup.findAll('a', attrs={'class':'bold80'}):
        url = item['href']
        url = 'http://online.wsj.com'+url.replace('/article', '/article_print')
        title = self.tag_to_string(item)
        articles.append({
            'title':title, 'url':url, 'description':''
            })

    return {'Todays Paper' : articles }


I didn't change get_browser or preprocess_regexps; they work fine in the existing profile.

Do you see anything obviously wrong in my lines? I know there isn't much info here to troubleshoot.

I usually get only one shot at running it every 2-3 hours. Because web2lrf doesn't log off from their site, the next run cannot log in for some time. How do you guys develop your profiles? Not much fun :-(

Kovid, if you have nothing better to do and have time/desire to help me here, you have my login/password in your pm box, 2-3 weeks old. Just add "5" at the end of password, had to change at some point.

Thanks in advance,
David
Old 02-22-2008, 01:54 PM   #183
kovidgoyal
creator of calibre
In the articles.append call you should also have a 'date':time.time() entry.

This gives every article the same default date (the time the profile was run). If you want the correct publication date you should parse the HTML for it.
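
A rough sketch of that second option follows. The real WSJ markup is not shown in this thread, so everything specific here is an assumption made for illustration: the sample HTML, the "Month D, YYYY" date format, the regular expression, and the helper name find_issue_date.

```python
import re
import time

# Hypothetical page fragment; the real markup may look quite different.
SAMPLE_HTML = '<div class="dateStamp">February 22, 2008</div>'

# Matches text shaped like "February 22, 2008" (an assumed format).
DATE_RE = re.compile(r'([A-Z][a-z]+ \d{1,2}, \d{4})')

def find_issue_date(html):
    """Return the first 'Month D, YYYY' date in html as a ctime-style string.

    Falls back to the current time if no usable date is found.
    """
    match = DATE_RE.search(html)
    if match is None:
        return time.ctime()
    try:
        parsed = time.strptime(match.group(1), '%B %d, %Y')
    except ValueError:
        # The text looked like a date but was not one (e.g. a bad month name).
        return time.ctime()
    return time.asctime(parsed)

print(find_issue_date(SAMPLE_HTML))
```

The returned string has the same layout as time.ctime(), so it can be dropped into the 'date' entry without changing anything else.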

Note that you can define a cleanup function to logout. Something like

Code:
def cleanup(self):
    self.browser.open('http://wsj.com/logout')
EDIT: Oops, it should be time.ctime(), not time.time().
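
A small illustration of the date fix, using the corrected time.ctime() call. The function build_index_line is a made-up stand-in for whatever libprs500 code reads article['date'] (the traceback points at build_sub_index, but that code is not quoted in this thread), and the article title and URL are placeholders.

```python
import time

def build_index_line(article):
    # Stand-in for the index builder: it reads 'date' unconditionally,
    # so a dict without that key raises KeyError, as in the traceback.
    return '%s (%s)' % (article['title'], article['date'])

issue_date = time.ctime()  # one run-time stamp shared by all articles

without_date = {'title': 'Example', 'url': 'http://example.com/a', 'description': ''}
with_date = dict(without_date, date=issue_date)

try:
    build_index_line(without_date)
except KeyError as err:
    print('missing key:', err)

print(build_index_line(with_date))
```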

Last edited by kovidgoyal; 02-22-2008 at 02:06 PM.
Old 02-22-2008, 02:10 PM   #184
ddavtian
Addict
Thank you for quick reply!

I added the date, but cannot test it now because of their "security" policy :-( Not many logins allowed.

I had added the same lines (from the existing profile) for logout, but I cannot get it working. After creating the output, web2lrf does not exit (it doesn't return to the command prompt), it just sits there:


[INFO] convert_from.pyo:360: Converting to BBeB...
[INFO] convert_from.pyo:283: Rationalizing font sizes...
[INFO] convert_from.pyo:1754: Output written to C:\Misc\News\Wall Street Print Edition [Fri, Feb 22, 2008].lrf

At this point I have to kill it. And WSJ doesn't like the next run. Without logging in, it simply creates an empty file because no articles are found.

Thanks again for your help. I'll try again later.

David
Old 02-22-2008, 02:17 PM   #185
kovidgoyal
creator of calibre
I remember the WSJ website had some redirect nastiness that prevents web2lrf from logging out; there are some posts about it earlier in this thread.
Old 02-22-2008, 05:47 PM   #186
JTravers
Groupie
Posts: 182
Karma: 1078201
Join Date: Sep 2007
Device: iPad Air 2
Quote:
Originally Posted by kovidgoyal
I remember the WSJ website had some redirect nastiness that prevents web2lrf from logging out; there are some posts about it earlier in this thread.
Yeah, I pretty much just gave up trying to get the logout function to work. It'd be great if David was able to stumble upon a way to make it work. I wish you luck.

With the current profile, I'm very careful about logging out of the site in my web browser before running it. And then I only run it once. Kind of a pain when you're testing changes in the profile, but I usually do most testing first without my login info.
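
One way to do that kind of testing without touching the site at all is to keep the link extraction in a function that takes an HTML string, and feed it a saved copy of the page. The sketch below is only an illustration, not part of any existing profile: it uses Python's standard html.parser instead of BeautifulSoup so that it runs on its own, the sample HTML is invented, and extract_articles is a hypothetical helper name. The 'bold80' class and the /article to /article_print rewrite come from David's parse_feeds earlier in the thread.

```python
from html.parser import HTMLParser

BASE = 'http://online.wsj.com'

class _LinkCollector(HTMLParser):
    """Collect (href, text) for every <a> element with class bold80."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'a' and 'bold80' in classes and attrs.get('href'):
            self._href = attrs['href']
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._href = None

def extract_articles(html, date):
    """Build the article dicts from an HTML string, with no network access."""
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    return [{'title': title,
             'url': BASE + href.replace('/article', '/article_print'),
             'description': '',
             'date': date}
            for href, title in parser.links]

# Invented stand-in for a saved copy of the index page.
SAMPLE = '''
<a class="bold80" href="/article/SB1.html">First story</a>
<a class="other" href="/article/SB2.html">Not a headline link</a>
<a class="bold80" href="/article/SB3.html">Second <b>story</b></a>
'''

for article in extract_articles(SAMPLE, 'Fri Feb 22 00:00:00 2008'):
    print(article['title'], '->', article['url'])
```

With the parsing checked this way, the single logged-in run is only needed to confirm that the download itself works.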
Old 02-22-2008, 05:57 PM   #187
ddavtian
Addict
You smarter guys couldn't fix it, and I didn't have any luck either, so the logout problem remains.

But with Kovid's help (and using the profile from JTravers) I got my paper working. Now I'm getting all the articles from the print edition. It's not as nice as other profiles; it simply lists all the articles in page order (A1, A2..., B1, ..., etc.). Their feeds do not cover all the articles from the paper. Sometimes I start reading the paper in the morning, then leave for the subway. Now I can continue reading the same article on the reader.
Old 02-23-2008, 12:15 AM   #188
JTravers
Groupie
Quote:
Originally Posted by ddavtian
But with Kovid's help (and using the profile from JTravers) I got my paper working. Now I'm getting all the articles from the print edition. It's not as nice as other profiles; it simply lists all the articles in page order (A1, A2..., B1, ..., etc.). Their feeds do not cover all the articles from the paper. Sometimes I start reading the paper in the morning, then leave for the subway. Now I can continue reading the same article on the reader.
I'd love for you to post the profile, if you don't mind. I wanted to set the same kind of thing up on my own Reader but just didn't bother trying to do it since setting up feeds in web2lrf is so easy.

Thanks in advance!
Old 02-23-2008, 01:47 AM   #189
ddavtian
Addict
I've used your profile and only changed the parsing part to this new parse_feeds.
I'm using the run time as the date, since I didn't get the correct date from the page. The output is about half the size of the main WSJ profile's (around 2 MB). Feel free to improve and post to libprc.

Here is the method (I don't know how to post correctly, all indentation is gone):

Code:
# Needs `import time` at the top of the profile module.
def parse_feeds(self):
    # Fetch the index page that lists the print-edition articles.
    src = self.browser.open('http://online.wsj.com/page/2_0133.html').read()
    soup = BeautifulSoup(src)
    # Use the run time as the date for every article.
    issue_date = time.ctime()

    articles = []
    for item in soup.findAll('a', attrs={'class':'bold80'}):
        # Rewrite each article link to its printer-friendly version.
        url = item['href']
        url = 'http://online.wsj.com'+url.replace('/article', '/article_print')
        title = self.tag_to_string(item)
        articles.append({
            'title':title, 'url':url, 'description':'', 'date':issue_date
            })

    return {'Todays Paper' : articles }
Old 02-25-2008, 06:20 PM   #190
JTravers
Groupie
Quote:
Originally Posted by ddavtian
I've used your profile and only changed the parsing part to this new parse_feeds.
I'm using the run time as the date, since I didn't get the correct date from the page. The output is about half the size of the main WSJ profile's (around 2 MB). Feel free to improve and post to libprc.
Thanks!
I'll take a look at it when I get some free time.
Old 03-12-2008, 04:22 PM   #191
kovidgoyal
creator of calibre
I'm in the process of refactoring web2lrf to make it much more powerful and easier to use (impossible, I know). As an example of how it is more powerful, see the attached Newsweek ebook (downloaded using multi-threading in 10 minutes).
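
For readers curious what multi-threaded downloading looks like in general, here is a generic Python sketch using the standard concurrent.futures module. It is not web2lrf's code, which is not shown in this thread. The function fake_fetch and the example.com URLs are placeholders that simulate slow network requests, so the block runs without network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(url):
    # Placeholder for a real download: sleep to simulate network latency.
    time.sleep(0.05)
    return url, len(url)

urls = ['http://example.com/article/%d' % i for i in range(8)]

start = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() runs fake_fetch concurrently but yields results in input order.
    results = list(pool.map(fake_fetch, urls))
elapsed = time.time() - start

print('fetched %d pages in %.2fs' % (len(results), elapsed))
```

Because the work is dominated by waiting on the network, several threads can overlap their waits, which is why this kind of download tends to finish sooner than fetching one page at a time.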

Feedback on the formatting and anything else is appreciated.
Attached Files
File Type: lrf newsweek.lrf (1.76 MB, 300 views)
Old 03-12-2008, 05:34 PM   #192
llasram
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
Quote:
Originally Posted by kovidgoyal
Feedback on the formatting and anything else is appreciated.
I really like the addition of the navigation panel at the beginning of each article, but do you think it would be possible for the 'next' link to come first, at least in the link-selection sequence? This might break the "traditional" order of navigation elements, but being able to skip through the articles with one button-press per article would greatly facilitate skimming (which at least for me would be the most common nav. panel use case).
Old 03-12-2008, 06:00 PM   #193
kovidgoyal
creator of calibre
Yeah that's a good idea.
Old 03-12-2008, 08:55 PM   #194
ddavtian
Addict
This looks great.

And I like the idea for the "next" link.

Now the question: when will it be ready? :-)
Old 03-12-2008, 09:19 PM   #195
kovidgoyal
creator of calibre
Well, the code for it has already been committed to svn. It just needs testing and integration. It probably won't reach the GUI for at least a couple more releases.