Old 04-28-2008, 01:37 AM   #316
Ben_B
Junior Member
Ben_B began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Apr 2008
Location: British Columbia, Canada
Device: Sony PRS-505
Thanks... I wasn't aware that this had changed. This may take me a while as I learn how to write "recipes". I tried making some quick changes using the new recipe format (BasicNewsRecipe), but I must be doing something wrong, as I consistently receive the following error...

IndexError: list index out of range
Failed to perform job: Fetch news from The Globe and Mail
Detailed traceback:
Traceback (most recent call last):
File "parallel.py", line 139, in run_job
File "libprs500\ebooks\lrf\feeds\convert_from.pyo", line 40, in main
File "libprs500\web\feeds\main.pyo", line 134, in run_recipe
File "libprs500\web\feeds\news.pyo", line 466, in download
File "libprs500\web\feeds\news.pyo", line 603, in build_index
File "d:\temp\libprs500_0.4.49_r_7fws_recipes\recipe0.py", line 39, in print_version
IndexError: list index out of range
Old 05-03-2008, 11:01 PM   #317
Bubble
Enthusiast
Bubble has a complete set of Star Wars action figures.
 
Posts: 32
Karma: 274
Join Date: Apr 2008
Device: Sony Reader PRS-500
Hope you guys updated to the newest version! The Globe and Mail is now supported in calibre. I have not looked at it in detail yet, however, due to other priorities.

Thanks kovidgoyal.
Old 05-08-2008, 03:46 PM   #318
moneytoo
Enthusiast
moneytoo began at the beginning.
 
Posts: 39
Karma: 20
Join Date: Oct 2007
Location: Czech Republic
Device: Sony PRS-505
Code:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 113: ordinal not in range(128)
Failed to perform job: Fetch news from Reuters
Detailed traceback:
Traceback (most recent call last):
  File "parallel.py", line 139, in run_job
  File "calibre\ebooks\lrf\feeds\convert_from.pyo", line 40, in main
  File "calibre\web\feeds\main.pyo", line 128, in run_recipe
  File "calibre\web\feeds\news.pyo", line 810, in __init__
  File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 174, in __init__
  File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 225, in build_index
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 113: ordinal not in range(128)
Log:
Fetching feeds...
I cannot convert a single news feed using either the calibre GUI or web2lrf. Every time, I get this UnicodeDecodeError, no matter which site it parses.
Old 05-08-2008, 05:27 PM   #319
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Try the next release; it has a possible fix for this. It should be out in a couple of days.
Old 05-09-2008, 12:48 AM   #320
Rick C
Seeker
Rick C has a complete set of Star Wars action figures.
 
Posts: 53
Karma: 363
Join Date: Mar 2008
Location: Ontario, Canada
Device: Sony PRS-505
I have been using v4.51 for a couple of days and the Globe feed is working well for me, although it only retrieves the first page of any given story.
Old 05-09-2008, 01:17 PM   #321
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That's probably because it needs a subscription, which I don't have. I actually wrote that recipe as a guide for Bubble, in the hopes he'd improve it and share the result.
Old 05-10-2008, 12:48 AM   #322
Bubble
Enthusiast
Bubble has a complete set of Star Wars action figures.
 
Posts: 32
Karma: 274
Join Date: Apr 2008
Device: Sony Reader PRS-500
I noticed that too, Rick C, when I finally got around to testing it.

The link that I had for the Globe and Mail profile is broken (from a private message). The online help file for web2lrf also points to a broken link when attempting to browse the default profiles. When you have the time, could you please take a look at it, kovidgoyal?

I still have a faint image of the profile from when I first saw it. To be honest, the code is way above my understanding at this point in time. As such, I doubt I can tweak it to perfection... But maybe Ben_B can?
Old 05-10-2008, 01:18 AM   #323
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Fixed the links.
Old 05-22-2008, 01:31 AM   #324
Ben_B
Junior Member
Ben_B began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Apr 2008
Location: British Columbia, Canada
Device: Sony PRS-505
As for the links to the full stories from the Globe and Mail, I was using the following function to retrieve the full stories from the Globe Investor web site in the profile I posted earlier. The Globe Investor produces a very nice print version without any extra HTML. I was using the function to create print versions of the news stories from the Globe and Mail RSS feeds (e.g., http://www.theglobeandmail.com/gener...s/BN/Front.xml).

def print_version(self, url):
    return 'http://www.globeinvestor.com/servlet/ArticleNews/print/' + (url.split('/story/',1)[1]).split('.',1)[0] + '/' + url.rsplit('.',3)[2] + '/' + url.rsplit('.',3)[3]
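The IndexError reported earlier in the thread was raised inside print_version, and `url.split('/story/',1)[1]` fails in exactly that way whenever the URL has no '/story/' segment. Here is a rough sketch of a guarded variant, written as a standalone function so it can be run outside calibre (in a recipe it would be a method taking self). The sample URLs are made up for illustration, and falling back to the unmodified URL is an assumption about the desired behaviour, not something from the original profile.

```python
PRINT_BASE = 'http://www.globeinvestor.com/servlet/ArticleNews/print/'

def print_version(url):
    """Return the Globe Investor print URL, or the URL unchanged if it
    does not match the expected '/story/<prefix>.<date>.<id>/...' shape."""
    if '/story/' not in url:
        # The unguarded version raises IndexError here.
        return url
    tail = url.split('/story/', 1)[1]
    parts = url.rsplit('.', 3)
    if len(parts) < 4:
        # Not enough dot-separated pieces to rebuild the print URL.
        return url
    return PRINT_BASE + tail.split('.', 1)[0] + '/' + parts[2] + '/' + parts[3]

# Hypothetical article URLs, for illustration only.
sample = ('http://www.theglobeandmail.com/servlet/story/'
          'RTGAM.20080522.wexample/BNStory/National/home')
converted = print_version(sample)
unchanged = print_version('http://www.theglobeandmail.com/blogs/example')
print(converted)
print(unchanged)
```

With a guard like this, an article whose link does not follow the story pattern is fetched from its normal page instead of aborting the whole download.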

The problem I ran into is that most of the links to the full stories are contained within the <feedburner:origLink> tag. With the old libprs500, I was using url_search_order = ['feedburner:origlink']. This seemed to work; however, this variable no longer seems to exist in calibre's BasicNewsRecipe. I can't figure out how to make calibre follow the links contained within the <feedburner:origLink> tags. I'm guessing I will need to process this somehow through another function?
Old 05-22-2008, 11:44 AM   #325
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Yeah
Code:
    def get_article_url(self, article):
        return article.get('feedburner_origlink', None)
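A small sketch of how that lookup behaves, assuming the parsed feed entry acts like a dictionary (plain dicts stand in for it here, and the URLs are invented). It also shows one possible fallback to the ordinary link for entries that carry no FeedBurner field; that fallback is an assumption added for illustration, not part of the snippet above.

```python
def get_article_url(article):
    # Prefer the FeedBurner original link; otherwise use the regular link.
    # Returns None if the entry has neither key.
    return article.get('feedburner_origlink', article.get('link'))

# Hypothetical entries standing in for parsed feed items.
with_origlink = {
    'link': 'http://feeds.example.com/~r/front/item1',
    'feedburner_origlink': 'http://www.example.com/story/1',
}
plain = {'link': 'http://www.example.com/story/2'}

print(get_article_url(with_origlink))
print(get_article_url(plain))
print(get_article_url({}))
```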
Old 05-23-2008, 02:41 PM   #326
Ben_B
Junior Member
Ben_B began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Apr 2008
Location: British Columbia, Canada
Device: Sony PRS-505
Thanks, that works.

Here is my personal profile for the Globe and Mail I use for my PRS-505. I'm not a coder so there is probably plenty of room for improvement. The only problem I have is that I cannot change the text size while viewing it on the Reader. When opening the e-book file, the Reader defaults to S sized text. Attempting to change the size to M or L causes my Reader to crash and restart. My firmware is ver. 1.0.00.08130.

Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe

class GlobeMail(BasicNewsRecipe): 

	title = 'The Globe and Mail' 
	html_description = False
	use_pubdate = True
	oldest_article = 7
	use_embedded_content = False
	max_articles_per_feed = 10
	simultaneous_downloads = 1
	no_stylesheets = True
	summary_length = 300
	html2lrf_options = ['--base-font-size', '9'] 

	preprocess_regexps =  [
		
		(re.compile(r'<script.*?</script>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
		(re.compile(r'<style.*?</style>', re.IGNORECASE | re.DOTALL), lambda match : '<style> </style>'),
		(re.compile(r'<body class="subscribe.*?<div id="articleAbstract">', re.IGNORECASE | re.DOTALL), lambda match : '<body><div>'),
		(re.compile(r'<ul class="columnistInfo">.*?</ul>', re.IGNORECASE | re.DOTALL), lambda match : ''),
		(re.compile(r'<p class="note".*?</body>', re.IGNORECASE | re.DOTALL), lambda match : '<br><br>Subscription required to read full story</body>'),
		(re.compile(r'<p class="deck"></p>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
		(re.compile(r'<p class="byline"></p>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
		(re.compile(r'<p class="date"></p>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
		(re.compile(r'<p><a href="http://www.globeinvestor.com/">.*?<h2', re.IGNORECASE | re.DOTALL), lambda match : '<h2'),
		(re.compile(r'<h1 class="keyline">.*?</h1>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
		(re.compile(r'<p class="date">.*?<(\S+)>', re.IGNORECASE | re.DOTALL), lambda match : match.group().replace(match.group(1), '/p><br') ),
		(re.compile(r'<a href.*? target="offsite">', re.IGNORECASE | re.DOTALL), lambda match : '<a name="#">'),
		(re.compile(r'<tr>', re.IGNORECASE | re.DOTALL), lambda match : '<br>'),
		(re.compile(r'<td>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
		(re.compile(r'</tr>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
		(re.compile(r'</td>', re.IGNORECASE | re.DOTALL), lambda match : '  '),
		(re.compile(r'<hr>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
		(re.compile(r'<!-- /frag.../copyright begins -->', re.IGNORECASE | re.DOTALL), lambda match : '<br><!-- /frag.../copyright begins --><br>'),
		]

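	def get_article_url(self, article):
		# Prefer FeedBurner's original link; fall back to the entry's regular link.
		return article.get('feedburner_origlink', article.link)

	def print_version(self, url):
		# Rebuild the story URL as a Globe Investor print URL.
		# Raises IndexError if the URL does not contain '/story/'.
		return 'http://www.globeinvestor.com/servlet/ArticleNews/print/' + (url.split('/story/',1)[1]).split('.',1)[0] + '/' + url.rsplit('.',3)[2] + '/' + url.rsplit('.',3)[3]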

	def get_feeds(self):
		return [
		('  A. Front Page', 'http://www.theglobeandmail.com/generated/rss/BN/Front.xml'),
		('  B. British Columbia', 'http://www.theglobeandmail.com/generated/rss/BN/HYBritishColumbia.xml'),
		('  C. National', 'http://www.theglobeandmail.com/generated/rss/BN/National.xml'),
		('  D. World', 'http://www.theglobeandmail.com/generated/rss/BN/International.xml'),
		('  E. Americas', 'http://www.theglobeandmail.com/generated/rss/BN/HYAmerica.xml'),
		('  F. Report on Business', 'http://www.theglobeandmail.com/generated/rss/BN/Business.xml'),
		('  G. Energy News', 'http://www.theglobeandmail.com/generated/rss/BN/energy.xml'),
		('  H. Your Money', 'http://www.theglobeandmail.com/generated/rss/BN/SpecialEvents2.xml'),
		('  I. Sports', 'http://www.theglobeandmail.com/generated/rss/BN/Sports.xml'),
		('  J. The Arts', 'http://www.theglobeandmail.com/generated/rss/BN/Entertainment.xml'),
		('  K. Movies', 'http://www.theglobeandmail.com/generated/rss/BN/HYMovies.xml'),
		('  L. Music', 'http://www.theglobeandmail.com/generated/rss/BN/HYMusic.xml'),
		('  M. Technology', 'http://www.theglobeandmail.com/generated/rss/BN/Technology.xml'),
		('  N. Science', 'http://www.theglobeandmail.com/generated/rss/BN/Science.xml'),
		('  O. Life', 'http://www.theglobeandmail.com/generated/rss/BN/lifeMain.xml'),
		('  P. Food & Wine', 'http://www.theglobeandmail.com/generated/rss/BN/lifeFoodWine.xml'),
		('  Q. Travel', 'http://www.theglobeandmail.com/generated/rss/BN/specialTravel.xml'),
		('  R. Health', 'http://www.theglobeandmail.com/generated/rss/BN/specialScienceandHealth.xml'),
		]
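For anyone adapting the recipe: a rough sketch of how a list of (pattern, function) pairs like preprocess_regexps can be applied to a page, assuming the rules are simply run in order with `pattern.sub`. The helper name apply_rules and the sample HTML are made up for illustration; this is not calibre's own code.

```python
import re

# Three of the rules from the recipe above, copied unchanged.
rules = [
    (re.compile(r'<script.*?</script>', re.IGNORECASE | re.DOTALL), lambda match: ' '),
    (re.compile(r'<hr>', re.IGNORECASE | re.DOTALL), lambda match: ' '),
    (re.compile(r'<tr>', re.IGNORECASE | re.DOTALL), lambda match: '<br>'),
]

def apply_rules(html, rules):
    """Run each (pattern, replacement function) pair over the HTML in order."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html

# Hypothetical page fragment.
sample = '<p>Hi</p><SCRIPT type="x">\nalert(1)\n</SCRIPT><hr><tr>row'
cleaned = apply_rules(sample, rules)
print(repr(cleaned))
```

Because the rules run in sequence, order matters: a later pattern sees the output of the earlier ones, which is why the broad script and style rules sit at the top of the list.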
Old 05-23-2008, 02:50 PM   #327
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Yeah, the font size thing is a bug in Sony's firmware, which hopefully they will fix. Are the articles the full-length ones? Or do you need a subscription for that?
Old 05-23-2008, 03:19 PM   #328
Ben_B
Junior Member
Ben_B began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Apr 2008
Location: British Columbia, Canada
Device: Sony PRS-505
I'd say at least 90% of the articles are full-length. Most of the subscription articles are movie or restaurant reviews. I did a quick review of the articles I downloaded this morning...

A Front Page = 9/9 are full length
B British Columbia = 8/10 full length
C National = 10/10 full length
D World = 10/10 full length
E Americas = 10/10 full length

I didn't go through the rest, but I do recall seeing a couple more subscription articles under Movies.
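As a quick check of the "at least 90%" estimate, the five sections counted above can be totalled like this (only the figures listed in this post are used; the remaining sections were not counted):

```python
# (full-length articles, articles downloaded) per section, as listed above.
counts = {
    'Front Page': (9, 9),
    'British Columbia': (8, 10),
    'National': (10, 10),
    'World': (10, 10),
    'Americas': (10, 10),
}

full = sum(f for f, _ in counts.values())
total = sum(t for _, t in counts.values())
share = full / total
print('%d of %d articles are full length (%.1f%%)' % (full, total, share * 100))
```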
Old 05-30-2008, 08:18 AM   #329
moneytoo
Enthusiast
moneytoo began at the beginning.
 
Posts: 39
Karma: 20
Join Date: Oct 2007
Location: Czech Republic
Device: Sony PRS-505
I waited a few weeks and downloaded the latest version of calibre today. I just tried fetching a few feeds, but most of them just don't work...

Code:
Associated Press            UnicodeDecodeError
The Atlantic                OK
The BBC                     OK
Business Week               URLError
CNN                         UnicodeDecodeError
Christian Science Monitor   UnicodeDecodeError
Die Zeit Nachrichten        UnicodeDecodeError
The Economist               OK
FAZ NET                     UnicodeDecodeError
Globe and Mail              OK
Jerusalem Post              UnicodeDecodeError
Jutarnji                    UnicodeDecodeError
NASA                        UnicodeDecodeError
New York Review of Books    UnicodeDecodeError
The New Yorker              UnicodeDecodeError
Newsweek                    OK
Outlook India               OK
Portfolio                   OK
Reuters                     UnicodeDecodeError
Spiegel Online              UnicodeDecodeError
Sydney Morning Herald       OK
USA Today                   OK
United Press International  UnicodeDecodeError
Washington Post             UnicodeDecodeError
Wired.com                   OK
Unfortunately I still have difficulties converting sites using web2lrf...

Code:
c:\Program Files\calibre>web2lrf -u http://www.mobilmania.mobi -r 1 default
Downloading
. . .Could not fetch stylesheet http://klub.zive.cz/passport/ /Client.StyleSheet
s/common.css
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . .

http://www.mobilmania.mobi saved to c:\docume~1\marcel~1\locals~1\temp\calibre_w
seyry_web2lrf\index.html
Traceback (most recent call last):
  File "convert_from.py", line 182, in <module>
  File "convert_from.py", line 176, in main
  File "convert_from.py", line 146, in process_profile
  File "ntpath.pyo", line 102, in join
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 19: ordinal
 not in range(128)
Old 05-30-2008, 11:34 AM   #330
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I assume you're using a localized (non-English) version of Windows?
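A minimal sketch of the failure mode that question points at, assuming a path containing a non-ASCII character ends up being decoded with the ASCII codec. It is written for Python 3, where the decode has to be spelled out explicitly. The path and the cp1250 code page are illustrative choices, not taken from the traceback.

```python
# Hypothetical Windows path with an accented character, encoded the way a
# Central European code page (cp1250) would store it; 'á' becomes byte 0xe1.
raw_path = 'c:\\temp\\zprávy'.encode('cp1250')

try:
    raw_path.decode('ascii')
    message = None
except UnicodeDecodeError as err:
    # Same kind of error as in the tracebacks above.
    message = str(err)

# Decoding with the encoding the bytes were actually written in succeeds.
decoded = raw_path.decode('cp1250')

print(message)
print(decoded)
```

If the error really does come from a user or temp directory name, it would explain why it shows up regardless of which site is being fetched.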

Tags
libprs500, web2lrf
