04-28-2008, 01:37 AM | #316 |
Junior Member
Posts: 7
Karma: 10
Join Date: Apr 2008
Location: British Columbia, Canada
Device: Sony PRS-505
|
Thanks... I wasn't aware that this had changed. This may take me a while as I learn how to write "recipes". I tried making some quick changes using the new recipe format (BasicNewsRecipe), but I must be doing something wrong, as I consistently get the following error...
IndexError: list index out of range
Failed to perform job: Fetch news from The Globe and Mail
Detailed traceback:
Traceback (most recent call last):
  File "parallel.py", line 139, in run_job
  File "libprs500\ebooks\lrf\feeds\convert_from.pyo", line 40, in main
  File "libprs500\web\feeds\main.pyo", line 134, in run_recipe
  File "libprs500\web\feeds\news.pyo", line 466, in download
  File "libprs500\web\feeds\news.pyo", line 603, in build_index
  File "d:\temp\libprs500_0.4.49_r_7fws_recipes\recipe0.py", line 39, in print_version
IndexError: list index out of range
|
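The traceback ends inside print_version (line 39 of the recipe). One plausible cause, assuming the recipe used the print_version posted later in this thread, is a link that does not contain '/story/' (for example a FeedBurner redirect link): str.split then returns a one-element list, and indexing [1] fails. A minimal reproduction with a made-up URL:

```python
# Hypothetical feed link that does not contain '/story/'.
url = "http://feeds.example.com/~r/front/~3/123456/"

parts = url.split("/story/", 1)
print(parts)       # ['http://feeds.example.com/~r/front/~3/123456/']
print(len(parts))  # 1

try:
    parts[1]  # the same indexing print_version does
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```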
05-03-2008, 11:01 PM | #317 |
Enthusiast
Posts: 32
Karma: 274
Join Date: Apr 2008
Device: Sony Reader PRS-500
|
I hope you've all updated to the newest version! The Globe and Mail is now supported in calibre. I haven't looked at it in detail yet, though, due to other priorities.
Thanks, kovidgoyal. |
05-08-2008, 03:46 PM | #318 |
Enthusiast
Posts: 39
Karma: 20
Join Date: Oct 2007
Location: Czech Republic
Device: Sony PRS-505
|
Code:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 113: ordinal not in range(128)
Failed to perform job: Fetch news from Reuters
Detailed traceback:
Traceback (most recent call last):
  File "parallel.py", line 139, in run_job
  File "calibre\ebooks\lrf\feeds\convert_from.pyo", line 40, in main
  File "calibre\web\feeds\main.pyo", line 128, in run_recipe
  File "calibre\web\feeds\news.pyo", line 810, in __init__
  File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 174, in __init__
  File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 225, in build_index
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 113: ordinal not in range(128)
Log:
Fetching feeds...
|
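In Python 2 this error typically means a byte string containing non-ASCII bytes was mixed with unicode text, which triggers an implicit decode with the 'ascii' codec; that codec rejects anything above 0x7f, such as the 0xe8 here. A minimal Python 3 sketch of the usual remedy, decoding explicitly. The fallback codec cp1250 (Windows Central European, where 0xe8 is 'č') is only an assumption about where the bytes came from; the traceback does not say.

```python
def decode_text(raw: bytes, fallback: str = "cp1250") -> str:
    """Decode bytes as ASCII when possible, otherwise with an explicit codec."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        # Bytes above 0x7f (like 0xe8) are not ASCII, so use the assumed codec.
        # errors="replace" substitutes U+FFFD for bytes the codec does not define.
        return raw.decode(fallback, errors="replace")


plain = decode_text(b"news")          # 'news'
czech = decode_text(b"\xe8esk\xfd")   # 'český' under the cp1250 assumption
```

The design choice is simply to make the codec a visible parameter instead of letting the interpreter guess ASCII.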
05-08-2008, 05:27 PM | #319 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Try the next release, it has a possible fix for this. It should be out in a couple of days.
|
05-09-2008, 12:48 AM | #320 |
Seeker
Posts: 53
Karma: 363
Join Date: Mar 2008
Location: Ontario, Canada
Device: Sony PRS-505
|
I have been using v4.51 for a couple of days, and the Globe feed is working well for me, although it only retrieves the first page of any given story.
|
05-09-2008, 01:17 PM | #321 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That's probably because it needs a subscription, which I don't have. I actually wrote that recipe as a guide for Bubble, in the hopes he'd improve it and share the result.
|
05-10-2008, 12:48 AM | #322 |
Enthusiast
Posts: 32
Karma: 274
Join Date: Apr 2008
Device: Sony Reader PRS-500
|
I noticed that too, Rick C, when I finally got around to testing it.
The link I had for the Globe and Mail profile (from a private message) is broken. The online help file for web2lrf also points to a broken link when you try to browse the default profiles. When you have time, could you please take a look at it, kovidgoyal? I still have a faint memory of the profile from when I first saw it. To be honest, the code is way above my understanding at this point, so I doubt I can tweak it to perfection... but maybe Ben_B can? |
05-10-2008, 01:18 AM | #323 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Fixed the links.
|
05-22-2008, 01:31 AM | #324 |
Junior Member
Posts: 7
Karma: 10
Join Date: Apr 2008
Location: British Columbia, Canada
Device: Sony PRS-505
|
As for the links to the full stories from the Globe and Mail, I was using the following function to retrieve the full stories from the Globe Investor web site in the profile I posted earlier. Globe Investor produces a very nice printed version without any extra HTML. I was using the function to create printed versions of the news stories from the Globe and Mail RSS feeds (i.e., http://www.theglobeandmail.com/gener...s/BN/Front.xml).
def print_version(self, url):
    return 'http://www.globeinvestor.com/servlet/ArticleNews/print/' + (url.split('/story/',1)[1]).split('.',1)[0] + '/' + url.rsplit('.',3)[2] + '/' + url.rsplit('.',3)[3]

The problem I ran into is that the links to most of the full stories are contained in the <feedburner:origLink> tag. With the old libprs500, I was using url_search_order = ['feedburner:origlink']. This seemed to work; however, this variable no longer seems to exist in calibre's BasicNewsRecipe. I can't figure out how to make calibre follow the links contained in the <feedburner:origLink> tags. I'm guessing I need to process this somehow through another function?
|
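A slightly more defensive take on the print_version above: the same string surgery, but any URL without the expected shape is returned unchanged instead of raising IndexError. It is written as a plain function so it runs outside calibre (inside a recipe it would take self as its first argument), and the sample story URL is made up to fit the pattern the original code assumes, not taken from the real site.

```python
PRINT_BASE = "http://www.globeinvestor.com/servlet/ArticleNews/print/"


def print_version(url: str) -> str:
    """Map a Globe and Mail story URL to a Globe Investor print URL.

    Returns the URL unchanged when it lacks the expected shape.
    """
    if "/story/" not in url:
        return url  # e.g. a FeedBurner redirect link
    story_prefix = url.split("/story/", 1)[1].split(".", 1)[0]
    pieces = url.rsplit(".", 3)  # last three dot-separated pieces
    if len(pieces) < 4:
        return url
    return PRINT_BASE + story_prefix + "/" + pieces[2] + "/" + pieces[3]


# Hypothetical story URL in the shape the string surgery expects.
story = "http://www.theglobeandmail.com/servlet/story/RTGAM.20080522.wfront/BNStory/National/home"
print(print_version(story))
# http://www.globeinvestor.com/servlet/ArticleNews/print/RTGAM/20080522/wfront/BNStory/National/home
print(print_version("http://feeds.example.com/~r/front/~3/123456/"))
# http://feeds.example.com/~r/front/~3/123456/
```

Falling back to the original URL means a story whose link has an unexpected shape is still fetched (as the full web page) rather than aborting the whole download.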
05-22-2008, 11:44 AM | #325 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yeah
Code:
def get_article_url(self, article):
    return article.get('feedburner_origlink', None)
|
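The reply above apparently relies on calibre's feed parser exposing each item's <feedburner:origLink> element under the key 'feedburner_origlink'. To see the same idea outside calibre, here is a sketch using only the standard library's ElementTree. Unlike the reply (which returns None), it falls back to the plain <link>. The sample feed and URLs are hypothetical, and the namespace URI is the one FeedBurner feeds declared, assumed here and declared in the sample itself so the code is self-consistent.

```python
import xml.etree.ElementTree as ET

# Namespace URI for FeedBurner's RSS extensions (assumed).
FEEDBURNER_NS = "http://rssnamespace.org/feedburner/ext/1.0"

# Hypothetical two-item feed: one item with an origLink, one without.
SAMPLE_RSS = """<rss version="2.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
<channel>
<item>
<title>With origLink</title>
<link>http://feeds.example.com/~r/front/~3/1/</link>
<feedburner:origLink>http://www.example.com/servlet/story/A.1.b/home</feedburner:origLink>
</item>
<item>
<title>Without origLink</title>
<link>http://www.example.com/plain</link>
</item>
</channel>
</rss>"""


def article_urls(rss_text: str) -> list:
    """Return one URL per item, preferring <feedburner:origLink> over <link>."""
    root = ET.fromstring(rss_text)
    urls = []
    for item in root.iter("item"):
        orig = item.findtext(f"{{{FEEDBURNER_NS}}}origLink")  # None if absent
        urls.append(orig or item.findtext("link"))
    return urls


print(article_urls(SAMPLE_RSS))
```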
05-23-2008, 02:41 PM | #326 |
Junior Member
Posts: 7
Karma: 10
Join Date: Apr 2008
Location: British Columbia, Canada
Device: Sony PRS-505
|
Thanks, that works.
Here is my personal profile for the Globe and Mail, which I use with my PRS-505. I'm not a coder, so there is probably plenty of room for improvement. The only problem I have is that I cannot change the text size while viewing it on the Reader. When I open the e-book file, the Reader defaults to S-sized text. Attempting to change the size to M or L causes my Reader to crash and restart. My firmware is ver. 1.0.00.08130.
Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe

class GlobeMail(BasicNewsRecipe):
    title = 'The Globe and Mail'
    html_description = False
    use_pubdate = True
    oldest_article = 7
    use_embedded_content = False
    max_articles_per_feed = 10
    simultaneous_downloads = 1
    no_stylesheets = True
    summary_length = 300
    html2lrf_options = ['--base-font-size', '9']

    # Each entry is (compiled pattern, function returning the replacement text).
    preprocess_regexps = [
        (re.compile(r'<script.*?</script>', re.IGNORECASE | re.DOTALL), lambda match: ' '),
        (re.compile(r'<style.*?</style>', re.IGNORECASE | re.DOTALL), lambda match: '<style> </style>'),
        (re.compile(r'<body class="subscribe.*?<div id="articleAbstract">', re.IGNORECASE | re.DOTALL), lambda match: '<body><div>'),
        (re.compile(r'<ul class="columnistInfo">.*?</ul>', re.IGNORECASE | re.DOTALL), lambda match: ''),
        (re.compile(r'<p class="note".*?</body>', re.IGNORECASE | re.DOTALL), lambda match: '<br><br>Subscription required to read full story</body>'),
        (re.compile(r'<p class="deck"></p>', re.IGNORECASE | re.DOTALL), lambda match: ' '),
        (re.compile(r'<p class="byline"></p>', re.IGNORECASE | re.DOTALL), lambda match: ' '),
        (re.compile(r'<p class="date"></p>', re.IGNORECASE | re.DOTALL), lambda match: ' '),
        (re.compile(r'<p><a href="http://www.globeinvestor.com/">.*?<h2', re.IGNORECASE | re.DOTALL), lambda match: '<h2'),
        (re.compile(r'<h1 class="keyline">.*?</h1>', re.IGNORECASE | re.DOTALL), lambda match: ' '),
        (re.compile(r'<p class="date">.*?<(\S+)>', re.IGNORECASE | re.DOTALL), lambda match: match.group().replace(match.group(1), '/p><br')),
        (re.compile(r'<a href.*? target="offsite">', re.IGNORECASE | re.DOTALL), lambda match: '<a name="#">'),
        (re.compile(r'<tr>', re.IGNORECASE | re.DOTALL), lambda match: '<br>'),
        (re.compile(r'<td>', re.IGNORECASE | re.DOTALL), lambda match: ' '),
        (re.compile(r'</tr>', re.IGNORECASE | re.DOTALL), lambda match: ' '),
        (re.compile(r'</td>', re.IGNORECASE | re.DOTALL), lambda match: ' '),
        (re.compile(r'<hr>', re.IGNORECASE | re.DOTALL), lambda match: ' '),
        (re.compile(r'<!-- /frag.../copyright begins -->', re.IGNORECASE | re.DOTALL), lambda match: '<br><!-- /frag.../copyright begins --><br>'),
    ]

    def get_article_url(self, article):
        # Prefer the FeedBurner original link; fall back to the feed's own link.
        return article.get('feedburner_origlink', article.link)

    def print_version(self, url):
        # Rewrite the story URL into a Globe Investor print-version URL.
        return 'http://www.globeinvestor.com/servlet/ArticleNews/print/' + (url.split('/story/',1)[1]).split('.',1)[0] + '/' + url.rsplit('.',3)[2] + '/' + url.rsplit('.',3)[3]

    def get_feeds(self):
        return [
            (' A. Front Page', 'http://www.theglobeandmail.com/generated/rss/BN/Front.xml'),
            (' B. British Columbia', 'http://www.theglobeandmail.com/generated/rss/BN/HYBritishColumbia.xml'),
            (' C. National', 'http://www.theglobeandmail.com/generated/rss/BN/National.xml'),
            (' D. World', 'http://www.theglobeandmail.com/generated/rss/BN/International.xml'),
            (' E. Americas', 'http://www.theglobeandmail.com/generated/rss/BN/HYAmerica.xml'),
            (' F. Report on Business', 'http://www.theglobeandmail.com/generated/rss/BN/Business.xml'),
            (' G. Energy News', 'http://www.theglobeandmail.com/generated/rss/BN/energy.xml'),
            (' H. Your Money', 'http://www.theglobeandmail.com/generated/rss/BN/SpecialEvents2.xml'),
            (' I. Sports', 'http://www.theglobeandmail.com/generated/rss/BN/Sports.xml'),
            (' J. The Arts', 'http://www.theglobeandmail.com/generated/rss/BN/Entertainment.xml'),
            (' K. Movies', 'http://www.theglobeandmail.com/generated/rss/BN/HYMovies.xml'),
            (' L. Music', 'http://www.theglobeandmail.com/generated/rss/BN/HYMusic.xml'),
            (' M. Technology', 'http://www.theglobeandmail.com/generated/rss/BN/Technology.xml'),
            (' N. Science', 'http://www.theglobeandmail.com/generated/rss/BN/Science.xml'),
            (' O. Life', 'http://www.theglobeandmail.com/generated/rss/BN/lifeMain.xml'),
            (' P. Food & Wine', 'http://www.theglobeandmail.com/generated/rss/BN/lifeFoodWine.xml'),
            (' Q. Travel', 'http://www.theglobeandmail.com/generated/rss/BN/specialTravel.xml'),
            (' R. Health', 'http://www.theglobeandmail.com/generated/rss/BN/specialScienceandHealth.xml'),
        ]
|
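To see what two of the preprocess_regexps entries in the recipe above do to a page, here is a small stand-alone sketch. It assumes each pair is applied in list order as pattern.sub(func, html); that is an assumption about calibre's internals, and this code runs the pairs itself without calibre. The sample HTML is made up.

```python
import re

# Two (pattern, replacement function) pairs copied from the recipe above.
PREPROCESS_REGEXPS = [
    (re.compile(r'<script.*?</script>', re.IGNORECASE | re.DOTALL),
     lambda match: ' '),
    (re.compile(r'<p class="date">.*?<(\S+)>', re.IGNORECASE | re.DOTALL),
     lambda match: match.group().replace(match.group(1), '/p><br')),
]


def preprocess(html: str) -> str:
    # Apply each pair in order; a function replacement receives the match object.
    for pattern, func in PREPROCESS_REGEXPS:
        html = pattern.sub(func, html)
    return html


sample = '<script>track();</script><p class="date">May 23, 2008</p>\n<p>Story text</p>'
print(repr(preprocess(sample)))
```

The script block collapses to a single space, and the date rule turns the closing `</p>` of the date into `</p><br>`. Because `\S+` is greedy, if that closing tag were followed directly by another tag with no whitespace in between, `group(1)` would span both tags; the sample keeps a newline there.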
05-23-2008, 02:50 PM | #327 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yeah, the font size thing is a bug in Sony's firmware, which hopefully they will fix. Are the articles the full-length ones, or do you need a subscription for that?
|
05-23-2008, 03:19 PM | #328 |
Junior Member
Posts: 7
Karma: 10
Join Date: Apr 2008
Location: British Columbia, Canada
Device: Sony PRS-505
|
I'd say at least 90% of the articles are full-length. Most of the subscription articles are movie or restaurant reviews. I did a quick review of the articles I downloaded this morning...
A. Front Page = 9/9 full length
B. British Columbia = 8/10 full length
C. National = 10/10 full length
D. World = 10/10 full length
E. Americas = 10/10 full length

I didn't go through the rest, but I do recall seeing a couple more subscription articles under Movies.
|
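For the five feeds counted above, the numbers work out to a little under 96% full-length, consistent with the "at least 90%" estimate. A quick check of that arithmetic using only the counts from the post:

```python
# (full-length, downloaded) per feed, taken from the quick review above.
tally = {
    "A. Front Page": (9, 9),
    "B. British Columbia": (8, 10),
    "C. National": (10, 10),
    "D. World": (10, 10),
    "E. Americas": (10, 10),
}

full = sum(f for f, _ in tally.values())
total = sum(t for _, t in tally.values())
print(f"{full}/{total} = {full / total:.1%}")  # 47/49 = 95.9%
```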
05-30-2008, 08:18 AM | #329 |
Enthusiast
Posts: 39
Karma: 20
Join Date: Oct 2007
Location: Czech Republic
Device: Sony PRS-505
|
I waited a few weeks and downloaded the latest version of calibre today. I just tried fetching a few feeds, but most of them don't work...
Code:
Associated Press: UnicodeDecodeError
The Atlantic: OK
The BBC: OK
Business Week: URLError
CNN: UnicodeDecodeError
Christian Science Monitor: UnicodeDecodeError
Die Zeit Nachrichten: UnicodeDecodeError
The Economist: OK
FAZ NET: UnicodeDecodeError
Globe and Mail: OK
Jerusalem Post: UnicodeDecodeError
Jutarnji: UnicodeDecodeError
NASA: UnicodeDecodeError
New York Review of Books: UnicodeDecodeError
The New Yorker: UnicodeDecodeError
Newsweek: OK
Outlook India: OK
Portfolio: OK
Reuters: UnicodeDecodeError
Spiegel Online: UnicodeDecodeError
Sydney Morning Herald: OK
USA Today: OK
United Press International: UnicodeDecodeError
Washington Post: UnicodeDecodeError
Wired.com: OK
Code:
c:\Program Files\calibre>web2lrf -u http://www.mobilmania.mobi -r 1 default
Downloading . . .Could not fetch stylesheet http://klub.zive.cz/passport/ /Client.StyleSheets/common.css
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
http://www.mobilmania.mobi saved to c:\docume~1\marcel~1\locals~1\temp\calibre_wseyry_web2lrf\index.html
Traceback (most recent call last):
  File "convert_from.py", line 182, in <module>
  File "convert_from.py", line 176, in main
  File "convert_from.py", line 146, in process_profile
  File "ntpath.pyo", line 102, in join
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 19: ordinal not in range(128)
|
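This failure happens inside ntpath's join, which is consistent with Python 2 concatenating a unicode path component with a byte string that contains a non-ASCII byte (0xe1) and implicitly decoding it as ASCII. Python 3 refuses to mix the two outright (ntpath.join raises TypeError), so the usual fix is to decode byte components explicitly before joining. A sketch of that idea follows. The traceback does not show which component carried the byte; the folder name and the cp1250 codec (where 0xe1 is 'á') are assumptions for illustration only.

```python
import ntpath  # Windows path rules; importable on any operating system


def join_windows_path(base: str, name: bytes, encoding: str = "cp1250") -> str:
    # Decode the byte-string component first so join() only ever sees text.
    return ntpath.join(base, name.decode(encoding))


# Hypothetical folder name "Zprávy" stored as cp1250 bytes.
path = join_windows_path("C:\\Temp", b"Zpr\xe1vy")  # 'C:\\Temp\\Zprávy'
```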
05-30-2008, 11:34 AM | #330 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I assume you're using a localized (non-English) version of Windows?
|