03-01-2011, 11:15 AM | #1 |
Member
Posts: 10
Karma: 10
Join Date: Mar 2011
Device: Amazon Kindle 3G
|
LWN.net Weekly News recipe
Hi,
I've made a quick recipe to get the latest weekly edition of LWN.net on my Kindle. If you enter your LWN credentials and have a current subscription, it fetches the current edition; failing that, it gets the latest free edition (usually a few weeks behind). I have some Python experience, but this is my first try at a calibre recipe, so feel free to comment if you see a better way to do things. Cheers, Davide |
03-01-2011, 11:17 AM | #2 |
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There's already an lwn recipe in calibre, is yours different?
|
03-01-2011, 11:37 AM | #3 |
Member
Posts: 10
Karma: 10
Join Date: Mar 2011
Device: Amazon Kindle 3G
|
Yes, the lwn recipe already in calibre downloads the latest news published on the lwn.net homepage (i.e. their RSS feed). My recipe downloads the "weekly edition", which is a magazine-like publication made by the LWN.net folks. The weekly edition may include some content that is also syndicated on the lwn.net homepage/RSS, but most of it is original content.
Davide |
03-01-2011, 11:54 AM | #4 |
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Ah, added
|
03-22-2011, 02:51 PM | #5 |
Junior Member
Posts: 8
Karma: 10
Join Date: Dec 2010
Device: Graphite Kindle DX
|
I am not having any luck getting this recipe to work. I have verified that I am getting the correct URL by sending the request through my proxy and checking the requested URL via curl.
I've run ebook-convert from the command line (with and without '--test'):
$ ebook-convert LWN.net\ Weekly\ Edition_6.recipe /tmp/lwn-calibre -d /tmp/lwn-calibre-debug -vv
What's left in either directory, however, appears to be just the skeleton without content. Presumably the content is getting destroyed by the processing. It would be nice if I could preserve the temp directory that the content is downloaded into; its contents should be the same as the -d debug directory, but I have a feeling they aren't. (In trying to prevent the temp directory from being removed, I've gone so far as to try "os.kill(getpid(), 9)", but due to forking/threading I'm not hitting the right process at the right time.) |
03-22-2011, 02:56 PM | #6 |
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Simply remove the remove_tags, keep_only_tags etc fields from the recipe.
And if you want to look at the downloaded html, implement the preprocess_html method in the recipe and save the soup yourself to a temp file. |
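A minimal sketch of that suggestion, for anyone following along. The helper name dump_html, the class name DebugRecipe and the file prefix are made up for illustration; only BasicNewsRecipe and its preprocess_html(soup) hook come from calibre. The try/except around the import is there only so the helper can be tried outside calibre.

```python
import os
import tempfile

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Assumption: lets the sketch be exercised outside calibre.
    BasicNewsRecipe = object


def dump_html(soup, prefix='lwn-debug-'):
    """Write the parsed page to a new temp file that is NOT cleaned up."""
    fd, path = tempfile.mkstemp(prefix=prefix, suffix='.html')
    with os.fdopen(fd, 'w', encoding='utf-8') as f:
        f.write(str(soup))
    return path


class DebugRecipe(BasicNewsRecipe):
    # Hypothetical recipe; a real one also needs parse_index or feeds.
    title = 'Debug sketch'

    def preprocess_html(self, soup):
        # Called by calibre for each downloaded page, before cleanup rules.
        path = dump_html(soup)
        print('saved downloaded HTML to', path)
        return soup
```

Because mkstemp creates files outside calibre's own working directory, they should survive after the conversion finishes.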
03-22-2011, 03:47 PM | #7 | ||
Junior Member
Posts: 8
Karma: 10
Join Date: Dec 2010
Device: Graphite Kindle DX
|
Pretty-printing the 'ans' returned from parse_index shows that it hasn't found any sections or content: [('Front Page', [])]
Ok, now that I'm actually digging into it and not just trying ad hoc debugging, I'm seeing a number of problems. First, the article URL in the actual content is relative, so re.compile('^http://lwn.net/Articles/') should be re.compile('^(http://lwn.net)?/Articles/'). But then at the end of the loop it's setting content='' in the article dict, so 'http://lwn.net' has to be prepended to url. That means that rather than just using the article text that's inline, we're re-downloading each article individually. Yuck. I'll submit an updated recipe when I have something satisfactory. |
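The two URL fixes described in this post can be sketched roughly as below. The function name absolute_article_url and the article number in the usage note are hypothetical, and the dot in the pattern is escaped here, which the pattern quoted in the post does not do.

```python
import re

BASE = 'http://lwn.net'

# Accept both absolute and site-relative article links.
ARTICLE_RE = re.compile(r'^(http://lwn\.net)?/Articles/')


def absolute_article_url(href):
    """Return an absolute article URL, or None if href is not an article link."""
    if not ARTICLE_RE.match(href):
        return None
    # Relative links need the site prefix before calibre can fetch them.
    return href if href.startswith(BASE) else BASE + href
```

With this, a relative link such as '/Articles/123/' and its absolute form both resolve to the same URL, while non-article links are skipped.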
||
03-22-2011, 04:59 PM | #8 |
Junior Member
Posts: 8
Karma: 10
Join Date: Dec 2010
Device: Graphite Kindle DX
|
Here is a version that works for me; I have tested only with the post-embargoed edition, although it should be the same. I would like to try to use only the one big page and split that into distinct articles but not right now.
https://github.com/wcooley/calibre_r..._weekly.recipe |
03-22-2011, 05:02 PM | #9 |
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
updated.
|
09-23-2011, 07:58 AM | #10 |
Junior Member
Posts: 1
Karma: 10
Join Date: Sep 2011
Device: Kobo Wireless
|
This lwn_weekly recipe suddenly stopped working for me last night; I had to edit it as follows to get it working again:
Code:
--- lwn_weekly.recipe 2011-07-01 10:19:40.000000000 +0300
+++ lwn_weekly.recipe.new 2011-09-23 14:54:36.815730882 +0300
@@ -95,7 +95,7 @@
 break
 article = dict(
- title=tag_title.string,
+ title=tag_title.contents[0].string,
 url= 'http://lwn.net' + tag_url['href'].split('#')[0] + '?format=printable',
 description='', content='', date='')
 articles[section].append(article)
I don't know what has changed to make it suddenly stop working; it's possible that it's something on my system (any ideas what it could be?). But just in case it's something more widespread, I thought I'd share my experience. Also, I'm new to calibre recipes and BeautifulSoup, so very likely there's a better/more robust way to fix this than what I've done. Thanks! Dov
Last edited by dovf; 09-23-2011 at 08:03 AM. |
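For readers unfamiliar with the url= line in the diff, here is a small standalone sketch of what that expression does. The function name printable_url and the sample href are made up for illustration; the string operations are the ones from the recipe line.

```python
def printable_url(href, base='http://lwn.net'):
    """Build the printable-article URL from a site-relative href."""
    # Drop any '#fragment' so the query string lands on the article itself.
    path = href.split('#')[0]
    return base + path + '?format=printable'
```

For example, an href of '/Articles/1/#Comments' becomes 'http://lwn.net/Articles/1/?format=printable'.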
09-23-2011, 11:30 AM | #11 |
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Thanks, the robust fix is to use
title = self.tag_to_string(tag_title) |
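Some background on why this is more robust: in BeautifulSoup, a tag's .string is None whenever the tag has more than one child, so any extra markup inside the title breaks title=tag_title.string. The sketch below is not calibre's tag_to_string; it is a stdlib-only illustration of the same idea (collect all text inside the markup, however deeply nested), with made-up names and sample markup.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect every run of text, ignoring the tags around it."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def flatten_to_text(markup):
    """Return all text contained in markup, regardless of nesting."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()  # flush any buffered trailing text
    return ''.join(collector.parts).strip()
```

A title like '<h2><a href="/Articles/1/">Kernel <b>news</b></a></h2>' flattens to 'Kernel news' whether or not the inner tags are present.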
01-05-2012, 01:38 PM | #12 |
Zealot
Posts: 129
Karma: 567800
Join Date: Sep 2011
Location: Austria
Device: Kindle Paperwhite II
|
Hmm, looks like lwn broke the recipe again, I get
Conversion Error: Failed: Fetch news from LWN.net Weekly Edition [...] Exception: Could not find any articles. |
01-06-2012, 03:38 PM | #13 |
Junior Member
Posts: 8
Karma: 10
Join Date: Dec 2010
Device: Graphite Kindle DX
|
I have updated the recipe on GitHub:
https://github.com/wcooley/calibre_r..._weekly.recipe
This includes the title fix mentioned above and a tag-title fix submitted by jerrykan. |
01-07-2012, 09:17 AM | #14 |
Zealot
Posts: 129
Karma: 567800
Join Date: Sep 2011
Location: Austria
Device: Kindle Paperwhite II
|
|
04-09-2012, 11:02 PM | #15 |
Junior Member
Posts: 1
Karma: 10
Join Date: Apr 2012
Device: Samsung Galaxy Ace
|
Thanks a lot for the recipe! It will be a lot easier to read offline. I really appreciate it.
The only thing missing would be the inclusion of the comments section. The comments on lwn are of high quality and always interesting to read. Is that possible? It might be harder to include though... Thanks again. |
Tags |
calibre, lwn, recipe |