08-24-2011, 01:25 PM | #1 |
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2011
Device: Kindle 3
Readability patch for recipes
I made a patch for calibre that adds the Readability algorithm for extracting the main article content from an HTML page.
It's useful for things like a Hacker News recipe (included). The Python Readability port comes from https://github.com/buriy/python-readability; I dumped it into src/readability. You can grab the branch at lp:~thomas-scholl/calibre/readability and give hackernews.recipe a try. What would I need to change to get it into calibre?
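For anyone wondering how this kind of extraction works, here is a heavily simplified, stdlib-only sketch of the scoring idea. It is not the python-readability port or its API. It uses Python's `html.parser` instead of lxml so it runs on its own. The weights (1 point per paragraph, 1 per comma, up to 3 for length, half credit to the grandparent) and the 25-character cutoff are assumptions that loosely follow the original arc90 heuristics. Every name here (`extract_main_text`, `SKIP_TAGS`, `TreeBuilder`, `SAMPLE_HTML`) is hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical constants for this sketch. Real Readability also weighs
# class/id names and link density, which this sketch omits.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
MIN_PARAGRAPH_CHARS = 25


class Node:
    """Minimal DOM node; children mix str (text) and Node objects."""

    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children = tag, parent, []


class TreeBuilder(HTMLParser):
    """Builds a Node tree, tolerating unclosed and stray end tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = self.current = Node("#root")

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:  # void elements never contain anything
            return
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag (implicitly closing
        # anything still open inside it); ignore unmatched end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def iter_elements(node):
    """Yield descendant elements depth-first, skipping SKIP_TAGS subtrees."""
    for child in node.children:
        if isinstance(child, Node) and child.tag not in SKIP_TAGS:
            yield child
            yield from iter_elements(child)


def text_of(node):
    """Whitespace-normalised text of node, excluding SKIP_TAGS subtrees."""
    def raw(n):
        return "".join(c if isinstance(c, str) else
                       ("" if c.tag in SKIP_TAGS else raw(c))
                       for c in n.children)
    return " ".join(raw(node).split())


def extract_main_text(html):
    """Return the paragraphs of the best-scoring container, or ''."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()

    scores = {}  # Node -> accumulated score (nodes hash by identity)
    for el in iter_elements(builder.root):
        text = text_of(el) if el.tag == "p" else ""
        if len(text) < MIN_PARAGRAPH_CHARS:
            continue
        # 1 point base, 1 per comma, 1 per 100 characters (at most 3).
        score = 1 + text.count(",") + min(len(text) // 100, 3)
        parent = el.parent
        scores[parent] = scores.get(parent, 0) + score
        if parent.parent is not None:  # grandparent gets half credit
            scores[parent.parent] = scores.get(parent.parent, 0) + score / 2

    if not scores:
        return ""
    best = max(scores, key=scores.get)
    paragraphs = [text_of(el) for el in iter_elements(best) if el.tag == "p"]
    return "\n\n".join(p for p in paragraphs if p)


SAMPLE_HTML = """<html><body>
<nav><p>Home, About, Contact, Login, Register, Help, Search</p></nav>
<div class="story">
<p>The city council met on Tuesday, discussed the budget, and approved new funding for the library, the parks, and the schools.</p>
<p>Several residents spoke during the public comment period, and the next meeting is scheduled for early October.</p>
</div>
<div class="sidebar"><p>Subscribe now</p></div>
</body></html>"""

if __name__ == "__main__":
    print(extract_main_text(SAMPLE_HTML))
```

On the sample, the nav block is skipped and the short sidebar paragraph falls under the cutoff, so only the two story paragraphs come back. The real port does considerably more, such as penalizing link-heavy blocks. Treat this only as an illustration of the idea.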
08-24-2011, 01:31 PM | #2 |
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2011
Device: Kindle 3
Quick addition: the library depends on lxml and chardet (I changed the chardet import to calibre.ebooks.chardet).
08-24-2011, 02:06 PM | #3 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Looks generally OK. Can you open a bug report for it? I'll look at it in more detail and comment when I have the time.
08-24-2011, 02:27 PM | #4 |
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2011
Device: Kindle 3
Sure, thanks!
08-28-2011, 09:33 AM | #5 |
Junior Member
Posts: 1
Karma: 10
Join Date: Aug 2011
Device: Kindle 3
I'd suggest using this in the 'Read It Later' recipe as well, since that is also a list of links to different sites. :-)
PS/Edit: Have you already checked how lxml handles malformed HTML?
08-30-2011, 06:49 PM | #6 |
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2011
Device: Kindle 3
I hadn't properly tested it against bad HTML; I just ran it against lots of random websites.
Kovid made several improvements for 0.8.16, though, and also added the auto_cleanup flag, which is probably all you'd need to set for 'Read It Later'.
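To give an idea of how little a recipe needs once auto_cleanup is available, here is a minimal sketch. It is not the actual 'Read It Later' recipe. The class name, title and feed URL are hypothetical placeholders, and it assumes a calibre version that already has the auto_cleanup flag. It is a recipe fragment meant to be loaded by calibre, not a standalone script.

```python
from calibre.web.feeds.news import BasicNewsRecipe


class ReadingListSketch(BasicNewsRecipe):
    # Hypothetical recipe: title and feed URL are placeholders.
    title = 'Reading list (sketch)'
    oldest_article = 7            # days of articles to fetch
    max_articles_per_feed = 50
    no_stylesheets = True
    # Let calibre extract the main article content automatically instead
    # of hand-writing keep_only_tags / remove_tags rules per site.
    auto_cleanup = True

    feeds = [
        ('My reading list', 'https://example.com/my-reading-list.rss'),
    ]
```

The design point is that a list of links to many different sites can't realistically carry per-site cleanup rules, which is exactly where automatic extraction helps.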