MobileRead Forums > E-Book Software > Calibre > Development
08-24-2011, 01:25 PM #1
tomscholl
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2011
Device: Kindle 3
Readability patch for recipes

I made a patch for calibre which adds the readability algorithm for extracting the main article content from an HTML page.
It's useful for things like a Hacker News recipe (included).
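For readers unfamiliar with the approach: readability-style extraction scores the elements that contain paragraphs and keeps the highest-scoring one. The sketch below is a heavily simplified, stdlib-only illustration loosely modeled on that idea. Paragraphs "vote" for their parent and, at half weight, their grandparent, and class/id hints nudge the score up or down. It is not the python-readability code from the patch: it uses Python's `html.parser` instead of lxml, and the keyword lists, the ±25 weights, the 25-character cutoff and the scoring formula are all assumptions chosen for illustration.

```python
import re
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "input", "link", "meta"}
BLOCK = {"p", "div", "li", "td", "br", "h1", "h2", "h3", "article", "section"}

# Assumed keyword hints; real implementations use longer, tuned lists.
POSITIVE = re.compile(r"article|body|content|entry|main|post|text", re.I)
NEGATIVE = re.compile(r"comment|footer|menu|nav|sidebar|sponsor", re.I)


class Node:
    def __init__(self, tag, attrs, parent):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []  # mix of Node objects and text strings

    def text(self):
        parts = []
        for child in self.children:
            if isinstance(child, str):
                parts.append(child)
            else:
                # Pad block-level children so adjacent paragraphs don't run together.
                pad = " " if child.tag in BLOCK else ""
                parts.append(pad + child.text() + pad)
        return "".join(parts)


class TreeBuilder(HTMLParser):
    """Builds a loose tree and tolerates unclosed or stray tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [], None)
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.cur)
        self.cur.children.append(node)
        if tag not in VOID:
            self.cur = node

    def handle_endtag(self, tag):
        # Close up to the nearest matching open tag; ignore unmatched end tags.
        node = self.cur
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.cur = node.parent

    def handle_data(self, data):
        self.cur.children.append(data)


def class_weight(node):
    weight = 0
    for attr in ("class", "id"):
        value = node.attrs.get(attr) or ""  # valueless attributes arrive as None
        if POSITIVE.search(value):
            weight += 25
        if NEGATIVE.search(value):
            weight -= 25
    return weight


def iter_nodes(node):
    for child in node.children:
        if isinstance(child, Node):
            yield child
            yield from iter_nodes(child)


def extract_main(html_text):
    """Return the whitespace-normalized text of the best-scoring container, or ''."""
    builder = TreeBuilder()
    builder.feed(html_text)
    builder.close()
    scores = {}
    for p in iter_nodes(builder.root):
        if p.tag != "p":
            continue
        text = p.text().strip()
        if len(text) < 25:  # assumed cutoff: very short paragraphs don't vote
            continue
        score = 1 + text.count(",") + min(len(text) // 100, 3)
        parent = p.parent
        grandparent = parent.parent
        for node, share in ((parent, 1.0), (grandparent, 0.5)):
            if node is None or node is builder.root:
                continue
            if node not in scores:
                scores[node] = class_weight(node)
            scores[node] += score * share
    if not scores:
        return ""
    best = max(scores, key=scores.get)
    return " ".join(best.text().split())
```

Letting paragraphs vote for their ancestors, rather than just picking the longest paragraph, is what lets a multi-paragraph article body outscore navigation and comment blocks.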

The Python readability port comes from https://github.com/buriy/python-readability; I dropped it into src/readability.

You can grab the branch at lp:~thomas-scholl/calibre/readability, and give the hackernews.recipe a try.

What would I need to change to get it into Calibre?
08-24-2011, 01:31 PM #2
tomscholl
Quick addition: the library depends on lxml and chardet (I changed the chardet import to calibre.ebooks.chardet).
08-24-2011, 02:06 PM #3
kovidgoyal
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Looks generally ok. Can you open a bug report for it and I will look at it in more detail and comment when I have the time.
08-24-2011, 02:27 PM #4
tomscholl
Sure, thanks!
08-28-2011, 09:33 AM #5
denkr
Junior Member
Posts: 1
Karma: 10
Join Date: Aug 2011
Device: Kindle 3
I propose using this in the 'Read It Later' recipe, since that is also a list of links to different sites. :-)

PS/Edit: Have you already checked how lxml handles malformed HTML?
08-30-2011, 06:49 PM #6
tomscholl
I hadn't properly tested it against bad HTML; I just ran it against lots of random websites.

Kovid made several improvements for 0.8.16, though, and also added the auto_cleanup flag, which is probably all you'd need to set for 'Read It Later'.
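As a hedged illustration of that suggestion: with `auto_cleanup` enabled, a recipe can be little more than a feed list, because calibre extracts the main content of each fetched page itself. The recipe below is a hypothetical minimal example built on calibre's `BasicNewsRecipe`. The class name, title, limits and the use of the Hacker News RSS feed are assumptions, and it is not the hackernews.recipe from the branch mentioned earlier. The `try`/`except` fallback exists only so the file can be imported outside calibre; calibre itself always provides `BasicNewsRecipe`.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so the file can be imported (e.g. for a quick check) outside calibre.
    BasicNewsRecipe = object


class HackerNewsAutoCleanup(BasicNewsRecipe):
    # Hypothetical recipe; the values below are illustrative assumptions.
    title = "Hacker News (auto_cleanup sketch)"
    oldest_article = 2            # days
    max_articles_per_feed = 30
    no_stylesheets = True
    # Let calibre extract the main article body from each linked page
    # instead of hand-writing keep_only_tags / remove_tags for every site.
    auto_cleanup = True
    feeds = [("Front page", "https://news.ycombinator.com/rss")]
```

To try it, save it as a `.recipe` file and build it with calibre's `ebook-convert`; the `--test` option downloads only a couple of articles.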

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Free (K/N/S/iBooks) RV Cooking Cookbook (Gooseberry Patch Classics) | arcadata | Deals and Resources (No Self-Promotion or Affiliate Links) | 1 | 05-18-2012 06:05 PM
Readability of PDF>MOBI in Kindle | alfordsteven | Calibre | 1 | 06-29-2011 02:47 PM
Use readability technology to fetch news | xXxXxXxXxXx | Calibre | 2 | 04-10-2011 10:28 AM
Color of the reader and readability ? | sebastienbillard | Which one should I buy? | 3 | 10-30-2009 11:49 AM
Looking for durability and readability under $400 | insomniac | Which one should I buy? | 32 | 07-05-2009 12:21 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.