11-29-2015, 05:39 PM | #1 |
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2015
Device: Kindle
|
Download url and links by recipe so readability version made
Dear Group,
Thanks so much for a) existing, b) reading this post at all and c) having patience with me. I hope I'm not duplicating this request. I did try a good few searches, but in the end decided to join the community and ask.

I'd like to make an ebook from a URL where the page is grabbed and its links are also followed (e.g. http://markforster.squarespace.com/b...e-systems.html or http://www.psychowith6.com/can-a-dai....Z8UQS2kE.dpbs). I know I can do this via ebook-convert, but I'm keen to do it via a recipe so that I can use the readability features and have the ebook contain only the 'body'. I know a little Python and next to nothing about HTML, but I'm keen to try (for the achievement if nothing else).

I've had a once-through of these links: https://www.mobileread.com/forums/sho...d.php?t=121439, http://blog.calibre-ebook.com/2011/1...-fetching.html, http://manual.calibre-ebook.com/news...asicNewsRecipe, http://manual.calibre-ebook.com/news...-fetch-process.

I think the key API methods are: extract_readable_article(html, url); is_link_wanted(url, tag) or the regexp options for tags; parse_index(); auto_cleanup (maybe? I think that's just for feeds?); and recursions = X so it follows links.

I've made a basic start that doesn't throw errors but does little else (index.html is downloaded), and I'm lost after that. For example, if I use extract_readable_article, can I assume the html and url are somehow already known, or is it up to me to supply them?

Any help or pointers appreciated.

Kind regards, Tim |
11-29-2015, 10:23 PM | #2 |
creator of calibre
Posts: 43,839
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You don't need extract_readable_article.
First make sure auto_cleanup = False. Then set recursions = 1. Then implement a dummy is_link_wanted that always returns True. Once you have the links being picked up, you can look into the cleanup tools the recipe system offers. |
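For readers following along, here is a minimal sketch of those three steps as a recipe. It is an illustration only, not code posted in the thread. The class name, the title, and START_URL are hypothetical placeholders. The try/except stub is an assumption added so the file can also be imported outside calibre.

```python
try:
    # Available when the recipe is run by calibre itself.
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Assumption: fallback stub so the file can be imported without calibre.
    class BasicNewsRecipe(object):
        pass

# Hypothetical placeholder; replace with the page you want to start from.
START_URL = 'http://example.com/start.html'


class LinkedPagesRecipe(BasicNewsRecipe):
    title = 'Page plus linked pages'  # hypothetical title

    # Step 1: leave automatic cleanup off while getting link following to work.
    auto_cleanup = False

    # Step 2: follow links one level deep from each downloaded page.
    recursions = 1

    # Step 3: dummy filter that accepts every link found on the page.
    def is_link_wanted(self, url, tag):
        return True

    def parse_index(self):
        # One "feed" containing a single article: the start page.
        articles = [{'title': 'Start page', 'url': START_URL}]
        return [('Pages', articles)]
```

Saved under a hypothetical name such as linked_pages.recipe, a sketch like this can be tried with ebook-convert linked_pages.recipe output.epub --test -vv. Once the linked pages show up in the output, is_link_wanted can be narrowed so that it returns True only for the URLs you actually want.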
12-08-2015, 08:00 AM | #3 |
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2015
Device: Kindle
|
Thank you Kovid,
Sorry, I meant to reply sooner. I have spent a few evenings trying to work out what I'm doing based on your pointers. I'm getting somewhere (though in the end it's not actually that many lines I've written; most of the time went on understanding which functions I needed), but I've more to go. Whether I get somewhere or get stuck, I'll post what I've got later so others can see it and help, and so it might help them too. Thanks, Tim |