11-29-2015, 05:39 PM   #1
thorgan
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2015
Device: Kindle
Download a URL and its links via a recipe to get a readability version

Dear Group,

Thanks so much for a) existing b) reading this post at all and c) having patience with me.

I hope I'm not duplicating this request. I did try a good few searches but in the end decided to join the community and ask.

I'd like to make an ebook from a URL, where the page itself is grabbed and the links on it are followed as well (e.g. http://markforster.squarespace.com/b...e-systems.html or http://www.psychowith6.com/can-a-dai....Z8UQS2kE.dpbs).

I know I can do this via ebook-convert, but I'm keen to do it via a recipe so that I can use the readability features and have the ebook contain only the article 'body'.

I know a little Python and next to nothing about HTML, but I'm keen to try (for the achievement if nothing else). I've had a once-through of these links: https://www.mobileread.com/forums/sho...d.php?t=121439, http://blog.calibre-ebook.com/2011/1...-fetching.html, http://manual.calibre-ebook.com/news...asicNewsRecipe, http://manual.calibre-ebook.com/news...-fetch-process.

I think the key API methods are: extract_readable_article(html, url), is_link_wanted(url, tag) or the regexp options for tags, parse_index(), auto_cleanup (maybe? I think that's just for feeds?) and recursions = X so it follows links.
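
As a rough illustration of how those pieces might fit together, here is a minimal sketch rather than a tested recipe. The class name, START_URL and the WANTED pattern are made-up placeholders, and the try/except stub exists only so the file can be checked outside calibre. It also assumes auto_cleanup applies to pages listed by parse_index() and not only to feeds, which is the uncertain part.

```python
import re

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be read and checked outside calibre;
    # inside calibre the real base class is importable.
    class BasicNewsRecipe(object):
        pass

# Hypothetical start page; substitute the real URL.
START_URL = 'http://example.com/blog/some-post.html'

# Hypothetical pattern: only follow links that stay under the same blog.
WANTED = re.compile(r'^https?://example\.com/blog/')


class PageWithLinks(BasicNewsRecipe):
    title = 'Page with links'
    auto_cleanup = True    # ask calibre for its readability-style cleanup (assumed to apply here)
    recursions = 1         # follow links one level deep from the start page
    no_stylesheets = True  # skip the site's CSS

    def parse_index(self):
        # One "feed" holding one "article": the start page itself.
        articles = [{'title': 'Start page', 'url': START_URL}]
        return [('Pages', articles)]

    def is_link_wanted(self, url, tag):
        # Asked about links found while recursing; True means "follow it".
        return bool(WANTED.match(url))
```

Saved as something like mypage.recipe (a hypothetical filename), it could presumably be tried with ebook-convert mypage.recipe output.epub. With auto_cleanup set, calibre would be the one calling extract_readable_article on each downloaded page, so the html and url would not need to be supplied by hand.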

I've made a basic start that doesn't throw errors but does little else (index.html is downloaded), and I'm lost after that. For example, if I use extract_readable_article, can I assume html and url are somehow already known, or is it up to me to supply them?

Any help or pointers appreciated.

Kind regards,
Tim