Dear Group,
Thanks so much for a) existing, b) reading this post at all, and c) having patience with me.
I hope I'm not duplicating this request. I did try a good few searches but in the end decided to join the community and ask.
I'd like to make an ebook from a URL, where the page itself is grabbed and the links on it are followed as well (e.g.
http://markforster.squarespace.com/b...e-systems.html or
http://www.psychowith6.com/can-a-dai....Z8UQS2kE.dpbs)
I know I can do this via ebook-convert, but I'm keen to try doing it via a recipe so that I can use the readability features and have the ebook contain only the 'body'.
I know a little Python and next to nothing about HTML, but I'm keen to try (for the achievement if nothing else). I'm aware of, and have had a once-through of, these links:
https://www.mobileread.com/forums/sho...d.php?t=121439,
http://blog.calibre-ebook.com/2011/1...-fetching.html,
http://manual.calibre-ebook.com/news...asicNewsRecipe,
http://manual.calibre-ebook.com/news...-fetch-process.
I think the key API methods are: extract_readable_article(html, url), is_link_wanted(url, tag) or the regexp options for tags, parse_index(), auto_cleanup (maybe? I think that's just for feeds?) and recursions = X so it follows links.
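To make the question concrete, this is roughly how I imagine those pieces fitting together. It is only a sketch under assumptions: the start URL, the class name and the "same site only" rule are placeholders I made up, and I'm guessing that auto_cleanup also applies to the followed pages. The try/except is just there so the snippet can be read and run outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be run outside calibre (assumption, not calibre code)
    class BasicNewsRecipe(object):
        pass

# Hypothetical placeholder, not one of the real pages mentioned above
START_URL = 'http://example.com/start.html'


class FollowLinksSketch(BasicNewsRecipe):
    title = 'One page plus its links (sketch)'
    recursions = 1        # follow links one level deep from the start page
    auto_cleanup = True   # ask calibre to keep only the readable 'body'
    no_stylesheets = True

    def parse_index(self):
        # A single "feed" holding a single "article": the start page
        articles = [{'title': 'Start page', 'url': START_URL}]
        return [('Pages', articles)]

    def is_link_wanted(self, url, tag):
        # Assumed rule: only follow links that stay on the same site
        return url.startswith('http://example.com/')
```

If that is on the right lines, I would expect to run it with something like ebook-convert myrecipe.recipe output.epub (myrecipe.recipe being whatever I call the file).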
I've made a basic start that doesn't throw errors but does little else (index.html is downloaded), and I'm lost after that. For example, if I use extract_readable_article, can I assume the html and url arguments are somehow already known, or is it up to me to supply them?
Any help or pointers appreciated.
Kind regards,
Tim