MobileRead Forums - Download url and links by recipe so readability version made

thorgan · 11-29-2015, 06:39 PM

Dear Group,

Thanks so much for a) existing, b) reading this post at all, and c) having patience with me.

I hope I'm not duplicating this request. I did try a good few searches but in the end decided to join the community and ask.

I'd like to make an ebook from a URL, where the page itself is grabbed and the links on it are followed as well (e.g. http://markforster.squarespace.com/b...e-systems.html or http://www.psychowith6.com/can-a-dai....Z8UQS2kE.dpbs).

I know I can do this via ebook-convert, but I'd like to do it via a recipe so that I can use the readability features and end up with an ebook that contains only the 'body' of each page.

I know a little Python and next to nothing about HTML, but I'm keen to try (for the achievement if nothing else). I've had a read through these links: https://www.mobileread.com/forums/sho...d.php?t=121439, http://blog.calibre-ebook.com/2011/1...-fetching.html, http://manual.calibre-ebook.com/news...asicNewsRecipe, http://manual.calibre-ebook.com/news...-fetch-process.

I think the key API pieces are: extract_readable_article(html, url); is_link_wanted(url, tag) or the regexp options for tags; parse_index(); auto_cleanup (maybe? I think that's just for feeds?); and recursions = X so that it follows links.
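A minimal sketch of how those pieces might fit together in one recipe, assuming a current Python 3 based calibre. START_URL is a hypothetical placeholder (not one of the pages above), the class name is made up, and the fallback class in the except branch only exists so the file can be imported outside calibre. Whether auto_cleanup is also applied to the followed pages is something to verify rather than take from this sketch.

```python
from urllib.parse import urlparse

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so this file can be imported outside calibre.
    # Inside calibre the real BasicNewsRecipe is used instead.
    class BasicNewsRecipe(object):
        pass

# Hypothetical placeholder: replace with the page to start from.
START_URL = 'https://example.com/articles/index.html'


class SinglePageWithLinks(BasicNewsRecipe):
    title = 'Single page plus linked pages'
    auto_cleanup = True   # ask calibre to keep only the readable article body
    recursions = 1        # follow links one level deep from each article page

    def parse_index(self):
        # One section containing one "article": the start page itself.
        articles = [{'title': 'Start page', 'url': START_URL}]
        return [('Pages', articles)]

    def is_link_wanted(self, url, tag):
        # Assumption for this sketch: only follow links on the same host
        # as the start page. Return True to follow, False to skip.
        return urlparse(url).netloc == urlparse(START_URL).netloc
```

Saved as a .recipe file, something like this could be passed to ebook-convert as the input. Raising recursions makes the download grow quickly, so a tight is_link_wanted filter matters.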

I've made a basic start that doesn't throw errors but does little else (index.html is downloaded), and I'm lost after that. For example, if I use extract_readable_article, can I assume the html and url arguments are already supplied somehow, or is it up to me to provide them?
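One way to see where the html and url values come from is to override preprocess_raw_html(raw_html, url), a BasicNewsRecipe hook that calibre calls with the source and address of each page it downloads. The sketch below only prints what it receives; the class name is made up, the fallback class is there so the file imports outside calibre, and a real recipe would still need feeds or parse_index(). The comment about auto_cleanup reflects my understanding and is worth checking against the calibre source.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so this file can be imported outside calibre.
    class BasicNewsRecipe(object):
        pass


class ShowFetchedPages(BasicNewsRecipe):
    title = 'Show fetched pages'
    # Assumption: with auto_cleanup set, calibre runs the readability
    # extraction itself, so the recipe does not have to call
    # extract_readable_article by hand.
    auto_cleanup = True

    def preprocess_raw_html(self, raw_html, url):
        # calibre supplies both arguments for every downloaded page.
        print('fetched %s (%d characters)' % (url, len(raw_html)))
        # Return the source unchanged so later processing sees the same page.
        return raw_html
```

If the printed URLs include the linked pages as well as the start page, that suggests the link following is working and the arguments are being filled in by the fetcher.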

Any help or pointers appreciated.

Kind regards,
Tim
