The issue is parsing the HTML to get just the book and not all the fluff/ads. Those ads are likely what is causing the problems. Beautiful Soup and the script can do all the scraping and 99% of the massaging, and output a text file with all the book contents and the associated HTML tags. Then just copy/paste the contents of that output file into pandoc/Sigil/calibre for the final EPUB massaging.
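To give a rough idea of the "keep the book, drop the fluff" step, here is a minimal sketch with Beautiful Soup. It is not the program I wrote, just an illustration: the function name `extract_book_html`, the `#chapter` selector, the list of junk selectors and the sample page are all made up, and you would have to find the right selectors for whatever page you are actually working on.

```python
from bs4 import BeautifulSoup

# Hypothetical defaults: adjust these to the markup of the page you own.
JUNK_SELECTORS = ("script", "style", "iframe", ".ad")


def extract_book_html(page_html, content_selector, junk_selectors=JUNK_SELECTORS):
    """Return the inner HTML of the book container with junk nodes removed."""
    soup = BeautifulSoup(page_html, "html.parser")

    # Find the one element that wraps the actual book text.
    content = soup.select_one(content_selector)
    if content is None:
        raise ValueError(f"no element matches {content_selector!r}")

    # Detach ads, scripts, etc. that sit inside the book container.
    for selector in junk_selectors:
        for node in content.select(selector):
            node.extract()

    # Keep the remaining HTML tags so the EPUB editor has structure to work with.
    return content.decode_contents().strip()


if __name__ == "__main__":
    # Made-up sample page, only for illustration.
    sample = (
        '<html><body><div class="ad">Buy stuff!</div>'
        '<div id="chapter"><h1>Chapter One</h1><p>It was a dark night.</p>'
        '<div class="ad">More ads</div><script>track()</script></div>'
        "</body></html>"
    )
    book = extract_book_html(sample, "#chapter")
    with open("book.txt", "w", encoding="utf-8") as out_file:
        out_file.write(book)
```

The resulting `book.txt` holds only the headings and paragraphs with their tags, which is what you would then paste into Sigil or calibre.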
I wrote a program to do all that as a project to learn Python, and made a GUI for it. That was fun! However, I'm not aware of any websites that allow a tool like this to be used on them. You are pretty much restricted to converting your own web pages to EPUB.
Last edited by Turtle91; 10-11-2024 at 05:52 AM.