Converting a website to EPUB
Websites generated dynamically from a database (WordPress or any similar system) can be made to emit a series of HTML fragment files, one per page. These fragments lack the <HTML>, <HEAD>, and <BODY> elements but usually retain all other HTML markup.
For each fragment file, a hacker can use bash, sed, perl, awk, or python to apply custom transformations to selected markup, or to do trickier things such as converting every single newline to a space while leaving every run of two consecutive newlines in place.
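As an illustration of the newline trick, here is a minimal Python sketch. The function name unwrap_lines is made up for this example and is not part of any existing tool. It assumes that a blank line (two or more consecutive newlines) marks a paragraph break that should survive, and it deliberately leaves a trailing newline at the end of the text alone.

```python
import re

def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines; keep blank-line paragraph breaks (hypothetical helper)."""
    # Normalise Windows and old Mac line endings to "\n" first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Replace a newline with a space only when it is a lone newline:
    # not preceded by another newline, and followed by a non-newline character.
    return re.sub(r"(?<!\n)\n(?=[^\n])", " ", text)

if __name__ == "__main__":
    sample = "first line\nsecond line\n\nnew paragraph\nwrapped\n"
    print(repr(unwrap_lines(sample)))
```

Doing this with a lookbehind and lookahead in one pass avoids the usual sed workaround of swapping double newlines for a placeholder and back again.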
At that point you have a text file that can be manually cut and pasted into Sigil, or copied into OEBPS/Text. If it is copied into OEBPS/Text, manually running zip -r my.epub . produces a file that can be loaded into Sigil.
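One caveat on the zip step: the EPUB container format expects the mimetype file to be the first entry in the archive and to be stored uncompressed, which a plain zip -r does not guarantee. The sketch below shows one way to do the packaging from Python instead. The function name build_epub is invented for this example, and it assumes the source directory already contains mimetype, META-INF/container.xml, and the OEBPS tree.

```python
import zipfile
from pathlib import Path

def build_epub(src_dir: str, out_path: str) -> None:
    """Zip an unpacked EPUB directory, mimetype first and uncompressed (hypothetical helper)."""
    src = Path(src_dir)
    out = Path(out_path).resolve()
    with zipfile.ZipFile(out, "w") as zf:
        # The mimetype entry goes in first and is stored without compression.
        zf.write(src / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(src.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(src)
            # Skip the mimetype file (already added) and the output file itself,
            # in case the .epub is being written inside the source directory.
            if rel == Path("mimetype") or path.resolve() == out:
                continue
            zf.write(path, rel.as_posix(), compress_type=zipfile.ZIP_DEFLATED)
```

Calling build_epub("mybook", "my.epub") would then take the place of the manual zip command, with "mybook" standing in for wherever the unpacked book lives.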
Now you have transferred a website into a first draft of an ebook inside Sigil. Once it is inside Sigil there will still be a lot of work to do, but a LOT of the work has already been done, semi-automatically.
I know this can be done because I have just done it. But my work is clunky, hard-coded in too many places, and a bit error-prone.
Are there any well-written utilities out there already for doing this, ones that might be more flexible and perhaps less buggy than my quick take?