Old 03-12-2018, 03:57 PM   #1
pittendrigh
Converting web site to epub

Websites dynamically generated from a database (WordPress or any other such system) can be made to spit out a series of HTML fragment files, one for each page. Each such fragment lacks the <html>, <head> and <body> elements, but usually retains all the other HTML markup.

For each such fragment file a hacker can use bash, sed, perl, awk or python to do custom things to selected markup, or perhaps to do tricky things like converting every single newline to a space while leaving every run of two consecutive newlines in place.
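The newline trick can be sketched in Python with one regular expression. This is only an illustration, not the script I actually used: the function name join_wrapped_lines is made up, and it assumes the fragment uses plain \n line endings (convert \r\n first if your export produces them).

```python
import re

# A newline that is neither preceded nor followed by another newline,
# i.e. a line wrap inside a paragraph rather than a paragraph break.
_SINGLE_NEWLINE = re.compile(r"(?<!\n)\n(?!\n)")


def join_wrapped_lines(text: str) -> str:
    """Replace isolated newlines with a space; leave runs of 2+ newlines alone.

    Hypothetical helper name; assumes '\n' line endings.
    """
    return _SINGLE_NEWLINE.sub(" ", text)


if __name__ == "__main__":
    sample = "<p>first line\nsecond line</p>\n\n<p>next paragraph</p>"
    print(join_wrapped_lines(sample))
```

Note that a lone newline at the very end of the file also counts as isolated and becomes a trailing space, so strip the text afterwards if that matters.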

At that point you have a text file that can be manually cut and pasted into Sigil, or copied into OEBPS/Text. If you copy it into OEBPS/Text, running zip -r my.epub . by hand from the top-level directory of the unpacked book makes a file that can be loaded into Sigil.
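One caveat with a plain zip -r: the EPUB container format expects the mimetype file to be the first entry in the archive and to be stored uncompressed, which zip -r does not guarantee. Sigil may still open the result, but stricter readers and validators might not. A rough Python sketch of the packaging step follows. The name pack_epub is made up, and it assumes the directory already holds a valid mimetype file, META-INF/container.xml and the OEBPS tree.

```python
import zipfile
from pathlib import Path


def pack_epub(book_dir: str, out_path: str) -> None:
    """Zip an unpacked EPUB directory into out_path (hypothetical helper).

    Assumes book_dir already contains 'mimetype', 'META-INF/container.xml'
    and the OEBPS tree; nothing is generated or validated here.
    """
    root = Path(book_dir).resolve()
    out = Path(out_path).resolve()
    with zipfile.ZipFile(out, "w") as zf:
        # The mimetype entry goes in first and is stored without compression.
        zf.write(root / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(root.rglob("*")):
            if path.is_dir():
                continue
            if path == root / "mimetype" or path == out:
                # Already written, or the archive itself if it lives in book_dir.
                continue
            zf.write(
                path,
                path.relative_to(root).as_posix(),
                compress_type=zipfile.ZIP_DEFLATED,
            )
```

Calling pack_epub("mybook", "my.epub") would then take the place of the manual zip step.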

Now you have transferred a website into a first draft of an ebook inside Sigil. Once it is inside Sigil there will still be a lot of work to do, but a LOT of the work has already been done, semi-automatically.

I know this can be done because I have just done it. But my work is clunky, hard-coded in too many places, and a bit error-prone.

Are there any well-written utilities out there already for doing this, ones that might be more flexible and perhaps less buggy than my quick take?