Convert website with txt files to epub
I couldn't find anywhere how to convert a website that has some txt files to an epub, so here is my method. For a plain HTML website, all you need to do is save the site to disk with, say, WinHTTrack, then drag the index file into calibre. But this only works if all the links point to HTML files. It fails if some of the files are .txt, and I guess also if some are PDF files.
So here is the method:
1. Grab the site using WinHTTrack, setting the options to store the site's files in /web (flattening the structure).
2. Create a temp directory in the saved-site directory.
3. Use this batch file (run from the saved-site directory) to convert all the text files to HTML:
---------------------------------------------------------------
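rem Convert each .txt in web to HTML; the output lands in temp as index*.html
for /R web %%i IN (*.txt) DO (
"C:\Program Files\Calibre2\ebook-convert.exe" "web\%%~nxi" temp
rem Rename the output after the source text file (quoted in case of spaces)
ren temp\index*.html "%%~ni.html"
)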
---------------------------------------------------------------
4. Edit the site's index HTML file to change all the .txt links to .html, and save it to the <saved-site>/temp directory.
5. Copy any other HTML files from <saved-site>/web to <saved-site>/temp.
6. Drag the index file from <saved-site>/temp into calibre.
How this works: when ebook-convert.exe is given a directory (i.e. <saved-site>/temp) as its output instead of a file name, it dumps the intermediate HTML output from the .txt conversion into that directory and stops. So the batch file first converts each *.txt into HTML. The output is normally index.html but sometimes index1.html. The next line in the batch file renames that HTML output to the same name as the text file, but with an .html extension. When the batch file finishes, calibre has done a default conversion of every txt file to an HTML file of the same name, stored in the temp directory. It's then just a case of copying the other files, editing the site index file to point to .html rather than .txt, and dragging the index file into calibre.
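If you'd rather not do steps 4 and 5 by hand, something like the Python sketch below might do it. It is only a rough sketch and not part of my original method: the function names rewrite_txt_links and prepare_temp are made up for illustration, and it assumes the site index is called index.html, that the structure is flattened into web, and that links are written as quoted href attributes. It also rewrites any quoted href ending in .txt, including links to other sites.

```python
import re
import shutil
from pathlib import Path

# Matches href="something.txt" or href='something.txt', optionally
# followed by a #fragment. Unquoted hrefs are not handled.
TXT_HREF = re.compile(r'''(href\s*=\s*["'][^"'#]*?)\.txt(?=["'#])''', re.IGNORECASE)


def rewrite_txt_links(html: str) -> str:
    """Point every quoted href ending in .txt at the .html file of the same name."""
    return TXT_HREF.sub(r"\1.html", html)


def prepare_temp(site_dir: Path, index_name: str = "index.html") -> None:
    """Do steps 4 and 5: copy the other HTML files and write the edited index."""
    web = site_dir / "web"
    temp = site_dir / "temp"
    temp.mkdir(exist_ok=True)

    # Step 5: copy the site's own HTML files, but never overwrite
    # a file that the batch conversion already produced in temp.
    for page in web.glob("*.htm*"):
        target = temp / page.name
        if page.name != index_name and not target.exists():
            shutil.copy2(page, target)

    # Step 4: save the index with its .txt links changed to .html.
    # latin-1 round-trips every byte, so the page's real encoding is untouched.
    text = (web / index_name).read_text(encoding="latin-1")
    (temp / index_name).write_text(rewrite_txt_links(text), encoding="latin-1")
```

Run it from the saved-site directory after the batch file has finished, for example by calling prepare_temp(Path(".")), then drag temp/index.html into calibre as in step 6.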