03-22-2013, 10:23 PM   #1
sumguy
Connoisseur
Posts: 57
Karma: 1186
Join Date: Jun 2012
Device: none
books with one html file for each page - how to convert?

Hi all -

I have some books in a very annoying format: one HTML file for each page. At the top and bottom of every page are navigation links to the next and previous pages. I'm trying to figure out how to convert them, with decent formatting, to read on my Kindle, or at least as a PDF. I was thinking about using Calibre's search and replace to get rid of the navigation links and tables, but first I'm just trying to get all the pages together...

At first I tried to merge all the HTML into one file, and spent a long time fixing it up in a text editor to get rid of the page breaks and improve the flow. But it was pretty tedious: it's all marked up in tables and so on, it took forever, and the footnotes didn't go so well... Anyway, I suppose I wouldn't really mind keeping the page structure, because there's an index at the end that refers to the correct pages.

I read in the manual that I could make a table of contents HTML file, with a link to every page, and add that to Calibre. That seemed to work, but I end up with a huge list at the beginning of the book: dozens of pages with hundreds of links, one to every single page in the book. Is there some way to avoid that, or to edit it afterwards so there are only links to the relevant chapter beginnings?
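For reference, a table of contents file like that doesn't have to be typed by hand. Below is a minimal Python sketch that builds one, assuming the files follow the cover.html / page_ix.html / page_1.html naming mentioned in this post. The function names (`roman_to_int`, `page_order`, `build_toc`) are made up for illustration and are not part of Calibre.

```python
import html
import re

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}

def roman_to_int(text):
    """Convert a lowercase roman numeral such as 'ix' to an integer."""
    total = 0
    for ch, nxt in zip(text, text[1:] + " "):
        value = ROMAN[ch]
        # A smaller numeral before a larger one is subtracted (the 'i' in 'ix').
        total += -value if ROMAN.get(nxt, 0) > value else value
    return total

def page_order(filename):
    """Sort key: cover first, roman-numbered front matter, then numbered pages."""
    if filename == "cover.html":
        return (0, 0)
    # Assumed naming scheme: page_<roman or arabic number>.html
    match = re.fullmatch(r"page_([0-9]+|[ivxlc]+)\.html", filename)
    if match is None:
        return (3, 0)  # unrecognised names go last
    label = match.group(1)
    if label.isdigit():
        return (2, int(label))
    return (1, roman_to_int(label))

def build_toc(filenames, title="Contents"):
    """Return an HTML page with one link per file, in reading order."""
    items = "\n".join(
        '<li><a href="{0}">{0}</a></li>'.format(html.escape(name, quote=True))
        for name in sorted(filenames, key=page_order)
    )
    return (
        "<html><head><title>{}</title></head><body>\n"
        "<ul>\n{}\n</ul>\n</body></html>\n"
    ).format(html.escape(title), items)
```

Calling `build_toc(os.listdir(folder))` and saving the result next to the pages would give the one-link-per-page file described above. It does nothing about the length of the list; it only puts page_2.html before page_10.html and the roman-numbered pages before page_1.html.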

Then I read somewhere else that I could just add the first file, cover.html (which links to the next page, which links to the next, and so on), and Calibre will follow all the links, and the links inside the pages it links to, etc. - but is there some limit to the depth of this? When I tried, only the first eight pages or so showed up, both inside the zip file it created and in the .mobi I made. The .mobi's table of contents had only two entries, "next page" and "previous page". The page after page_ix.html is page_1.html; the link is there and it works in my browser, but page_1.html and all subsequent pages are missing.
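To check the link chain independently of Calibre, something like the following Python sketch could walk the links starting from cover.html and report where it stops. It uses only the standard library, and the names `LinkCollector`, `load_folder` and `crawl` are hypothetical. It says nothing about how Calibre itself follows links or whether it has a depth limit.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def load_folder(folder):
    """Read every .html file in a folder into a {filename: text} dict."""
    return {p.name: p.read_text(encoding="utf-8", errors="replace")
            for p in Path(folder).glob("*.html")}

def crawl(start, pages):
    """Follow links from `start`; return (reached, broken).

    reached: file names visited, in the order they were found
    broken: (source file, link target) pairs that match no known file
    """
    reached, broken = [], []
    seen = {start}
    queue = [start]
    while queue:
        name = queue.pop(0)
        reached.append(name)
        collector = LinkCollector()
        collector.feed(pages[name])
        for href in collector.links:
            target = urldefrag(href).url  # drop any "#anchor" part
            if not target or "://" in target:
                continue  # skip same-page anchors and external links
            if target not in pages:
                broken.append((name, target))
            elif target not in seen:
                seen.add(target)
                queue.append(target)
    return reached, broken
```

Running `crawl("cover.html", load_folder("path/to/book"))` (the path is a placeholder) should list every page reachable from the cover. The comparison against file names is exact, so a link whose spelling, case or folder prefix differs from the name on disk would end up in `broken` even if a browser opens it fine.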

I also tried renaming cover.html to index.html (and editing the navigation link in the second page), making a zip of the folder, and adding that. Then obviously all the pages were in the zip file, but when I tried to make an EPUB out of it, again only the first few pages were there.

Well, aside from that... if anyone has any suggestions of a better way to go about this, I'd appreciate it!
Attached Files
File Type: txt calibre-log.txt (6.3 KB, 391 views)