books with one html file for each page - how to convert?

sumguy · 03-22-2013, 10:23 PM

Hi all -

I have some books in a very annoying format: one HTML file for each page. At the top and bottom of every page are navigation links to the next/previous page. I'm trying to figure out how to convert them with decent formatting to read on my Kindle, or at least as a PDF. I was thinking about using Calibre's search and replace to get rid of the navigation links and tables. But first I'm just trying to get the pages all together...
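Calibre's search and replace accepts regular expressions, so the navigation links could be matched by their text. Below is a minimal Python sketch of such a pattern. It assumes each link is a plain `<a>` element whose visible text is "next page" or "previous page" (hypothetical labels; adjust them to whatever the pages actually use). Regex matching of HTML is brittle, so check the result on a few pages first. The table markup around the links is not touched.

```python
import re

# Hypothetical link labels; change these to the text the book's navigation links really use.
NAV_LABELS = ("next page", "previous page")

def strip_nav_links(html: str, labels=NAV_LABELS) -> str:
    """Remove <a ...>label</a> elements whose visible text is one of the labels."""
    alternatives = "|".join(re.escape(label) for label in labels)
    pattern = re.compile(
        r"<a\b[^>]*>\s*(?:" + alternatives + r")\s*</a>",
        re.IGNORECASE,  # "Next page", "NEXT PAGE", etc. all match
    )
    return pattern.sub("", html)

page = '<p><a href="page_2.html">Next page</a></p><p>Body text.</p>'
print(strip_nav_links(page))  # <p></p><p>Body text.</p>
```

The empty `<p></p>` left behind is harmless, but a second pattern could remove it too.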

At first I tried merging all the HTML together and spent a long time fixing it up in a text editor, trying to get rid of all the page breaks and get a better flow. But it was pretty tedious: everything is marked up in tables and so on, it took forever, and the footnotes didn't go so well... Anyway, I suppose I wouldn't really mind keeping the page structure, because there's an index at the end that refers to the correct pages.

I read in the manual that I could make a table of contents HTML file with a link to every page and add that to Calibre. That seemed to work, but then I ended up with a huge list at the beginning of the book: dozens of pages with hundreds of links, one to every single page in the book. Is there some way to avoid that, or to edit it afterwards so there are only links to the chapter beginnings?

Then I read somewhere else that I could just add the first file, cover.html (which links to the next page, which links to the next, and so on), and Calibre will follow all the links, and the links inside the pages they link to, etc. But is there some limit to the depth of this? When I tried, only the first eight pages or so showed up inside the zip file it created and in the .mobi I made, which had a table of contents with only two entries, "next page" and "previous page". The next page after page_ix.html is page_1.html; the link is there and it works in my browser, but page_1 and all subsequent pages are missing.
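One likely reason for the cutoff is that calibre's HTML input only follows links to a limited depth (the `--max-levels` option of `ebook-convert`). In a chain where each page links only to the next, every page sits one level deeper than the one before. The reading order can instead be recovered outside calibre by walking the chain yourself. Below is a Python sketch that assumes the link text is "next page" and that hrefs are relative file names in the same folder. `reading_order` and `NextLinkFinder` are hypothetical names.

```python
import os
from html.parser import HTMLParser

class NextLinkFinder(HTMLParser):
    """Records the href of the first <a> whose text equals the 'next' label."""

    def __init__(self, label="next page"):
        super().__init__()
        self.label = label.lower()
        self._href = None     # href of the <a> currently open, if any
        self._text = []       # text collected inside that <a>
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalise whitespace and case before comparing with the label.
            text = " ".join("".join(self._text).split()).lower()
            if text == self.label and self.next_href is None:
                self.next_href = self._href
            self._href = None

def reading_order(folder, start="cover.html", label="next page"):
    """Follow 'next page' links from start; return file names in reading order."""
    order, seen = [], set()
    current = start
    while current and current not in seen:
        seen.add(current)
        order.append(current)
        finder = NextLinkFinder(label)
        with open(os.path.join(folder, current), encoding="utf-8") as f:
            finder.feed(f.read())
        href = finder.next_href
        # Drop any #fragment; assumes links are plain relative file names.
        current = href.split("#")[0] if href else None
        if current and not os.path.exists(os.path.join(folder, current)):
            current = None  # broken link: stop rather than crash
    return order
```

The loop stops at a page with no next link, at a missing file, or at a page it has already visited, so a circular link cannot make it run forever.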

I also tried renaming cover.html to index.html (and editing the navigation link in the second page), making a zip of the folder, and adding that. So then, obviously, all the pages were in the zip file, but when I tried to make an EPUB out of it, again only the first few pages were there.

Well, aside from that... if anyone has any suggestions for a better way to go about this, I'd appreciate it!

Dopedangel · 03-23-2013, 07:28 AM

Use something like this to merge the HTML files:
http://www.iterati.org/ebookTools/vHtmlMerger/

Then maybe use Sigil to clean up and make a clean EPUB.
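For reference, the core of what an HTML merger does can be sketched in a few lines of Python: take each page's `<body>` content in reading order and concatenate it. This is not how vHtmlMerger itself works. It is an illustration that assumes every page has a `<body>`, that `id` values are unique across pages, and that links are plain relative file names. Putting a marker `<div>` at the start of each page keeps the page structure, so index links such as `page_12.html` can point to `#page_12` in the merged file. `merge_pages` and `retarget_links` are hypothetical names.

```python
import re

BODY_RE = re.compile(r"<body\b[^>]*>(.*)</body>", re.IGNORECASE | re.DOTALL)
HREF_RE = re.compile(r'href="([^"#]+)(#[^"]*)?"')

def merge_pages(pages):
    """pages: list of (file_name, html) in reading order -> one HTML document."""
    parts = []
    for name, html in pages:
        match = BODY_RE.search(html)
        body = match.group(1) if match else html
        anchor = name.rsplit(".", 1)[0]  # "page_12.html" -> "page_12"
        # Marker so links to the old page file still have somewhere to land.
        parts.append(f'<div id="{anchor}"></div>\n{body.strip()}')
    return ('<html><head><meta charset="utf-8"/></head><body>\n'
            + "\n".join(parts) + "\n</body></html>")

def retarget_links(html, names):
    """Rewrite links to merged files as in-document links."""
    anchors = {n: n.rsplit(".", 1)[0] for n in names}

    def repl(m):
        target, fragment = m.group(1), m.group(2)
        if target not in anchors:
            return m.group(0)  # external or unknown link: leave it alone
        # Keep an existing #fragment (e.g. a footnote id), else use the page marker.
        return f'href="{fragment or "#" + anchors[target]}"'

    return HREF_RE.sub(repl, html)
```

The navigation links would still need stripping afterwards, either in a text editor or in Sigil.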

DoctorOhh · 03-23-2013, 08:00 AM

Follow the info in this section of the FAQ and you'll be fine.

theducks · 03-23-2013, 11:28 AM

Quote:

Originally Posted by DoctorOhh

Follow the info in this section of the FAQ and you'll be fine.

He did that, he got a book with a TOC for every page.

Idea: Convert the Book with Chapter detection on (assume real chapter markers exist), that should concatenate the individual pages.
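Chapter detection only helps if the chapter headings can be recognised. As a rough illustration of what such a heuristic looks for, here is a Python sketch that flags pages whose `<h1>`/`<h2>` text starts with words like "Chapter" or "Part". The regex is an assumption loosely in the spirit of calibre's chapter detection, not calibre's actual default expression. If this book marks chapters with styled table cells rather than headings, the patterns would need changing. The list it returns is also what a chapter-only table of contents would link to.

```python
import re

# Assumed chapter-heading wording; not calibre's real default expression.
CHAPTER_TEXT = re.compile(
    r"^\s*(?:(?:chapter|book|section|part)\s+\S+|prologue|epilogue)",
    re.IGNORECASE,
)
HEADING_RE = re.compile(r"<(h[12])\b[^>]*>(.*?)</\1>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")

def chapter_starts(pages):
    """pages: list of (file_name, html) in reading order.
    Returns (file_name, heading_text) for pages with a chapter-like <h1>/<h2>."""
    found = []
    for name, html in pages:
        for _tag, inner in HEADING_RE.findall(html):
            # Strip inline tags such as <b> and collapse whitespace.
            text = " ".join(TAG_RE.sub("", inner).split())
            if CHAPTER_TEXT.match(text):
                found.append((name, text))
                break  # one entry per page is enough
    return found
```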

DoctorOhh · 03-23-2013, 09:06 PM

Quote:

Originally Posted by theducks

He did that, he got a book with a TOC for every page.

If he had done this correctly, he wouldn't be asking. The link provides an example.

Quote:

Originally Posted by theducks

Idea: Convert the Book with Chapter detection on (assume real chapter markers exist), that should concatenate the individual pages.

That might work...

Idea: create the index correctly and try again.

sumguy · 03-24-2013, 01:17 AM

Hi all, thanks for your replies...

Dopedangel, at first I used SoftSnow Merger to merge the HTML files, but I still had all the previous/next page navigation links etc., and it was a real pain to get rid of them and make the pages join up, mainly because everything is inside a complicated structure of HTML tables. Anyway, I'll give vHtmlMerger a try and see if it makes it any easier. I've never used Sigil; I was kind of hoping to do it in Calibre without having to learn a whole 'nother app...

DoctorOhh, theducks is right: I did follow the info in that section of the FAQ correctly. I created "another HTML file that contains links to all the other files in the desired order". Usually that's for when you have a book with each chapter in a separate HTML file. But in this case, the original book I have is a folder with 297 HTML files, one for every page. I created an HTML table of contents with a link to each file, just as in the example. So I ended up with this table of contents, with 297 entries in it, at the beginning of the book.

That's exactly what I expected to happen. It just isn't a very good solution to have this 15-page table of contents with an entry for every page in it. I mean, it does work. I was just wondering if there's a way I could then edit that table of contents HTML after it had been imported into Calibre. I tried unzipping the zip, editing it, and zipping it again, but then Calibre was unhappy and wouldn't convert it to MOBI anymore... Well, if you have a suggestion for some other way I should be making this index, I'm happy to hear it.

I did try making an HTML table of contents with only the pages where chapters begin. Calibre did bring in a bunch of other pages (not sure if it was all of them), but they were all mixed up, in the wrong order. I also tried the "breadth first" setting; it didn't help.

theducks, I'm not sure how to go about trying your idea. I'm assuming you mean the chapter detection in the heuristic processing. But I'm still stuck at how to get all the html pages into Calibre in the first place, in the right order, without making this huge table of contents by hand...

Well, maybe I'll try some more experiments if I have time tomorrow, thanks again!
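Generating the links file avoids typing hundreds of links by hand. Below is a minimal Python sketch. It assumes the reading order is already known as a list of file names; `link_file` and the `<ul>` layout are illustrative choices, not a format calibre requires. Sorting the folder listing alphabetically would not give the right order here, because front-matter pages such as page_ix.html must come before page_1.html. The list should therefore come from the book's own next-page chain.

```python
from html import escape

def link_file(order, title="Contents"):
    """Build an HTML page linking every file in `order`, in that order."""
    items = "\n".join(
        f'  <li><a href="{escape(name, quote=True)}">{escape(name)}</a></li>'
        for name in order
    )
    return (
        '<html><head><meta charset="utf-8"/>'
        f"<title>{escape(title)}</title></head>\n"
        f"<body>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    )

print(link_file(["cover.html", "page_ix.html", "page_1.html"]))
```

Writing the returned string to a file in the same folder as the pages gives the kind of links file the FAQ describes.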

BetterRed · 03-24-2013, 01:37 AM

Quote:

Originally Posted by sumguy

....I was just wondering if there's a way I could then edit that table of contents html, after it had been imported into calibre.

Did you try Calibre's new TOC Edit tool? https://www.mobileread.com/forums/sho...d.php?t=208299

BR

DoctorOhh · 03-24-2013, 02:06 AM

Quote:

Originally Posted by sumguy

DoctorOhh, theducks is right: I did follow the info in that section of the FAQ correctly. I created "another HTML file that contains links to all the other files in the desired order". Usually that's for when you have a book with each chapter in a separate HTML file. But in this case, the original book I have is a folder with 297 HTML files, one for every page. I created an HTML table of contents with a link to each file, just as in the example. So I ended up with this table of contents, with 297 entries in it, at the beginning of the book.

I guess I was a little slow on the uptake. Try converting the book with 297 entries in the TOC to HTMLZ, which should give you one large HTML file. Edit that file to remove the TOC, then convert to EPUB or MOBI as theducks suggested. If the chapters are clearly identified, this may do the trick.

Quote:

Originally Posted by sumguy

theducks, I'm not sure how to go about trying your idea. I'm assuming you mean the chapter detection in the heuristic processing.

Yes, use the heuristic processing.
