Old 08-18-2012, 04:18 PM   #1
Claghorn
Posts: 10
Karma: 10
Join Date: Aug 2012
Device: Nexus 7
Where to look for conversion problems?

I'm trying to learn how to perfect the epub files I generate, and I started with the largest and most complex ebook I own to maximize my problems :-).

This is a Kindle edition of the four-book bundle containing the first four volumes of Game of Thrones.

It is an "old" format mobi file, not the new K8 version, but I can unpack it with the mobi unpack plugin, and run the massive html file through tidy -xml -indent to make it easier to read.

As near as I can tell, the .ncx file is a perfectly correct table of contents. In the html itself, there is an mbp:pagebreak tag before every chapter, and right after each pagebreak there is an anchor that the TOC points at. In other words, all the structure seems to be correctly defined.
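For anyone who wants to verify that structure mechanically rather than by eye, here is a minimal Python sketch. It assumes the TOC entries look like `book.html#fragment` and that the anchors carry the fragment in an `id` or `name` attribute; the function names and the 200-character look-behind window are made up for illustration and are not part of calibre or the unpack plugin.

```python
import re
import xml.etree.ElementTree as ET

NCX_NS = "{http://www.daisy.org/z3986/2005/ncx/}"
# Assumption: anchors look like <a id="frag"> or <a name="frag">.
ANCHOR = r'<a\b[^>]*\b(?:id|name)\s*=\s*["\']{frag}["\']'
PAGEBREAK = re.compile(r"<mbp:pagebreak\b[^>]*>", re.I)


def ncx_targets(ncx_text):
    """Return (label, fragment) pairs for every navPoint in an NCX document."""
    root = ET.fromstring(ncx_text)
    pairs = []
    for point in root.iter(NCX_NS + "navPoint"):
        label = point.findtext(f"{NCX_NS}navLabel/{NCX_NS}text", default="").strip()
        content = point.find(NCX_NS + "content")
        src = content.get("src", "") if content is not None else ""
        pairs.append((label, src.partition("#")[2]))
    return pairs


def check_targets(html, targets, window=200):
    """Return (label, fragment, status) triples.

    status is 'ok' when a pagebreak tag occurs within `window` characters
    before the anchor, 'no-pagebreak' when the anchor exists without one,
    and 'missing' when no matching anchor is found at all.
    """
    report = []
    for label, frag in targets:
        m = re.search(ANCHOR.format(frag=re.escape(frag)), html) if frag else None
        if m is None:
            report.append((label, frag, "missing"))
            continue
        before = html[max(0, m.start() - window):m.start()]
        status = "ok" if PAGEBREAK.search(before) else "no-pagebreak"
        report.append((label, frag, status))
    return report
```

Running `check_targets(html, ncx_targets(ncx_text))` on the unpacked files and comparing the statuses for the chapters that convert badly against the ones that convert well might show what differs between them. The report is a list rather than a dict because chapter titles can repeat.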

But when I run it through a MOBI to EPUB conversion, I get lots of broken TOC entries and missing page breaks. All the front sections with the cover, title page, copyright, etc. are jammed together with no page breaks.

The biggest problem, though, is the chapters. In the book, each chapter starts with a little decorative image. The html anchors all come after the pagebreak and before the image, yet sometimes (in about 20% of the chapters) the resulting epub file has the image at the end of the previous page and the text of the chapter starts on the next page. It is always the same chapters that do this, yet I can't see anything in the html that makes those chapters different.

These are all massive files and also copyrighted, so there isn't really anything I can post anywhere, so I'm just wondering if anyone has any advice about how to proceed?

Is there some option I'm not noticing in the conversion that says "Trust the .ncx file"?

Or some other option that says "Always split the epub at the mbp:pagebreak tags"?
Old 08-18-2012, 08:57 PM   #2
Claghorn
I've been experimenting with editing the input stage of the debug output. I wrote a Perl script that modifies the html files to put header markup with class="chapter" around the anchors the toc.ncx file points at. Then I run the conversion matching h1 through h3 headers with class="chapter" and tell it to force-replace the TOC. This works nearly flawlessly, which leaves me even more confused about why the manual says the conversion will use the existing TOC if the file has one. All I've done is move the existing TOC into the html, so the result should be the same.
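The Perl script itself isn't posted, so the following is only a rough Python sketch of the same idea, not the actual script. It assumes the anchors carry the TOC fragment in an `id` or `name` attribute, and it takes a hypothetical `levels` dict mapping each fragment to a heading level (1 to 3) that would have to be built from the nesting in toc.ncx.

```python
import re

# Matches <a id="x"/>, <a name="x"></a>, or <a id="x">text</a>.
ANCHOR = re.compile(
    r'<a\b[^>]*\b(?:id|name)\s*=\s*["\']([^"\']+)["\'][^>]*?(?:/>|>.*?</a>)',
    re.S | re.I,
)


def wrap_anchors(html, levels):
    """Wrap each TOC target anchor in <hN class="chapter">...</hN>.

    `levels` maps an anchor id/name to a heading level; anchors that are
    not listed (footnote targets and the like) are left untouched.
    """
    def repl(match):
        level = levels.get(match.group(1))
        if level is None:
            return match.group(0)
        return f'<h{level} class="chapter">{match.group(0)}</h{level}>'

    return ANCHOR.sub(repl, html)
```

With markup like this in place, the TOC level settings in the conversion dialog can be pointed at XPath expressions along the lines of `//h:h1[@class="chapter"]` (and likewise for h2 and h3), together with the option that forces use of the auto-generated Table of Contents. A regex is used here instead of an XML parser on the assumption that the unbound `mbp:` prefix would trip a strict parser.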

Anyway, this does seem to solve my problems, and the few errors left appear to be actual bugs in the original doc which I can fix by hand.

Next I need to finish my script to convert all the opaque grayscale .jpeg files into .png files with transparency so they'll look natural when you change the background color.
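One possible way to do that image conversion, sketched in Python with the third-party Pillow library (an assumption; nothing above says which tools the script uses). The idea is to treat each pixel's darkness as its opacity, so white paper becomes fully transparent and black ink stays fully opaque. It assumes the decorations are dark artwork on a white background.

```python
def alpha_for(gray):
    """Opacity for an 8-bit gray value: white (255) -> 0, black (0) -> 255."""
    return 255 - gray


def jpeg_to_transparent_png(src, dst):
    """Write `src` (an opaque grayscale image) to `dst` as black-plus-alpha PNG."""
    from PIL import Image  # Pillow, imported lazily so alpha_for works without it

    gray = Image.open(src).convert("L")
    alpha = gray.point(alpha_for)          # darkness becomes opacity
    black = Image.new("L", gray.size, 0)   # every visible pixel is drawn in black
    Image.merge("LA", (black, alpha)).save(dst, "PNG")
```

Something like `jpeg_to_transparent_png("chapter.jpeg", "chapter.png")` would then be run over each image (the file names are placeholders). Because every visible pixel is drawn in black, the artwork would disappear on a black background, so a night-mode theme may need a different ink color.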


All times are GMT -4.

