MobileRead Forums - View Single Post

Claghorn · 08-18-2012, 05:18 PM

I'm trying to learn how to perfect the epub files I generate, and I started with the largest and most complex ebook I own to maximize my problems :-).

This is a Kindle edition of the four-book bundle containing the first four volumes of Game of Thrones.

It is an "old" format mobi file, not the new K8 version, but I can unpack it with the mobi unpack plugin, and run the massive html file through tidy -xml -indent to make it easier to read.

As near as I can tell, the .ncx file is a perfectly correct table of contents. In the html itself, there is an mbp:pagebreak tag before every chapter, and right after each pagebreak there is an anchor which the TOC points at. In other words, all the structure seems to be correctly defined.
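The "as near as I can tell" part can be checked mechanically rather than by eye. The following is only a rough sketch, under several assumptions: the NCX uses the standard DAISY namespace, each TOC target is a fragment id carried by an `<a id="...">` or `<a name="...">` element in the unpacked html, and the pagebreak is written as `<mbp:pagebreak/>`. The function names and the `max_gap` threshold are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

NCX_NS = "{http://www.daisy.org/z3986/2005/ncx/}"
PAGEBREAK = re.compile(r"<mbp:pagebreak\s*/?>", re.IGNORECASE)


def ncx_targets(ncx_text):
    """Return the fragment ids the NCX <content src="file#id"> entries point at."""
    root = ET.fromstring(ncx_text)
    targets = []
    for content in root.iter(NCX_NS + "content"):
        src = content.get("src", "")
        if "#" in src:
            targets.append(src.split("#", 1)[1])
    return targets


def check_targets(html_text, targets, max_gap=200):
    """Classify each target as 'ok', 'missing', or 'no-pagebreak'.

    'ok' means a pagebreak tag ends at most max_gap characters before
    the anchor (max_gap is an arbitrary, hypothetical threshold).
    """
    break_ends = [m.end() for m in PAGEBREAK.finditer(html_text)]
    report = {}
    for target in targets:
        pattern = r'<a\s[^>]*\b(?:id|name)="%s"' % re.escape(target)
        anchor = re.search(pattern, html_text)
        if anchor is None:
            report[target] = "missing"
            continue
        near = any(0 <= anchor.start() - end <= max_gap for end in break_ends)
        report[target] = "ok" if near else "no-pagebreak"
    return report
```

Running `check_targets(html_text, ncx_targets(ncx_text))` on the unpacked files and listing every entry that is not "ok" would show whether the source structure is really as clean as it looks before blaming the converter.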

But when I run it through a MOBI to EPUB conversion, I get lots of broken TOC entries and missing page breaks. All the front sections with the cover, title page, copyright, etc. are jammed together with no page breaks.

The biggest problem, though, is the chapters. In the book, each chapter starts with a little decorative image. The html anchors all come after the pagebreak and before the image, yet in about 20% of the chapters the resulting epub file has the image at the end of the previous page and the text of the chapter starts on the next page. It is always the same chapters that do this, yet I can't see anything in the html that makes those chapters different.
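Since the difference between good and bad chapters is hard to spot by eye in a file this size, one approach is to reduce each chapter opening to its bare tag sequence and group the chapters by that sequence. This is a sketch only: it assumes the same anchor and pagebreak conventions as described above, the helper names and the `lookback`/`lookahead` window sizes are hypothetical, and it ignores attributes and text, so a difference that lives in an attribute would not show up.

```python
import re
from collections import defaultdict

TAG = re.compile(r"<\s*(/?)\s*([A-Za-z][\w:.-]*)[^>]*?(/?)\s*>")
IMG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def tag_signature(fragment):
    """Reduce markup to a tuple of tag names, ignoring attributes and text."""
    signature = []
    for closing, name, self_closing in TAG.findall(fragment):
        name = name.lower()
        if closing:
            signature.append("/" + name)
        elif self_closing:
            signature.append(name + "/")
        else:
            signature.append(name)
    return tuple(signature)


def group_chapter_openings(html_text, targets, lookback=200, lookahead=2000):
    """Group anchor ids by the tag sequence from pagebreak to first image."""
    groups = defaultdict(list)
    for target in targets:
        pattern = r'<a\s[^>]*\b(?:id|name)="%s"' % re.escape(target)
        anchor = re.search(pattern, html_text)
        if anchor is None:
            continue  # anchor not found; nothing to compare
        # Start at the nearest pagebreak shortly before the anchor, if any.
        start = html_text.rfind(
            "<mbp:pagebreak", max(0, anchor.start() - lookback), anchor.start()
        )
        if start == -1:
            start = anchor.start()
        # Stop at the end of the first image after the anchor, if any.
        image = IMG.search(html_text, anchor.start(), anchor.start() + lookahead)
        end = image.end() if image else min(len(html_text), anchor.start() + lookahead)
        groups[tag_signature(html_text[start:end])].append(target)
    return dict(groups)
```

If the misbehaving chapters all land in one group and the well-behaved ones in another, the two signatures show exactly which wrapping element differs, which is a much smaller thing to post or ask about than the book itself.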

These are all massive files and also copyrighted, so there isn't really anything I can post anywhere. I'm just wondering if anyone has any advice about how to proceed.

Is there some option I'm not noticing in the conversion that says "Trust the .ncx file"?

Or some other option that says "Always split the epub at the mbp pagebreak tags"?

Post #1 · Thread title: Where to look for conversion problems?
Claghorn · Member · Posts: 16 · Karma: 10 · Join Date: Aug 2012 · Device: Nexus 7