MobileRead Forums - View Single Post

nrapallo · 01-06-2009, 11:18 PM

Quote:

Originally Posted by llasram

Actually, I take it back. The Mobiperl 'mobi2html' errors with the Mobipocket book I've generated appear to be errors with Mobiperl's handling of UTF-8 encoded books. With UTF-8 encoding, each text record is followed by 0 or more "overlapping" bytes finishing the current multibyte character, plus an 8-bit integer count of the overlapping bytes as an additional byte. These additional bytes are not counted as part of the content length for the purposes of computing the "filepos" of link targets.

*Thank you* for finally confirming my suspicion that the byte count to the filepos/link target is "off" in mobi2html (and consequently in Mobi2IMP). In my conversions from .prc to .imp I've sometimes had to add up to 200 extra bytes to find the anchor tag the filepos was referring to. I had no idea why I had to do this and never would have guessed that UTF-8 decoding could cause it, but it makes awfully good sense to me now that you've mentioned it!
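Here is a minimal Python sketch of the offset mismatch described in the quote. It assumes the trailing layout exactly as llasram states it (overlap bytes, then one count byte) and that no other trailing data follows a record. The helper names are hypothetical and are not taken from Mobiperl or Mobi2IMP. Masking the count with 0x3 is also an assumption, justified because a UTF-8 character never needs more than 3 continuation bytes. If the trailing bytes are stripped before the records are concatenated, byte offsets line up with filepos. A naive concatenation leaves every record's trailing bytes in place, so the error grows from record to record. That would fit a drift of up to 200 bytes deep into a book.

```python
def strip_multibyte_overlap(record: bytes) -> bytes:
    """Drop the multibyte-overlap trailer from one uncompressed text record.

    Assumed layout (per the quoted post): ... content | N overlap bytes | count byte.
    The overlap bytes duplicate the start of the next record, so they are not
    part of the content used to compute filepos.
    """
    if not record:
        return record
    n = record[-1] & 0x3  # assumption: only the low two bits carry the count
    return record[: len(record) - n - 1]  # remove overlap bytes plus the count byte


def join_text_records(records: list[bytes]) -> bytes:
    """Concatenate records so byte offsets into the result match filepos values."""
    return b"".join(strip_multibyte_overlap(r) for r in records)


if __name__ == "__main__":
    # Toy example: 'é' (C3 A9) is split across the record boundary.
    rec1 = b"ab\xc3" + b"\xa9" + b"\x01"  # content, 1 overlap byte, count = 1
    rec2 = b"\xa9<a>" + b"\x00"           # content, no overlap, count = 0
    print(join_text_records([rec1, rec2]).index(b"<"))  # 4: matches filepos
    print(b"".join([rec1, rec2]).index(b"<"))           # 6: naive offset is off by 2
```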

My Mobi2IMP solution (a brute-force, naive approach) was to scan forward in the uncompressed text (HTML) from the stated filepos position, look for the first '<', and plop the anchor for that filepos there! It worked 99% of the time, but it was neither elegant nor foolproof!
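For comparison, the scan-forward fallback described above might look like this in Python. This is a sketch and not Mobi2IMP's actual code. `place_anchor` and the `<a name=...>` markup are illustrative assumptions. When the trailing bytes are left in, the stored filepos points *before* the real target, so scanning forward usually lands on it. However, any other tag between the two positions (for example a closing `</p>`) catches the anchor first, which is why the approach is not foolproof.

```python
def place_anchor(html: bytes, filepos: int, anchor_id: str) -> bytes:
    """Hypothetical helper: insert an anchor at the first '<' at or after filepos."""
    i = html.find(b"<", filepos)  # scan forward for the next tag
    if i == -1:
        i = len(html)  # no tag ahead: append at the end
    anchor = f'<a name="{anchor_id}"></a>'.encode("ascii")  # assumed anchor markup
    return html[:i] + anchor + html[i:]


if __name__ == "__main__":
    html = b"<p>Hello</p><p>World</p>"
    # The real target is offset 12 (the second <p>); a filepos of 10 still finds it.
    print(place_anchor(html, 10, "x"))
```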