01-06-2009, 11:18 PM   #24
nrapallo
GuteBook/Mobi2IMP Creator
Quote:
Originally Posted by llasram
Actually, I take it back. The Mobiperl 'mobi2html' errors with the Mobipocket book I've generated appear to be errors with Mobiperl's handling of UTF-8 encoded books. With UTF-8 encoding, each text record is followed by 0 or more "overlapping" bytes finishing the current multibyte character, plus an 8-bit integer count of the overlapping bytes as an additional byte. These additional bytes are not counted as part of the content length for the purposes of computing the "filepos" of link targets.
*Thank you* for finally confirming my suspicion that the byte count behind each filepos link is "off" in mobi2html (and consequently in Mobi2IMP). In my conversions from .prc to .imp I have sometimes had to add up to 200 extra bytes to find the "anchor" tag that a filepos was referring to. I had no idea why I had to do this and would never have guessed that the UTF-8 decoding could be the cause, but it makes very good sense now that you mention it!

My Mobi2IMP solution (a naive, brute-force approach) was to scan forward in the uncompressed text (HTML) from the stated filepos and place the anchor for that filepos at the first '<' found. It worked 99% of the time, but it was neither elegant nor foolproof!
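For anyone curious, the scan-forward workaround amounts to something like the Python sketch below. This is an illustration only, not the actual Mobi2IMP code: the function names and the `filepos<N>` anchor naming are made up for the example.

```python
def find_anchor_pos(html: bytes, filepos: int) -> int:
    """Return the offset of the first '<' at or after filepos.

    Falls back to the end of the text if no tag follows.
    """
    if not 0 <= filepos <= len(html):
        raise ValueError("filepos outside the text")
    pos = html.find(b"<", filepos)
    return pos if pos != -1 else len(html)


def insert_anchor(html: bytes, filepos: int) -> bytes:
    """Insert a named anchor in front of the tag found by the forward scan."""
    pos = find_anchor_pos(html, filepos)
    # Hypothetical naming scheme for the inserted anchor.
    anchor = f'<a name="filepos{filepos}"></a>'.encode("ascii")
    return html[:pos] + anchor + html[pos:]


sample = b"<p>Hello</p><p>World</p>"
patched = insert_anchor(sample, 5)   # filepos 5 lands inside "Hello"
```

The weakness is easy to see: if the stated filepos overshoots the real target, the scan can only move further forward and settles on whatever tag comes next, which is why this was never foolproof.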