mobi conversion loses latin-extended-additional unicode characters
Greetings to all,
I've been going around and around trying to get Calibre to convert latin-extended-additional characters from either an html import or a Sigil-produced epub to mobi output. It looks fine, of course, in Calibre's LRF reader, but when I open it in Kindle4pc all the "dot-under" and "dot-over" consonants are boxes. Interestingly, I can convert the same epub from Sigil with KindleGen, and the diacritics are fine. The rest of the formatting is monstrous, of course, which is why I'd really like to get over this hump in Calibre. I also don't think Calibre is truncating the unicode on import, because I converted to .mobi in Calibre, then zipped up the debug output, changed the extension to .epub, and converted that to mobi in KindleGen, and the diacritics rendered just fine. The formatting was also better, somewhere in between the pure KindleGen and straight Calibre mobi conversions.
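In case anyone wants to repeat the repackaging step, here is a minimal Python sketch of zipping a folder into an .epub. The function name pack_epub and the folder name in the usage note are my own placeholders, not anything Calibre defines. It assumes the folder already holds a complete unpacked book, and it stores the "mimetype" file first and uncompressed if one is present, which is what the epub container format expects.

```python
import os
import zipfile


def pack_epub(src_dir, out_path):
    """Zip src_dir into out_path, putting 'mimetype' first and uncompressed.

    pack_epub is a hypothetical helper for illustration only.
    out_path should live outside src_dir so the archive does not include itself.
    """
    with zipfile.ZipFile(out_path, "w") as zf:
        mimetype = os.path.join(src_dir, "mimetype")
        if os.path.exists(mimetype):
            # The epub container wants this entry first and stored, not deflated.
            zf.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Archive names always use forward slashes.
                arc = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arc == "mimetype":
                    continue  # already written above
                zf.write(full, arc, compress_type=zipfile.ZIP_DEFLATED)
    return out_path
```

Usage would be something like pack_epub("debug_output_folder", "repacked.epub"), where both names are placeholders for whatever paths you actually have.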
I've also: 1) checked the html for "utf-8" encoding declarations: good; 2) set the input encoding in "look & feel" to cp1252, utf-8, and latin1 in turn to see if that made a difference: none; 3) tried setting the input encoding to utf-8 from the command line: no change; 4) compared the html, toc.ncx, and content.opf(sp?) in the Calibre mobi to the ones passed through KindleGen afterwards: no difference; 5) modified the htmltozip plugin to specify utf-8, and then imported from html: no difference.
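For anyone who wants to check whether the characters actually survive in a given html file, here is a rough Python sketch. It only looks for code points in the Latin Extended Additional block (U+1E00 to U+1EFF) after decoding; the function names and the sample string are made up for illustration and have nothing to do with Calibre's internals.

```python
import unicodedata

# Latin Extended Additional block: U+1E00 through U+1EFF.
BLOCK_START, BLOCK_END = 0x1E00, 0x1EFF


def find_extended_additional(text):
    """Return the sorted unique characters in text that fall inside the block."""
    return sorted({ch for ch in text if BLOCK_START <= ord(ch) <= BLOCK_END})


def report(raw_bytes, encoding="utf-8"):
    """Decode raw_bytes and list each matching character with code point and name."""
    text = raw_bytes.decode(encoding)
    return [
        (ch, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for ch in find_extended_additional(text)
    ]


if __name__ == "__main__":
    # Made-up sample with dot-under and dot-over consonants.
    sample = "<p>sa\u1e43s\u0101ra, \u1e6da, \u1e45a</p>".encode("utf-8")
    for ch, code, name in report(sample):
        print(ch, code, name)
```

If the same bytes are decoded with the wrong encoding (say cp1252), the list should come back empty or the decode should fail, which is a quick way to tell an encoding problem apart from a font or rendering problem.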
As a side bit of interest, I can embed fonts that have the latin-extended-additional characters into an epub with Sigil, and it works great on ADE. If I try to embed them with Calibre, or do an epub to epub conversion with Calibre, the same loss occurs, despite the fact that I can see the embedded font showing up in the reader!
So very curious. I'd be grateful to know if anybody has any ideas what's going on. Best wishes.