08-20-2012, 09:38 AM | #1
Member
Posts: 16
Karma: 10
Join Date: Aug 2012
Device: Nexus 7
What character encoding am I seeing?
I'm trying to convert a Kindle book. I'm looking at the unpacked MOBI HTML and have no idea what character encoding I'm seeing. The HTML claims to be UTF-8, but that is clearly a lie. For instance, I see a Ctrl-Y (0x19) in places that clearly should be rendered as an apostrophe. Other low-numbered control characters (^S, ^[, ^], etc.) also seem to stand in for some kind of punctuation (I think the brackets are left and right double quotes).
Does anyone recognize this from Kindle books they've converted? Are there any tools to turn it into legit UTF-8 or HTML character entities? I suppose I could fix it manually by finding all the funny characters and checking how the text actually renders on my Kindle, but I was hoping someone might have run into this before.
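For the manual route described above, a small Python sketch like this one can at least list every stray control character in the unpacked HTML, with a count and one snippet of surrounding text, so each can be checked against the Kindle. It assumes the file can be read as UTF-8 (as its header claims); the script and function names are hypothetical.

```python
import re
import sys
from collections import Counter

# C0 control characters other than tab (0x09), LF (0x0A) and CR (0x0D).
# None of these belong in the text of a well-formed HTML file.
SUSPECT = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def caret(ch):
    """Show a control character in caret notation, e.g. chr(0x19) -> '^Y'."""
    return "^" + chr(ord(ch) + 0x40)


def find_control_chars(text, context=20):
    """Count each suspicious control character and keep one sample snippet for it."""
    counts = Counter()
    samples = {}
    for m in SUSPECT.finditer(text):
        ch = m.group()
        counts[ch] += 1
        if ch not in samples:
            start = max(0, m.start() - context)
            samples[ch] = text[start:m.end() + context]
    return counts, samples


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_control_chars.py book.html
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        counts, samples = find_control_chars(f.read())
    for ch, n in counts.most_common():
        print(f"{caret(ch)} (0x{ord(ch):02X}) x{n}: {samples[ch]!r}")
```

Tab, newline and carriage return are deliberately excluded, since they are legitimate whitespace in HTML.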
08-22-2012, 10:02 AM | #2
Finally figured this out: it was a side effect of using the -raw option incorrectly in the "tidy" tool to indent my HTML. It apparently changed a Unicode U+2019 into just 0x19, etc. (I thought -raw meant "leave the dadgum characters alone", but apparently not :-)
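If the damage is what this post describes (only the low byte of each character survived, so U+2019 became 0x19), the cleanest fix is probably to re-run tidy on the original, undamaged HTML without -raw, for example with its -utf8 option. If only the damaged copy is left, the rough Python sketch below maps the common cases back. The REPAIR table, the function names and the "everything came from U+20xx" assumption are hypothetical, not part of tidy. Under this theory the curly double quotes would show up as ^\ and ^] (0x1C, 0x1D) rather than ^[. Also, any character whose low byte lands in printable ASCII (the ellipsis U+2026 would become "&") can't be told apart from real text, so it isn't handled here.

```python
import sys

# Hypothetical repair table. It assumes each stray control character is
# the low byte of a General Punctuation character (U+20xx), so 0x19 was
# U+2019. Only common quotes and dashes are listed; check each entry
# against how the Kindle renders the text before trusting it.
REPAIR = {
    0x13: "\u2013",  # EN DASH
    0x14: "\u2014",  # EM DASH
    0x18: "\u2018",  # LEFT SINGLE QUOTATION MARK
    0x19: "\u2019",  # RIGHT SINGLE QUOTATION MARK (apostrophe)
    0x1C: "\u201C",  # LEFT DOUBLE QUOTATION MARK
    0x1D: "\u201D",  # RIGHT DOUBLE QUOTATION MARK
}


def restore_punctuation(text):
    """Replace the damaged control characters listed in REPAIR; leave the rest alone."""
    return text.translate(REPAIR)


def to_entities(text):
    """Turn every non-ASCII character into a numeric HTML entity, e.g. &#8217;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def repair_file(src, dst, entities=False):
    """Read src as UTF-8, fix the punctuation, and write dst as UTF-8."""
    with open(src, encoding="utf-8") as f:
        text = restore_punctuation(f.read())
    if entities:
        text = to_entities(text)
    with open(dst, "w", encoding="utf-8") as f:
        f.write(text)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python repair_quotes.py damaged.html fixed.html
    repair_file(sys.argv[1], sys.argv[2])
```

Passing entities=True gives pure-ASCII output with numeric entities instead of raw UTF-8 characters, which covers the "HTML special characters" option asked about in the first post.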
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
Problem with font or character encoding | no harmony | Calibre | 2 | 11-25-2011 09:50 AM |
Character encoding, hex, emdash, and the meaning of life. | Starson17 | Conversion | 8 | 08-18-2011 04:25 PM |
how to tell the character encoding??? | rheostaticsfan | Calibre | 23 | 06-21-2010 03:26 PM |
Character encoding in the filesystem | Jellby | Bookeen | 1 | 03-30-2008 05:36 AM |
FBReader fixes character encoding problem | jbenny | News | 1 | 10-18-2007 10:50 PM |