08-20-2012, 09:38 AM   #1
Claghorn
Member
Posts: 16
Karma: 10
Join Date: Aug 2012
Device: Nexus 7
What character encoding am I seeing?

I'm trying to convert a Kindle book. Looking at the unpacked MOBI HTML, I have no idea what character encoding I'm seeing. The HTML claims to be UTF-8, but that is clearly a lie. For instance, I see a Ctrl-Y (0x19) in places that clearly should be rendered as an apostrophe. Other low-numbered control characters (^S, ^[, ^], etc.) also appear to stand in for some kind of punctuation (I think the brackets are left and right double quotes).

Does anyone recognize this from Kindle books they have converted? Are there any tools to turn it into legitimate UTF-8 or HTML special characters?

I suppose I can fix it manually by finding all the funny characters and seeing how the text is actually rendered on my Kindle, but I was hoping someone might have encountered this before.
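
For the manual route, a small script can at least list which control characters occur and where. The following is a minimal Python sketch, not a tool from this thread: the function names and the sample string are made up for illustration, and it assumes the unpacked HTML has already been read into a string.

```python
from collections import Counter

# Whitespace controls that are normal in HTML source and should be ignored.
ORDINARY = {"\t", "\n", "\r"}

def find_control_chars(text):
    """Count C0 control characters (below 0x20) other than tab, LF and CR."""
    return Counter(ch for ch in text if ord(ch) < 0x20 and ch not in ORDINARY)

def show_contexts(text, ch, width=20, limit=3):
    """Return up to `limit` snippets of text surrounding occurrences of ch."""
    snippets = []
    start = 0
    while len(snippets) < limit:
        i = text.find(ch, start)
        if i == -1:
            break
        snippets.append(text[max(0, i - width): i + width + 1])
        start = i + 1
    return snippets

# Hypothetical sample standing in for the damaged HTML.
sample = "It\x19s a \x1cquoted\x1d word \x13 maybe.\n"
counts = find_control_chars(sample)
for ch, n in sorted(counts.items()):
    # Ctrl-Y is 0x19, i.e. the letter at 0x19 + 0x40.
    print(f"0x{ord(ch):02X} (Ctrl-{chr(ord(ch) + 0x40)}): {n}")
```

Comparing each snippet from `show_contexts` with how the same passage renders on the device shows which punctuation mark each control character stands for.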
08-22-2012, 10:02 AM   #2
Claghorn
Finally figured this out: it was a side effect of incorrectly using the -raw option in the "tidy" tool to indent my HTML. It apparently changed a Unicode U+2019 into just 0x19, etc. (I thought -raw meant "leave the dadgum characters alone", but apparently not :-).
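
If the original file is gone and only the damaged output remains, the damage described above can be partly undone. This is a minimal Python sketch under a loud assumption: that each stray control character is the low byte of a U+20xx punctuation code point, as in the single U+2019 to 0x19 case reported here. The table and the function name `restore_punctuation` are illustrative, not anything tidy or calibre provides.

```python
# Assumed mapping: the surviving byte is the low byte of a U+20xx code point.
# Only common typographic punctuation is listed; extend it after checking
# how the text renders on the device.
SUSPECTS = {
    0x13: "\u2013",  # en dash
    0x14: "\u2014",  # em dash
    0x18: "\u2018",  # left single quotation mark
    0x19: "\u2019",  # right single quotation mark (apostrophe)
    0x1C: "\u201C",  # left double quotation mark
    0x1D: "\u201D",  # right double quotation mark
}

def restore_punctuation(text):
    """Replace the suspect control characters with the assumed punctuation."""
    return text.translate(SUSPECTS)

# Hypothetical damaged sample.
damaged = "It\x19s \x1cfine\x1d \x14 mostly."
fixed = restore_punctuation(damaged)
print(fixed)
```

Not everything is recoverable this way. A character such as U+2026 (ellipsis) would have collapsed to 0x26, which is an ordinary `&`, and low bytes 0x09, 0x0A and 0x0D are indistinguishable from tab and line breaks. Re-running tidy on the untouched original with its encoding options set correctly is likely the safer fix.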
All times are GMT -4.

