To go into a bit more detail...

Basically, when an electronic text file is made, there is usually a tag (or an assumption) about which character encoding it uses, so that other programs can open the file and display the characters correctly. For some reason, there are many different ways of mapping characters to bytes. For most characters this doesn't matter: an "a" or a "t" is mapped to the same byte in most common encodings, so it will still show up properly even if a program opens the file with a different decoding.
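
If you want to see that for yourself, here is a tiny Python sketch (Python only because that is what calibre itself is written in) showing that ordinary letters come out as the same bytes in several common encodings:

# Plain letters map to the same bytes in most common encodings,
# which is why they survive even when a program guesses the encoding wrong.
text = "cat"
print(text.encode("utf-8") == text.encode("cp1252") == text.encode("latin1"))  # True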

The problem usually comes with punctuation. You know those "smart" quotes that curl nicely before and after a quotation? Originally, computers were never designed to show those properly, so all quotes ended up straight like this: ". Later, word processors added mappings so that smart quotes display nicely. Unfortunately, when some other program opens a file with smart quotes, it doesn't or can't recognize the encoding used for them and has to guess how to display that punctuation. The result is often garbled and looks horrible.
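
Here is roughly what that looks like in practice: a curly opening quote stored as UTF-8 but decoded with the wrong encoding (cp1252 in this sketch) turns into the garbage you have probably seen in badly converted books:

raw = "\u201cHello".encode("utf-8")   # a curly “ plus "Hello", stored as UTF-8 bytes
print(raw.decode("utf-8"))            # “Hello    (right guess)
print(raw.decode("cp1252"))           # â€œHello  (wrong guess, the classic garbled look)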

Calibre has an algorithm that tries to guess the best encoding for each file it converts, but the problem is that many files are mislabeled by the program that made them, or actually mix different character encodings within the same file due to very poor publishing practices. Either way, this simply leads to reader frustration in the end.
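
As an aside, you can do a rough version of that guessing yourself with the third-party chardet package. Calibre has its own detection code, so this is only to illustrate the idea, and "mybook.txt" is a stand-in for whatever file you are converting:

import chardet

with open("mybook.txt", "rb") as f:   # stand-in file name
    raw = f.read()

# detect() returns something like {'encoding': 'Windows-1252', 'confidence': 0.73, ...}
guess = chardet.detect(raw)
print(guess["encoding"], guess["confidence"])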

In dwanthny's post above, s/he links to the calibre documentation that suggests you tell calibre which encoding your original file uses, so that calibre doesn't have to guess. The best way of doing this is to try some of the following encodings on one of the files you are converting (a quick script for checking them follows the list):

cp1252
cp1251
latin1
utf-8
ascii
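
If you'd rather not convert the book five times just to see which one works, a little Python like this (again, "mybook.txt" stands in for your own file) will at least tell you which of those encodings can decode the file without errors. Note that latin1 and cp1251 will accept almost any bytes, so also check that the quotes actually look right in the result:

candidates = ["cp1252", "cp1251", "latin1", "utf-8", "ascii"]

with open("mybook.txt", "rb") as f:   # stand-in file name
    raw = f.read()

for enc in candidates:
    try:
        raw.decode(enc)
        print(enc, "decodes without errors")
    except UnicodeDecodeError:
        print(enc, "fails")

Once you have found the right one, that is the value to give calibre; if you use the command line, I believe ebook-convert also takes an --input-encoding option for exactly this (see the documentation dwanthny linked).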