To go into a bit more detail...

Basically, when an electronic text file is made, there is usually a tag (or an assumption) about which character encoding it uses, so that other programs can open the file and display the characters correctly. For some reason, there are many different ways of mapping characters to bytes. For most characters this doesn't matter: an "a" or a "t" is mapped to the same byte in most common encodings, so it will still show up properly even if a program opens the file with a different decoding.
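
If you want to see that for yourself, here is a tiny Python sketch (Python only because that is what calibre itself is written in) showing that ordinary letters come out as the same bytes in several common encodings:

# Plain letters map to the same bytes in most common encodings,
# which is why they survive even when a program guesses the encoding wrong.
text = "cat"
print(text.encode("utf-8") == text.encode("cp1252") == text.encode("latin1"))  # True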

The problem usually comes with punctuation. You know those "smart" quotes that curl nicely before and after a quotation? Originally, computers were never designed to show those properly, so all quotes ended up straight like this: ". Later, word processors added mappings so that smart quotes display nicely. Unfortunately, when some other program opens a file with smart quotes, it doesn't or can't recognize the encoding used for them and has to guess how to display that punctuation. The result is often garbled and looks horrible.
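
Here is roughly what that looks like in practice: a curly opening quote stored as UTF-8 but decoded with the wrong encoding (cp1252 in this sketch) turns into the garbage you have probably seen in badly converted books:

raw = "\u201cHello".encode("utf-8")   # a curly “ plus "Hello", stored as UTF-8 bytes
print(raw.decode("utf-8"))            # “Hello    (right guess)
print(raw.decode("cp1252"))           # â€œHello  (wrong guess, the classic garbled look)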

Calibre has an algorithm that tries to guess the best encoding for each file it converts, but the problem is that many files are mislabeled by the program that made them, or actually mix different character encodings within the same file due to very poor publishing practices. Either way, this simply leads to reader frustration in the end.
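
As an aside, you can do a rough version of that guessing yourself with the third-party chardet package. Calibre has its own detection code, so this is only to illustrate the idea, and "mybook.txt" is a stand-in for whatever file you are converting:

import chardet

with open("mybook.txt", "rb") as f:   # stand-in file name
    raw = f.read()

# detect() returns something like {'encoding': 'Windows-1252', 'confidence': 0.73, ...}
guess = chardet.detect(raw)
print(guess["encoding"], guess["confidence"])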

In dwanthny's post above, s/he links to the calibre documentation that suggests you tell calibre which encoding your original file uses, so that calibre doesn't have to guess. The best way of doing this is to try some of the following encodings on one of the files you are converting (a quick script for checking them follows the list):

cp1252
cp1251
latin1
utf-8
ascii
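
If you'd rather not convert the book five times just to see which one works, a little Python like this (again, "mybook.txt" stands in for your own file) will at least tell you which of those encodings can decode the file without errors. Note that latin1 and cp1251 will accept almost any bytes, so also check that the quotes actually look right in the result:

candidates = ["cp1252", "cp1251", "latin1", "utf-8", "ascii"]

with open("mybook.txt", "rb") as f:   # stand-in file name
    raw = f.read()

for enc in candidates:
    try:
        raw.decode(enc)
        print(enc, "decodes without errors")
    except UnicodeDecodeError:
        print(enc, "fails")

Once you have found the right one, that is the value to give calibre; if you use the command line, I believe ebook-convert also takes an --input-encoding option for exactly this (see the documentation dwanthny linked).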