MobileRead Forums - View Single Post

calvin-c · 05-19-2010, 01:57 PM

Quote:

Originally Posted by HarryT

Conversion between "text" formats - ePub, Lit, Mobi, etc - should be "lossless" as far as the actual text is concerned. You may, however, lose formatting in the process.

At least partially, this depends on your definition of text. I've seen many cases where conversion between formats has 'lost' characters it didn't recognize, usually punctuation or other symbols. (Curly quotation marks & apostrophes are particularly common casualties; foreign characters also get lost, although less often.)
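To make the 'lost' characters concrete, here's a rough sketch in Python. It isn't how any particular conversion program works; the function name naive_to_ascii is made up for illustration, and I'm assuming a converter that silently drops anything outside plain ASCII.

```python
def naive_to_ascii(text: str) -> str:
    # Hypothetical conversion step: anything that is not plain ASCII
    # is silently discarded instead of being mapped to an equivalent.
    return text.encode("ascii", errors="ignore").decode("ascii")


# Curly double quotes, a curly apostrophe and an accented letter.
original = "\u201cDon\u2019t go to the caf\u00e9,\u201d she said."
converted = naive_to_ascii(original)

# Characters present in the original but absent from the result.
lost = [ch for ch in original if ch not in converted]

print(converted)
print(lost)
```

The words all survive, but the quotation marks, the apostrophe and the accented letter are gone, and nothing in the output tells you they were ever there.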

Another artifact I've often seen from conversions is 'added' text. Usually this is because the original used some sort of markup language that the conversion didn't recognize and, reasonably, it treats anything it doesn't recognize as text. The result is sometimes things like "Joe said <emp>&quotHold on!&quot</emp>", easily recognizable as 1) a non-HTML markup tag and 2) an erroneously typed HTML symbol code.
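The 'added' text case can be sketched the same way. Again, this is only an illustration under my own assumptions: KNOWN_TAGS, KNOWN_ENTITIES and to_plain_text are hypothetical names, and real converters recognize far more than this. The point is just that whatever falls outside the recognized set passes through as literal text.

```python
import re

# Hypothetical: the only markup this toy converter understands.
KNOWN_TAGS = {"em", "b", "i"}
KNOWN_ENTITIES = {"quot": '"', "amp": "&"}


def to_plain_text(markup: str) -> str:
    def strip_tag(match: re.Match) -> str:
        # Remove tags we recognize; leave unknown ones in place as text.
        return "" if match.group(1).lower() in KNOWN_TAGS else match.group(0)

    def decode_entity(match: re.Match) -> str:
        # Decode entities we recognize; leave unknown ones untouched.
        return KNOWN_ENTITIES.get(match.group(1), match.group(0))

    text = re.sub(r"</?([A-Za-z]+)>", strip_tag, markup)
    # A well-formed entity ends in a semicolon; "&quot" without one
    # never matches this pattern, so it stays in the output.
    return re.sub(r"&([A-Za-z]+);", decode_entity, text)


good = to_plain_text("Joe said <em>&quot;Hold on!&quot;</em>")
bad = to_plain_text("Joe said <emp>&quotHold on!&quot</emp>")

print(good)
print(bad)
```

The well-formed input comes out clean, while the one with the unknown tag and the missing semicolons comes out exactly as it went in, markup and all.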

I'll note that neither of the above *should* be a problem in a conversion program, but they are. Whether or not it's a problem with the conversion program is something else that depends on your definition. It's been said that the best documentation for a program is the source code, since that describes *exactly* what the program will handle and, by omission, what it won't.

In that case, using the program to convert files containing 'text' it won't handle is operator error rather than a problem with the program. That's not a practical definition, but I think it points out why there's a gray area when defining what constitutes a problem in a conversion program.

But, regardless of whether it's a problem in the conversion program or in how it's used, the fact remains that conversion programs can introduce errors that don't exist in the original document, and not just in formatting.