An important question here is: why are publishers OCRing their own books? Apart from the rare artisan press, NONE of them use letterpress anymore. All of their books were typeset and printed from electronic files in the first place. I'm not expecting them to preserve the formatting, but the text and content should at least match the quality of what was printed. And that's not even counting off-the-shelf layout programs like Quark and InDesign, which encourage you to keep the text in a separate file that is linked into the layout file.