10-29-2008, 10:40 AM   #20
DMcCunney
New York Editor
Posts: 6,384
Karma: 16540415
Join Date: Aug 2007
Device: PalmTX, Pocket eDGe, Alcatel Fierce 4, RCA Viking Pro 10, Nexus 7
Quote:
Originally Posted by TallMomof2
A scanned page image is essentially a photograph or picture of the page. Like a picture it is not seen as text (characters) by the ebook program. What you have to do is run the scanned pages through an OCR program to convert the images to text so that it is treated as text instead of an image. The "gotcha" is that conversion usually results in many errors that require a human to edit the text. I can't tell you how many ebooks I've read that are poorly converted scanned pages. And these are from legitimate publishers.
Precisely. No OCR program is perfect. Ligatures are a special problem, and multi-column layouts can confuse the OCR software bundled with things like home scanners. Higher-end professional gear does better, but it costs more, and the output will still need editing and proofreading to produce good copy.

The publishers whose poorly converted books you read skimped on or eliminated the editing step to cut costs.

(And that's just for texts in the Roman alphabet. If the original book was in something else, all bets are off.)
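
On the ligature point: one narrow case can be cleaned up automatically, namely when the OCR output contains the Unicode ligature code points themselves (fi, fl, and so on as single characters) rather than plain letters. Below is a minimal sketch in Python, assuming that is what the OCR emitted; the function name and the mapping table are illustrative, not taken from any particular OCR package. A ligature that was misread as a different character entirely still needs a human proofreader.

```python
# Illustrative cleanup step for OCR text (hypothetical helper, not part of any OCR tool).
# It only handles ligatures that arrive as Unicode presentation-form characters.
LIGATURES = {
    "\ufb00": "ff",   # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # LATIN SMALL LIGATURE FL
    "\ufb03": "ffi",  # LATIN SMALL LIGATURE FFI
    "\ufb04": "ffl",  # LATIN SMALL LIGATURE FFL
}

# Translation table built once; maps each ligature code point to its plain letters.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace single-character ligatures with their component letters."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "The \ufb01nal o\ufb03ce \ufb02oor"
    print(expand_ligatures(sample))
```

This leaves every other character alone, so it is safe to run before the manual proofreading pass, but it does not replace that pass.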
______
Dennis