Quote:
Originally Posted by TallMomof2
A scanned page image is essentially a photograph of the page. Like a picture, it is not seen as text (characters) by the ebook program. What you have to do is run the scanned pages through an OCR program to convert the images to text, so that the content is treated as text instead of an image. The "gotcha" is that conversion usually produces many errors that require a human to edit the text. I can't tell you how many ebooks I've read that are poorly converted scanned pages, and these are from legitimate publishers.
|
Quote:
Originally Posted by Dave Berk
Google should turn to a community-based collaborative approach, where anyone who contributes more than a certain quota gets time-limited access to the whole archive.
|
Quote:
Originally Posted by Charbax
Google could also use this same collaborative manual-correction system for translations.
When millions of users participate collaboratively, with the work handed out automatically, you can quickly get the full OCR and translations done.
|
That's why we've got a project like reCAPTCHA:
http://recaptcha.net/learnmore.html
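
For anyone curious how crowd corrections could be merged, here is a rough sketch of the general idea, assuming a plain majority vote over what several people typed for the same scanned word. This is not reCAPTCHA's actual code: the function name `consensus` and the thresholds `min_votes` and `min_share` are made up for illustration.

```python
from collections import Counter


def consensus(votes, min_votes=3, min_share=0.6):
    """Return the agreed transcription of one word, or None if unclear.

    min_votes and min_share are hypothetical thresholds, not values
    taken from any real system.
    """
    # Normalize so "Modern" and "modern " count as the same answer.
    # (Lowercasing discards capitalization, which a real system would keep.)
    normalized = [v.strip().lower() for v in votes if v.strip()]
    if len(normalized) < min_votes:
        return None  # not enough people have looked at this word yet
    word, count = Counter(normalized).most_common(1)[0]
    # Accept the top answer only if a clear share of voters agree on it.
    return word if count / len(normalized) >= min_share else None


# The OCR program misread the word; four people typed what they saw.
ocr_guess = "rnodern"
human_votes = ["modern", "Modern", "modem", "modern "]

# Fall back to the OCR guess when the humans do not agree.
accepted = consensus(human_votes) or ocr_guess
print(accepted)
```

With these votes, three of the four answers agree after normalization, so the human reading replaces the OCR guess. Words without a clear winner simply stay in the queue until more people have seen them.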