10-29-2008, 11:45 AM   #22
athlonkmf
Quote:
Originally Posted by TallMomof2
A scanned page image is essentially a photograph or picture of the page. Like a picture it is not seen as text (characters) by the ebook program. What you have to do is run the scanned pages through an OCR program to convert the images to text so that it is treated as text instead of an image. The "gotcha" is that conversion usually results in many errors that require a human to edit the text. I can't tell you how many ebooks I've read that are poorly converted scanned pages. And these are from legitimate publishers.

Quote:
Originally Posted by Dave Berk
Google should turn to a community-based collaborative approach, where anyone who contributes over a certain quota gets time-limited access to the whole archive.
Quote:
Originally Posted by Charbax

Google could also use this same collaborative manual corrections system for translations.

When millions of users get to participate in an automated, collaborative way, you can quickly get the full OCR and translations done.


That's why we've got a project like reCAPTCHA, which has people type in scanned words that OCR couldn't read while they solve a CAPTCHA.

http://recaptcha.net/learnmore.html
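
To make the "many people correcting OCR" idea a bit more concrete, here's a toy sketch in Python of how you could accept a word once enough independent typists agree on it. This is only my illustration of the general majority-vote idea, not reCAPTCHA's actual algorithm; the function name `consensus` and the `min_votes` / `min_agreement` thresholds are made up for the example.

```python
from collections import Counter


def consensus(transcriptions, min_votes=3, min_agreement=0.6):
    """Return the word most typists agree on, or None if there is no clear winner.

    Hypothetical helper: the thresholds are illustrative assumptions,
    not values taken from any real system.
    """
    # Normalise the answers (a simplification: this throws away capitalisation).
    cleaned = [t.strip().lower() for t in transcriptions if t.strip()]

    # Too few answers so far: keep showing the word to more people.
    if len(cleaned) < min_votes:
        return None

    # Take the most common answer and check how dominant it is.
    word, count = Counter(cleaned).most_common(1)[0]
    if count / len(cleaned) >= min_agreement:
        return word
    return None


# Three of four people read the smudged word the same way, so it is accepted.
print(consensus(["morning", "morning", "moming", "morning"]))
```

A word that still comes back as `None` would simply stay in the queue and be shown to more people, which is where the "millions of users" part does the work.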