Old 09-14-2009, 02:45 PM   #30
ahi
Quote:
Originally Posted by Ralph Sir Edward
Igorsky, how does the Google Epub OCR conversion stack up to Finereader's?

Would it be easier to clean up Google's version or Finereader's?

I have this notion in my head...

What about taking a given document, OCR-ing it with three or more different OCR programs, and then parsing the results in parallel, character by character, always putting into the output stream the character that most of the OCR-ed texts agree on? Now and then an adjustment would be needed, when one of the streams gets out of line due to an erroneously detected additional character.

Obviously this won't help with anything that the various OCR programs get wrong in the same way... but it might minimize the amount of clean-up to be done thereafter.
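
To make the idea concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under some assumptions of mine: the function name vote_ocr is made up, the first OCR output is arbitrarily used as the pivot that the others are lined up against, and the "adjustment" for streams that drift out of line is delegated to the standard library's difflib.SequenceMatcher rather than done by hand. Ties go to the pivot.

```python
from collections import Counter
from difflib import SequenceMatcher


def vote_ocr(texts):
    """Merge several OCR readings of the same text by per-character majority vote.

    Hypothetical helper: texts[0] is used as the pivot for alignment.
    """
    if not texts:
        return ""
    pivot, others = texts[0], texts[1:]
    n = len(pivot)
    # char_votes[i]: what each reading has at pivot position i ("" = nothing).
    # gap_votes[i]: what each reading inserts just before pivot position i.
    char_votes = [[c] for c in pivot]
    gap_votes = [[""] for _ in range(n + 1)]

    for other in others:
        seen_chars = [""] * n
        seen_gaps = [""] * (n + 1)
        # autojunk=False keeps difflib from ignoring frequent characters.
        matcher = SequenceMatcher(None, pivot, other, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "insert":
                # This reading has extra characters the pivot lacks.
                seen_gaps[i1] = other[j1:j2]
            elif tag in ("equal", "replace"):
                span = min(i2 - i1, j2 - j1)
                for k in range(span):
                    seen_chars[i1 + k] = other[j1 + k]
                # A longer replacement spills over as an insertion.
                if j2 - j1 > span:
                    seen_gaps[i1 + span] = other[j1 + span:j2]
            # tag == "delete": the "" defaults already record the omission.
        for i in range(n):
            char_votes[i].append(seen_chars[i])
        for i in range(n + 1):
            gap_votes[i].append(seen_gaps[i])

    def winner(votes):
        # most_common keeps first-seen order on ties, so the pivot wins them.
        return Counter(votes).most_common(1)[0][0]

    out = []
    for i in range(n):
        out.append(winner(gap_votes[i]))
        out.append(winner(char_votes[i]))
    out.append(winner(gap_votes[n]))
    return "".join(out)


if __name__ == "__main__":
    readings = [
        "The quick brown fox",   # clean reading
        "The qu1ck brown fox",   # "i" misread as "1"
        "The quick brovvn fox",  # "w" misread as "vv"
    ]
    print(vote_ocr(readings))
```

With three readings, any error that shows up in only one of them gets outvoted, including a dropped or doubled character. An error shared by two of the three still wins the vote, which is exactly the limitation mentioned above.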

How realistic is such an approach? Anybody here tried it before?

- Ahi