Quote:
Originally Posted by Ralph Sir Edward
Igorsky, how does the Google Epub OCR conversion stack up to Finereader's?
Would it be easier to clean up Google's version or Finereader's?
I have this notion in my head...
What about taking a given document, OCR-ing it with three or more different OCR programs, and then parsing the outputs in parallel, character by character (perhaps making an adjustment now and then, if one of the streams gets out of line due to an erroneously detected additional character), always putting into the output stream the character that most of the OCR-ed texts agree on?
Obviously this won't help with anything that the various OCR programs get wrong in the same way... but it might minimize the amount of clean-up to be done thereafter.
How realistic is such an approach? Anybody here tried it before?
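To make the idea concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under some assumptions: the function name vote_ocr is made up, the first OCR output is arbitrarily used as the reference that the others are lined up against, and the standard-library difflib module stands in for a proper multi-way alignment. Ties fall back to the reference reading.

```python
from collections import Counter
from difflib import SequenceMatcher


def vote_ocr(texts):
    """Majority-vote several OCR readings of the same text, character by character.

    Hypothetical helper: texts[0] is used as the alignment reference,
    which is an arbitrary choice made for this sketch.
    """
    ref = texts[0]
    # votes[i] counts candidates for reference position i ('' means "drop it").
    votes = [Counter({ch: 1}) for ch in ref]
    # inserts[i] counts strings that other readings place before position i.
    inserts = [Counter() for _ in range(len(ref) + 1)]

    for other in texts[1:]:
        matcher = SequenceMatcher(None, ref, other, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "insert":
                # The other reading has extra characters the reference lacks.
                inserts[i1][other[j1:j2]] += 1
                continue
            # equal / replace / delete: pair characters up one by one.
            segment = other[j1:j2]
            for k, i in enumerate(range(i1, i2)):
                votes[i][segment[k] if k < len(segment) else ""] += 1
            # Leftover characters of a longer replacement count as an insertion.
            if len(segment) > i2 - i1:
                inserts[i2][segment[i2 - i1:]] += 1

    needed = len(texts) // 2 + 1  # strict majority of all readings
    out = []
    for i in range(len(ref) + 1):
        if inserts[i]:
            text, count = inserts[i].most_common(1)[0]
            if count >= needed:
                out.append(text)
        if i < len(ref):
            out.append(votes[i].most_common(1)[0][0])
    return "".join(out)


readings = [
    "The quick brown fox",   # clean reading
    "The qu1ck brown fox",   # 'i' misread as '1'
    "The quick brovvn fox",  # 'w' misread as 'vv'
]
print(vote_ocr(readings))
```

The re-alignment step is the "adjustment" I mentioned: when one stream has an extra or missing character, the aligner shifts it back into step instead of letting every later character disagree.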
- Ahi