09-14-2009, 02:55 PM   #31
Greg Anos
Grand Sorcerer
Greg Anos ought to be getting tired of karma fortunes by now.
Posts: 11,532
Karma: 37057604
Join Date: Jan 2008
Device: Pocketbook
Quote:
Originally Posted by ahi
I have this notion in my head...

What about taking a given document, OCR-ing it with three or more different OCR programs, and then parsing the outputs in parallel, character by character (now and then making an adjustment if one of the streams is out of line due to an erroneously detected additional character), always putting into the output stream the character that most of the OCR-ed texts agree on?

Obviously this won't help with anything that the various OCR programs get wrong in the same way... but it might minimize the amount of clean-up to be done thereafter.

How realistic is such an approach? Anybody here tried it before?

- Ahi

The idea is excellent, but I don't know of anybody who has written parsing software flexible enough to do it. As a matter of fact, the idea could be used for any OCR'ed text...

The big problem will be the differences in the embedded control sequences...
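
For what it's worth, here is a rough sketch of how the voting idea from the quote might look in Python. It is only an illustration under some assumptions: the OCR outputs are plain text with any control sequences already stripped, the first output is used as the reference ("pivot") that the others are lined up against, and the names `align_to_pivot` and `vote` are made up for this example. The alignment uses `difflib.SequenceMatcher` from the standard library, which is one possible way to handle a stream drifting out of line, not necessarily the best one.

```python
from collections import Counter
from difflib import SequenceMatcher


def align_to_pivot(pivot, other):
    """Line `other` up against `pivot`.

    Returns (subs, ins):
      subs[i] is the character `other` has in place of pivot[i]
              ("" if `other` has nothing there),
      ins[i]  is any extra text `other` has just before pivot[i]
              (ins[len(pivot)] holds extra text at the very end).
    """
    subs = [""] * len(pivot)
    ins = [""] * (len(pivot) + 1)
    matcher = SequenceMatcher(None, pivot, other, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("equal", "replace"):
            # Pair characters one to one as far as both sides reach.
            n = min(i2 - i1, j2 - j1)
            for k in range(n):
                subs[i1 + k] = other[j1 + k]
            # Leftover characters in `other` count as an insertion.
            ins[i1 + n] += other[j1 + n:j2]
        elif tag == "insert":
            ins[i1] += other[j1:j2]
        # "delete": `other` has nothing here, so subs stays "".
    return subs, ins


def majority(votes):
    """Most common vote; on a tie the earliest voter (the pivot) wins."""
    return Counter(votes).most_common(1)[0][0]


def vote(outputs):
    """Merge several OCR outputs of the same text by per-character voting."""
    pivot = outputs[0]
    aligned = [align_to_pivot(pivot, text) for text in outputs]
    result = []
    for i in range(len(pivot) + 1):
        # First decide whether something should be inserted here...
        result.append(majority([ins[i] for _, ins in aligned]))
        # ...then decide which character (if any) belongs at this position.
        if i < len(pivot):
            result.append(majority([subs[i] for subs, _ in aligned]))
    return "".join(result)


if __name__ == "__main__":
    scans = [
        "The qu1ck brown fox",   # one program reads "i" as "1"
        "The quick brovvn fox",  # another reads "w" as "vv"
        "The quick brown f0x",   # a third reads "o" as "0"
    ]
    print(vote(scans))
```

As the quote already says, this does nothing for errors that every program makes in the same way, and with only two inputs the pivot simply wins every disagreement. Real OCR output with differing line breaks or control sequences would need normalizing before a comparison like this makes sense.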