Quote:
Originally Posted by Quoth
... and also the IA images need "cleaned" first.
Interesting that over the past few years my setup, using Tesseract with OCRFeeder as a front end, has become considerably better on old book images. Google has been developing Tesseract recently, and I understand a new AI/neural network component has been added, but only for some of the details, IIRC.
While old typesetting and generally poor images still cause many errors - especially in punctuation - Tesseract can sometimes read words that I struggle to figure out. I rarely have to clean up IA images any more, unless they are really badly tilted, keystoned, or have something else geometrically wrong. With one recent book, Tesseract was doing fine, but I had to clean up the images so I could read them for proofing!