Quote:
Originally Posted by Karellen
That's great. Thank you @Tex2002ans
I've installed it and done a quick trial run on an image I was previously getting poor results with, and it OCR'd almost perfectly. In the few minutes I fiddled around with it, it seemed pretty easy to use. But I'll spend some time understanding it better.
I just learnt that images OCR better when using a non-compressed / lossless format.
Yes, I always use PNG or TIFF with a flatbed scanner, never JPEG. Though some phones and cameras can "save" as PNG or TIFF, many actually use JPEG as an intermediate format, so you may need to set the quality to 95. This is why an elderly scanner with apparently lower resolution can sometimes give better results, quite apart from the issues of skew and lighting. This is also why, if the book is not valuable, the spine may be cut off: at the least it allows flatter pages, and possibly a duplex sheet feeder. Only do that with a cheap in-print title.
More modern dedicated scanners based on cameras have built-in lighting, lasers, etc. to ensure de-skewing and even contrast. They are better value for A3 and are needed for books you can't cut up.
A PNG is typically one image per page. The TIFF format, and an animated PNG format (the PNG equivalent of an animated GIF), can hold an entire book in one file. Both support lossless compression and will compress white space or solid black almost to nothing, so good illumination is important.
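To illustrate the point about illumination, here is a rough Python sketch using only the standard library. It is not how any particular scanner or OCR tool works: it simply runs zlib's Deflate (the lossless method PNG uses) over raw greyscale pixels, skipping PNG's own filtering step. The page size, the black band, and the 200 to 255 brightness range standing in for uneven lighting are all made-up values for the demonstration.

```python
import random
import zlib

WIDTH, HEIGHT = 600, 800  # hypothetical 8-bit greyscale page, one byte per pixel


def deflate_size(pixels: bytes) -> int:
    """Return the size in bytes after lossless Deflate compression."""
    return len(zlib.compress(pixels, 9))


def clean_page() -> bytes:
    """A page that is pure white apart from one solid black band."""
    white_row = bytes([255]) * WIDTH
    black_row = bytes([0]) * WIDTH
    rows = [black_row if 100 <= y < 120 else white_row for y in range(HEIGHT)]
    return b"".join(rows)


def uneven_page(seed: int = 0) -> bytes:
    """The same page, but the 'white' varies from pixel to pixel (assumed model of poor lighting)."""
    rng = random.Random(seed)
    return bytes(p if p == 0 else rng.randint(200, 255) for p in clean_page())


raw_size = WIDTH * HEIGHT
clean_size = deflate_size(clean_page())
uneven_size = deflate_size(uneven_page())

print(f"raw pixels:      {raw_size} bytes")
print(f"clean page:      {clean_size} bytes")
print(f"uneven lighting: {uneven_size} bytes")
```

The evenly lit page should shrink to a tiny fraction of its raw size, while the noisy one barely compresses at all, even though both are stored without any loss.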