Old 10-21-2007, 05:45 PM   #94
user
Connoisseur
user began at the beginning.
 
Posts: 78
Karma: 10
Join Date: Oct 2007
Device: Benq P51
Question:
I wonder now whether we really need so many megapixels, or whether they are useless overkill. And if we don't, what is the most important factor for achieving the highest OCR rates?

Scanner:
The above is a scan. I don't know how this scan was produced, or which scanner and scanner settings were used. I only know its dimensions (A4), its resolution (850x1103, 100dpi) and that it's black-and-white.



Here is the scan that was OCRed:

Here is the OCRed text:

Finereader results:
Uncertain characters: 18
Total characters: 2087
OCR success rate: 99.14%

The Finereader results were superb in my opinion, considering that, as you can see, most of the uncertain characters were some music characters.

These results really make me wonder. To produce a 100dpi image of an A4 page you only need 0.9MP. So there must be something more to it than just resolution. ABBYY cannot simply say "you just need a 300dpi image" in order to OCR successfully.
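To put numbers on that, here is a rough Python sketch of how many megapixels an A4 page needs at a given dpi. It assumes the standard A4 size of 210 x 297 mm; the function name is just something I made up for illustration.

```python
MM_PER_INCH = 25.4
A4_MM = (210.0, 297.0)  # standard A4 size, assumed here


def megapixels_for_a4(dpi):
    """Megapixels needed to cover a full A4 page at the given dpi."""
    width_px = round(A4_MM[0] / MM_PER_INCH * dpi)
    height_px = round(A4_MM[1] / MM_PER_INCH * dpi)
    return width_px * height_px / 1_000_000


for dpi in (100, 300):
    print(f"{dpi}dpi -> {megapixels_for_a4(dpi):.2f}MP")
```

This gives just under 1MP at 100dpi and about 8.7MP at 300dpi, so the 300dpi recommendation asks for roughly nine times as many pixels as the scan that was OCRed here.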

Camera solution:
The Fujifilm F31fd has excellent sharpness and image quality and 6.3MP resolution. It will produce very sharp images at 3024x2016 resolution and, as the article from this respected website says, "Fuji has managed, with (F31fd's) sensor and processor combination, closer than ever before to 'SLR-like' output from a compact camera", when it's compared with the Nikon D50 SLR (6.24MP, 23.7 x 15.6 mm sensor, +1k USD).

An A4 page captured at 6.3MP works out to about 260dpi, which is below Finereader's minimum requirement, but as I said, I wonder whether this dpi is really insufficient.
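A quick way to check the dpi figure, as a sketch only: it assumes the A4 page (210 x 297 mm) exactly fills the frame along one side, and the function name is hypothetical.

```python
MM_PER_INCH = 25.4


def dpi_per_side(long_px, short_px, long_mm=297.0, short_mm=210.0):
    """Return (dpi along the long side, dpi along the short side)
    when an image of long_px x short_px covers a page of the given size."""
    dpi_long = long_px / (long_mm / MM_PER_INCH)
    dpi_short = short_px / (short_mm / MM_PER_INCH)
    return dpi_long, dpi_short


dpi_long, dpi_short = dpi_per_side(3024, 2016)
print(f"long side: {dpi_long:.0f}dpi, short side: {dpi_short:.0f}dpi")
```

The long side comes out at about 259dpi. Since the 3:2 frame is not the same shape as A4, the short side only reaches about 244dpi if the whole page has to fit in the picture, so the usable figure is somewhat lower still.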

Finereader has 300dpi as the only (afaik) requirement that the image needs to have in order to perform accurate OCR.
Obviously, Finereader can't "see"/OCR fonts that are too big or too small in the image, and there must be an optimum font size or, better, a range in which the OCR success rate is highest. But why doesn't it list a specific pixel size, brightness, contrast, etc. among the minimum requirements for accurate OCR?

In my opinion, Finereader not only needs the font in the captured image to be of a specific size, but of a specific quality as well (sharp image with good contrast, smooth boundaries of the lines, etc). We need to know more parameters in order to achieve a 100% OCR rate. A 99.2% OCR rate is not enough: at about 2,000 characters per page, as in the scan above, it means roughly 5,000 mistakes that need to be manually corrected in a 300-page book, which looks like a lot of work.
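For anyone who wants to play with the numbers, a small sketch of the correction workload. It assumes every page carries as many characters as the sample above (2087), which will of course vary from book to book, and it treats every uncertain character as a mistake. Both function names are mine.

```python
def success_rate(uncertain_chars, total_chars):
    """OCR success rate as a fraction, counting uncertain characters as errors."""
    return 1.0 - uncertain_chars / total_chars


def expected_corrections(rate, chars_per_page, pages):
    """Estimated number of characters to fix by hand over a whole book."""
    return (1.0 - rate) * chars_per_page * pages


# Figures from the Finereader run above: 18 uncertain out of 2087 characters.
print(f"{success_rate(18, 2087) * 100:.2f}%")

# Assumed: 300 pages, each as dense as the sample page.
print(round(expected_corrections(0.992, 2087, 300)))
```

The total scales linearly with the characters per page, so a book with sparser pages would need proportionally fewer corrections.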

Last edited by user; 10-21-2007 at 06:07 PM.