Old 10-21-2007, 06:05 PM   #95
user
Font size or Font quality or both?

Note:
document = the page of the book with the printed text
image = the capture of the document, either with a scanner (scan) or with a camera (photo)

Font size on the image:
The font size in the image depends on the font size in the document and the resolution that we use to capture the image.

Font size on the document:
It can be measured in points
1 point = 0.3527 mm
Usually fonts are 5-22p in printed text pages
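
To make the relation between font size on the document and font size on the image concrete, here is a minimal sketch. It assumes the desktop-publishing point (1/72 inch) and a capture resolution given in dots per inch (dpi); the function names and the 12p / 300 dpi figures are only my illustration, not values that any OCR program requires.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # assumption: desktop-publishing point, 1/72 inch


def points_to_mm(points):
    """Convert a font size in points to millimetres."""
    return points * MM_PER_INCH / POINTS_PER_INCH


def points_to_pixels(points, dpi):
    """Nominal font height in pixels when the page is captured at `dpi`."""
    return points * dpi / POINTS_PER_INCH


# Illustrative values only: a 12p font captured at 300 dpi.
print(round(points_to_mm(12), 2))
print(points_to_pixels(12, 300))
```

So for a fixed target height in pixels, the dpi we need goes up as the font in the document gets smaller.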

The bigger the font is, the lower the resolution needed to produce the same font size in the captured image (I suppose we could make a graph of the ratio between "image resolution with which the page is captured" (x) and "font size in the printed page" (y)).
For example (I don't know the exact amounts), if a 20p font needs to be captured at 5MP in order to have the optimum size for OCR in the image, then a 10p font needs to be captured at about 20MP to reach the same size in the image. Half the font size needs twice the linear resolution, and megapixels grow with the square of the linear resolution.
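
A rough sketch of that scaling, assuming the whole page fills the frame in both shots and that what matters is the font height in pixels. Megapixels count area, so they scale with the square of the linear magnification. The function name is hypothetical, and the 20p / 5MP pair is just the made-up reference from the example, not a measured optimum.

```python
def required_megapixels(ref_font_pt, ref_megapixels, font_pt):
    """Megapixels needed so that `font_pt` text comes out as tall in pixels
    as `ref_font_pt` text does at `ref_megapixels` (same page, same framing).
    """
    linear_scale = ref_font_pt / font_pt      # how much more linear resolution
    return ref_megapixels * linear_scale ** 2  # area grows with the square


# Made-up reference: a 20p font is captured well at 5MP.
for pt in (20, 10, 5):
    print(pt, "p ->", required_megapixels(20, 5, pt), "MP")
```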

So the optimum resolution varies with the font size in the document. But why should we adjust the resolution at all? Because we want the minimum resolution that gives the best OCR rate. Cameras with many megapixels tend to have a lot of noise, and that affects the quality of the image. Shooting at a higher resolution than needed is not only overkill and a waste of money on an expensive camera; it is also commonly said that as MP increase, image quality (IQ) decreases.

So we should shoot or scan at a specific resolution according to the font size in order to achieve the best OCR rate.
However, maybe there is not much difference between shooting/scanning resolutions for normal, usual (10-20p) fonts. Maybe there is no advantage. Maybe the optimum resolutions differ only slightly (eg 0.0x megapixels). Also, maybe Finereader "normalizes" resolutions according to font size. These are just some thoughts on how to increase OCR accuracy and how to find the ultimate camera for the job.

I wonder if Finereader needs more megapixels only because it needs bigger fonts in the image, or because it needs more pixels and better pixel quality as well, or all of these.

If Finereader needs more megapixels just to make the fonts bigger, we should examine magnification.
One method is shooting at a higher resolution: if we increase the resolution, the image is magnified (going from 10MP to 12MP, the width and height of the image increase by about 10%).
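
A quick check of that figure, assuming the same framing and aspect ratio in both shots: the linear magnification is the square root of the megapixel ratio. The function name is just for illustration.

```python
import math


def linear_magnification(mp_from, mp_to):
    """Factor by which width and height grow when going from mp_from to mp_to
    megapixels (same framing and aspect ratio assumed)."""
    return math.sqrt(mp_to / mp_from)


gain = linear_magnification(10, 12)
print(f"{(gain - 1) * 100:.1f}% bigger in each direction")
```

So 20% more megapixels gives a bit under 10% bigger letters, and doubling the letter size takes four times the megapixels.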

Another solution: if we have a captured image with low resolution, we can magnify it in an image editing program in order to make the fonts in the image bigger.
So would magnification alone be sufficient for achieving the optimum font size in the image?
No, because simple magnification in image editing software distorts the image (and thus the fonts): it doesn't add any new detail, it just enlarges the existing pixels.

So we need software that will trim megapixels and produce accurate lines and circles, and the trimming will be in proportion with the given character line of the specific font type and size.
One method to achieve this is interpolation. Maybe a Photoshop guru can help us with magnify-without-distorting methods, and with any other software or interpolation methods that do this.
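
To show the difference between plain magnification and interpolation, here is a small pure-Python sketch on a grayscale image stored as a list of rows. It compares nearest-neighbour enlargement (every pixel is just repeated) with bilinear interpolation (new pixels are blended from their neighbours). This only illustrates the two ideas; I am not claiming it is what Photoshop or Finereader does internally, and the function names are my own.

```python
def upscale_nearest(img, factor):
    """Enlarge by repeating each pixel factor x factor times (blocky edges)."""
    return [[px for px in row for _ in range(factor)]
            for row in img for _ in range(factor)]


def upscale_bilinear(img, factor):
    """Enlarge by blending the four nearest source pixels (smoother edges)."""
    h, w = len(img), len(img[0])
    out_h, out_w = h * factor, w * factor
    out = []
    for y in range(out_h):
        # Map the output row back to a (fractional) source row.
        sy = y * (h - 1) / (out_h - 1) if out_h > 1 else 0
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(out_w):
            # Map the output column back to a (fractional) source column.
            sx = x * (w - 1) / (out_w - 1) if out_w > 1 else 0
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# A tiny black-to-white edge, like the side of a letter stroke.
edge = [[0, 255],
        [0, 255]]
print(upscale_nearest(edge, 2)[0])
print([round(v) for v in upscale_bilinear(edge, 2)[0]])
```

Neither method recovers detail that was never captured; interpolation only makes the enlarged edges smoother instead of blocky.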

Another thing to consider is that fonts are usually of a specific type, eg Arial, Verdana, etc.
I don't know if modern books mention somewhere which font they are typeset in.
I also don't know if there is a way (a program, or some rules) to identify by ourselves the font type that a book is typeset in.

But if we know the font type and the font size, I suppose it would be easier to tell the OCR program what to compare with (since it should have the whole character set of the specific font type in the specific size).
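
As a toy illustration of "comparing with a known character set", here is a sketch that matches a small black-and-white glyph against stored templates and picks the one with the fewest differing pixels. The 3x3 templates are invented for the example and do not come from any real font; real OCR engines are far more sophisticated, and I don't know how Finereader does it.

```python
# Hypothetical 3x3 templates ("1" = ink, "0" = paper), invented for the example.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def pixel_difference(a, b):
    """Number of pixels that differ between two same-sized glyph bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def best_match(glyph, templates=TEMPLATES):
    """Return the template character closest to the captured glyph."""
    return min(templates, key=lambda ch: pixel_difference(glyph, templates[ch]))


# A "T" with one noisy pixel in the bottom-right corner.
noisy_t = ["111", "010", "011"]
print(best_match(noisy_t))
```

With the right font and size loaded as templates, a noisy capture still lands on the correct letter; with the wrong templates, the differences pile up for every character.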

I suppose and hope Finereader already does all of the above automatically (identifies the font type, font size, etc, compares every character with the character set of the given font type, interpolates if necessary, adjusts contrast, brightness, etc).
But ABBYY's support, though fast, is abysmal and not technical at all, afaik, and Finereader doesn't come with more detailed technical requirements for producing fine images and hitting high OCR rates. So I would really like to know which of the above Finereader does not do, so that we can help it more.

Last edited by user; 10-21-2007 at 07:18 PM.