01-29-2008, 02:05 AM   #30
snookums
Connoisseur
Posts: 81
Karma: 100
Join Date: Jan 2008
Device: Kindle
I hear a lot of people here saying that OCR isn't that good. I've found that OCR can be brilliant if you know what you are doing. I feel that OCR gets a bad rep because people don't realize the real magic is in the scanning.

Tip: Scan in RAW format. In a normal scan, the data from the scanner is processed with your settings and the excess data is discarded. RAW saves all of the data that the scanner gathered, so afterwards you can change settings and see what the result would have been had you scanned with them. This is especially useful for the first few images, when you are trying to find the ideal color balance.
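
To make the RAW idea concrete, here is a rough Python sketch. It is not any scanner's actual pipeline: the sample values, the `apply_levels` helper, and the black/white point settings are all made up for illustration. The point is only that the raw samples stay untouched, so you can try as many settings as you like on the same scan.

```python
# Hypothetical illustration: "raw_row" stands in for untouched 0-255 grayscale
# samples saved by a RAW scan. None of these names come from real scanner software.
def apply_levels(raw, black_point, white_point):
    """Stretch raw samples so black_point maps to 0 and white_point to 255."""
    span = white_point - black_point
    out = []
    for value in raw:
        scaled = (value - black_point) * 255 / span
        out.append(int(max(0, min(255, scaled))))  # clip to the valid range
    return out

raw_row = [30, 60, 200, 215, 230, 240]  # made-up samples: two ink pixels, four paper pixels

# Try two different settings on the same raw data, no re-scan needed.
gentle = apply_levels(raw_row, black_point=0, white_point=255)
harsh = apply_levels(raw_row, black_point=50, white_point=220)
```

With a processed (non-RAW) scan you would only have one of those outputs, and the clipped values in it could not be recovered.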

Tip: Scan in black and white, and find the ideal color balance before starting. The color balance is very important. You don't want too much contrast in your scan, because that will bring out speckles in the paper that throw off the OCR software. This is counter-intuitive, because you probably want to jack up the resolution and contrast to catch all of the detail in the book. Don't. Scan at 300 dpi and set the color or white balance so that you are getting only the text and not the texture of the page.
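
Here is a small Python sketch of why the balance matters. It uses a simple black/white threshold as a stand-in for the scanner's contrast or white balance setting, which is my simplification, and the pixel values and threshold numbers are invented. A setting that is too aggressive turns paper speckles into "ink", while a balanced one keeps only the text stroke.

```python
def binarize(gray, threshold):
    """Return 1 (ink) for pixels darker than threshold, otherwise 0 (paper)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def ink_count(bitmap):
    """Count how many pixels ended up as ink."""
    return sum(sum(row) for row in bitmap)

# Made-up 4x6 grayscale patch: a dark vertical stroke (around 40) on paper
# (around 220), plus a few grayish paper-fiber speckles (around 170).
patch = [
    [220, 40, 221, 170, 219, 222],
    [218, 42, 220, 223, 221, 168],
    [172, 38, 222, 220, 224, 220],
    [221, 41, 219, 221, 169, 223],
]

too_harsh = binarize(patch, threshold=200)  # speckles are picked up as ink
balanced = binarize(patch, threshold=128)   # only the stroke survives

print(ink_count(too_harsh), ink_count(balanced))
```

The extra ink pixels in the harsh version are exactly the kind of noise that OCR software tends to misread as punctuation or stray characters.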

Tip: Make it straight. OCR software is built to handle horizontal lines of text. If there is more than a moderate slant in the way that you were holding the page over the scanner, it will spit out garbled text. Some of the more expensive OCR packages offer the ability to rotate text, but it's best just to hold the paper as straight as possible when you are scanning. That can be harder than you think when you are scanning a bound book.
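
If you want to check how slanted a scan is, one simple approach is to pick a few points along the bottom of a line of text and fit a straight line through them. The Python sketch below does that with a least-squares fit. The sample points and the 1 degree cutoff in `needs_rescan` are arbitrary assumptions of mine, not a limit from any OCR package.

```python
import math

def skew_degrees(baseline_points):
    """Least-squares slope of (x, y) baseline points, as an angle in degrees."""
    n = len(baseline_points)
    mean_x = sum(x for x, _ in baseline_points) / n
    mean_y = sum(y for _, y in baseline_points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in baseline_points)
    den = sum((x - mean_x) ** 2 for x, _ in baseline_points)
    return math.degrees(math.atan(num / den))

def needs_rescan(baseline_points, limit_degrees=1.0):
    """Flag a page whose slant exceeds an (arbitrary, hypothetical) limit."""
    return abs(skew_degrees(baseline_points)) > limit_degrees

# Made-up pixel coordinates along the bottom of one text line.
straight = [(0, 100), (500, 100), (1000, 101)]
slanted = [(0, 100), (500, 126), (1000, 152)]

print(needs_rescan(straight), needs_rescan(slanted))
```

In this made-up example the slanted line drifts 52 pixels over 1000, which works out to roughly 3 degrees.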