10-07-2009, 12:44 PM   #56
Steven Lyle Jordan
Grand Sorcerer
Posts: 8,478
Karma: 5171130
Join Date: Jan 2006
Device: none
I'd like to share a tip that has improved my scan output quality and minimized errors in the final text, and it's not as bad as it sounds: add a copy step to your process before you scan.

Specifically, use a good photocopier to create letter/A4-sized pages of your books. If your book page is smaller than letter/A4, set the copier to enlarge the copy to fit the page. That way, you get larger letters, clearer spaces and punctuation, making the OCR process easier. You can also take advantage of any copier image controls to improve text/background contrast on the pages, further improving character legibility.

The advantage of this is that you can then feed those letter/A4 sheets through a high-quality professional scanner. These scanners are optimized for letter/A4 page processing, and most will give you 300-600 DPI TIF image files.
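To put rough numbers on the enlarge-then-scan idea, here is a small sketch of the arithmetic. The sheet sizes are the standard letter/A4 dimensions; the 5 x 8 inch book page, the quarter-inch margin, and the function names are just assumptions for illustration, so plug in your own measurements. It only does the math and says nothing about how well a particular copier holds detail when enlarging.

```python
# Standard sheet sizes in inches (width, height).
SHEETS = {"letter": (8.5, 11.0), "a4": (8.27, 11.69)}


def enlargement_factor(page_w, page_h, sheet="letter", margin=0.25):
    """Largest scale at which a book page still fits on the sheet.

    page_w, page_h: book page size in inches.
    margin: assumed unprintable border on each edge, in inches.
    A result above 1.0 means the copier can enlarge the page.
    """
    sheet_w, sheet_h = SHEETS[sheet]
    usable_w = sheet_w - 2 * margin
    usable_h = sheet_h - 2 * margin
    # The tighter of the two dimensions limits the enlargement.
    return min(usable_w / page_w, usable_h / page_h)


def effective_dpi(scan_dpi, factor):
    """Scanner pixels per inch of the ORIGINAL book page after enlarging."""
    return scan_dpi * factor


if __name__ == "__main__":
    factor = enlargement_factor(5.0, 8.0)  # hypothetical 5 x 8 inch page
    print(f"Copier setting: about {factor * 100:.0f}%")
    print(f"Effective DPI at a 300 DPI scan: {effective_dpi(300, factor):.0f}")
```

With those example numbers the copier would be set to roughly 131%, so a 300 DPI scan of the copy covers each inch of the original page with about 394 pixels.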

I've done this in the past, typically taking 10-30 minutes to copy the pages of an average book, depending on the copier type. The rest of the process takes about as long, but if your scanner has an automatic feeder, it can scan 50-100 pages a minute and save you even more time in the scan process, not to mention generating fewer errors in OCR.

FYI: Sorry, I no longer have the access to copiers and scanners that I used to, so I can't recommend brands...

Last edited by Steven Lyle Jordan; 10-07-2009 at 01:14 PM. Reason: Said "JPG" when I meant to say "TIF". Sorry!