I want to scan 250-300 old paperbacks from the sixties (all on paper with a partly strong yellowish cast, type mostly 8-9 point, cheap print quality). Most books have 150-400 pages.
The process will be destructive: I will cut off the spine and feed the individual sheets through a Canon P-150 (duplex scanning). The resulting multi-page TIFFs will be processed further with Fineprint 11 OCR; the targets are tagged, searchable PDF files.
I can see three possible pre-processing workflows:
- pre-processing the TIFF files photographically to filter out the yellow cast (a really bad dark yellow) and feeding the results into a black-and-white workflow
- scanning to 600 dpi grayscale TIFF, feeding that directly into Fineprint 11, and letting Fineprint do all the optimization
- scanning to 600 dpi grayscale TIFF, running the files through Scan Tailor first to reduce them to b&w, and then feeding the results into Fineprint 11
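To make the first option concrete, here is a minimal sketch of what filtering out the yellow could mean in practice. It rests on assumptions that are not part of my tested workflow: the page is available as a flat list of RGB pixels, the red channel is used as the grayscale image (yellowed paper stays bright in red, so ink-to-paper contrast is kept), and Otsu's method stands in for whatever thresholding Scan Tailor or the OCR software applies internally. The function names are hypothetical, and real pages would be loaded with an imaging library rather than typed in by hand.

```python
def red_channel(rgb_pixels):
    """Use the red channel as grayscale; yellowed paper is bright in red."""
    return [r for (r, g, b) in rgb_pixels]


def otsu_threshold(gray):
    """Return the 0-255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for v in gray:
        hist[v] += 1

    total = len(gray)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w_dark = 0      # number of pixels at or below the candidate threshold
    sum_dark = 0    # sum of their gray values
    best_t, best_var = 0, -1.0

    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue            # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break               # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, threshold):
    """Map pixels above the threshold to white (255), the rest to black (0)."""
    return [255 if v > threshold else 0 for v in gray]


if __name__ == "__main__":
    # Hypothetical sample: dark ink pixels and dark-yellow paper pixels.
    page = [(40, 35, 30), (200, 170, 90)] * 4
    gray = red_channel(page)
    print(binarize(gray, otsu_threshold(gray)))
```

In this made-up sample the blue channel would only separate ink and paper by 60 levels (30 versus 90), while the red channel separates them by 160, which is why the sketch picks red before thresholding.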
Given the number of books to process, would you recommend one of these workflows over the others (or a completely different one)?
It should be highly automated while still giving a fairly good OCR recognition rate. File space is not really a problem.
At which step should I optimize contrast and brightness, or should I leave that to Fineprint?
I have done some testing already with mixed results. Any recommendation is appreciated.
Klaus