MobileRead Forums - View Single Post - Scanning Paperbacks with Yellow Background

kbaerwald · 01-14-2012, 05:01 AM

I want to scan 250-300 old paperbacks from the sixties (the paper has a yellowish cast throughout, strong in places; type is mostly 8-9 point; cheap print quality). Most books have 150-400 pages.

The process will be destructive: I will cut off the spine and feed the individual sheets through a Canon P-150 (double-sided scanning). The resulting multi-page TIFFs will be processed further with Fineprint 11 OCR; the targets are tagged, searchable PDF files.

I see three possible pre-processing approaches:

1. pre-processing the TIFF files by photographic means to filter out the yellow cast (a really bad dark yellow) and feeding the TIFFs into a black-and-white workflow
2. scanning to grayscale 600 dpi TIFF directly into Fineprint 11 and letting Fineprint do all the optimization
3. scanning to grayscale 600 dpi TIFF, moving the files into Scan Tailor first to reduce them to b&w, and then feeding them into Fineprint 11
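To make the first approach concrete, here is a rough sketch of what I mean by filtering out the yellow and reducing to b&w. It assumes a color scan, and it assumes the pixels are already available as plain (R, G, B) tuples; in practice they would be read from the TIFFs with an imaging library. The function names are made up for illustration. The idea is that yellow paper is bright in the red channel while black ink is dark in every channel, so the red channel alone gives a cleaner gray image, which is then thresholded with Otsu's method.

```python
# Sketch only: pixels are plain (R, G, B) tuples with values 0-255.
# Function names are hypothetical, not part of any scanning tool.

def yellow_to_gray(pixels):
    """Use the red channel as the gray value.

    Yellow paper has high red and green but low blue, so it looks
    nearly white in the red channel, while black ink stays dark.
    """
    return [r for (r, g, b) in pixels]


def otsu_threshold(gray):
    """Return the gray level that maximizes between-class variance."""
    hist = [0] * 256
    for v in gray:
        hist[v] += 1
    total = len(gray)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of their gray values
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels):
    """Map each pixel to 0 (ink) or 255 (paper)."""
    gray = yellow_to_gray(pixels)
    t = otsu_threshold(gray)
    return [255 if v > t else 0 for v in gray]


if __name__ == "__main__":
    paper = (215, 190, 110)   # assumed value for dark yellow paper
    ink = (40, 35, 30)        # assumed value for faded black ink
    print(binarize([paper] * 8 + [ink] * 2))
```

A single global threshold like this is the simplest possible choice; pages with uneven yellowing would probably need a locally adaptive threshold instead.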

Given the number of books to process, would you recommend one of these processes over the others (or a completely different one)?
It should be highly automated while still giving a fairly good OCR recognition rate. File space is not really a problem.

In which step should I optimize contrast and brightness, or should I give Fineprint control over it?
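By optimization of contrast I mean something like the following percentile stretch on the grayscale values, done before OCR. This is only a sketch under my own assumptions: the gray values are a plain list of integers from 0 to 255, the function name is made up, and the 1% / 99% clip points are arbitrary starting values, not settings taken from any of the tools mentioned.

```python
def stretch_contrast(gray, low_pct=1.0, high_pct=99.0):
    """Linearly stretch gray values so the chosen percentiles map to 0 and 255.

    Values below the low percentile are clipped to 0 (ink),
    values above the high percentile are clipped to 255 (paper).
    """
    ordered = sorted(gray)
    n = len(ordered)
    lo = ordered[int((n - 1) * low_pct / 100)]
    hi = ordered[int((n - 1) * high_pct / 100)]
    if hi == lo:
        # Flat image: nothing to stretch.
        return list(gray)
    out = []
    for v in gray:
        scaled = (v - lo) * 255 / (hi - lo)
        out.append(max(0, min(255, round(scaled))))
    return out


if __name__ == "__main__":
    # Assumed example: dull ink at 60, yellowed paper rendering as gray 180.
    page = [60] * 10 + [180] * 90
    stretched = stretch_contrast(page)
    print(min(stretched), max(stretched))
```

If this is applied before the OCR step, the OCR program's own automatic adjustment would presumably have less to do, which is part of what I am unsure about.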

I have done some testing already with mixed results. Any recommendation is appreciated.

Klaus
