01-14-2012, 05:01 AM   #1
kbaerwald
BioReader
Posts: 292
Karma: 42568
Join Date: Apr 2009
Location: Germany
Device: Various
Scanning Paperbacks with Yellow Background

I want to scan 250-300 old paperbacks from the sixties (all on paper with a partly strong yellowish cast, type mostly 8-9 point, cheap print quality). Most books have 150-400 pages.

The process will be destructive: I will cut off the spine and feed the individual sheets through a Canon P-150 (duplex scanning). The resulting multi-page TIFFs will then be processed by Fineprint 11 OCR; the targets are tagged, searchable PDF files.

I see three possible pre-processing routes:
  1. pre-process the TIFF files photographically to filter out the yellow cast (a really bad dark yellow) and feed the cleaned TIFFs into a black-and-white workflow
  2. scan to 600 dpi grayscale TIFF, load it directly into Fineprint 11, and let Fineprint do all the optimization
  3. scan to 600 dpi grayscale TIFF, run the files through Scan Tailor first to reduce them to b&w, then feed them into Fineprint 11
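
To make route 1 more concrete, here is a rough sketch of the kind of filtering I have in mind. It assumes the pages are scanned in color (not grayscale), keeps only the red channel (yellow paper is bright in red, black ink is dark), and then picks a black/white cutoff with Otsu's method. The pixel data is a plain list of rows so the example runs without any imaging library; all function names and the sample values are made up for illustration, and a real run would read the pixels from the TIFFs.

```python
# Sketch only: a page is a list of rows of (R, G, B) tuples.
# Function names and sample values are hypothetical.

def red_channel(rgb_rows):
    """Keep only the red channel of every pixel."""
    return [[r for (r, g, b) in row] for row in rgb_rows]

def otsu_threshold(gray_rows):
    """Return the 0-255 cutoff that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray_rows:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]          # pixels at or below t (ink side)
        if w_bg == 0:
            continue
        w_fg = total - w_bg      # pixels above t (paper side)
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(gray_rows, threshold):
    """Pixels above the threshold become 255 (paper), the rest 0 (ink)."""
    return [[255 if v > threshold else 0 for v in row] for row in gray_rows]

# Tiny synthetic "page": dark-yellow paper with two ink pixels.
PAPER = (200, 170, 60)
INK = (40, 35, 30)
page = [
    [PAPER, PAPER, INK, PAPER],
    [PAPER, INK, PAPER, PAPER],
]

gray = red_channel(page)
bw = binarize(gray, otsu_threshold(gray))
```

On real pages with uneven yellowing a single global cutoff may not be enough, which is part of why I am unsure whether this route beats letting Scan Tailor or Fineprint do the work.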

Given the number of books to process, would you recommend one of these routes (or a completely different one)?
It should be highly automated and still give a fairly good OCR recognition rate. File space is not really a problem.

At which step should I optimize contrast and brightness, or should I leave that to Fineprint?

I have done some testing already with mixed results. Any recommendation is appreciated.

Klaus