MobileRead Forums - View Single Post - ABBYY Finereader & Epson Scanner Problems

DSpider · 01-10-2012, 09:58 AM

FineReader 6 is very old software. FineReader 10 came out in 2009, and version 11 is the latest right now. I just want you to know that "one button" solutions and exporting straight to PDF from the scanner are a bad idea. Instead, here's what I'd suggest, using Scan Tailor, FineReader 11 and Adobe Acrobat X (even though they'll initially take up a lot of space):

1. Scan as TIFF or PNG, 300 dpi, grayscale.
2. Run them through Scan Tailor with the output mode set to Color/Grayscale + White margins + Equalize illumination (or see the note at the bottom if search accuracy isn't terribly important), and sort the pages by width and height to find the odd ones out and make the pages match each other well. Try to get the chapter titles aligned too, so that they don't start at the very top of the page (the default).
3. First, change the settings in FineReader to keep the original images instead of compressing them. We'll be applying compression later, and compressing an ALREADY compressed image makes the artefacts from the first compression stand out even more (not to mention if the original scans were JPGs instead of TIFFs... then FineReader would apply compression, and Acrobat too on top of that... it'd be like a triple kick in the groin).
4. Drag the Scan Tailor-processed images into FineReader 11 and press the Read button (by default, FineReader starts reading automatically).
5. Export as PDF.
6. Change the Adobe Acrobat X settings to compress the images to your liking (here you need some general knowledge of how image compression works) and save as Reduced PDF, then as Optimized PDF.
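Scan Tailor does the sorting for you, but the idea behind sorting by width and height in the second step can be sketched outside it. This is a minimal sketch, assuming you already know each page's pixel dimensions; the function name `odd_pages`, the page names and sizes, and the 5% tolerance are all hypothetical. It flags pages whose width or height strays too far from the median page size.

```python
from statistics import median


def odd_pages(pages, tolerance=0.05):
    """Return the names of pages whose width or height is more than
    `tolerance` (a fraction, 0.05 = 5%) away from the median page size.

    `pages` is a list of (name, width_px, height_px) tuples. The result
    is listed in width-then-height order.
    """
    med_w = median(w for _, w, _ in pages)
    med_h = median(h for _, _, h in pages)
    odd = []
    for name, w, h in sorted(pages, key=lambda p: (p[1], p[2])):
        too_wide_or_narrow = abs(w - med_w) > tolerance * med_w
        too_tall_or_short = abs(h - med_h) > tolerance * med_h
        if too_wide_or_narrow or too_tall_or_short:
            odd.append(name)
    return odd


# Hypothetical page sizes in pixels (300 dpi scans of a 6 x 9 inch book)
pages = [
    ("p001", 1800, 2700),
    ("p002", 1795, 2702),
    ("p003", 1650, 2700),  # cropped too narrow
    ("p004", 1802, 2698),
    ("p005", 1800, 3000),  # e.g. a fold-out or a badly split page
]
print(odd_pages(pages))  # ['p003', 'p005']
```

The median is used rather than the mean so that one badly cropped page doesn't drag the "normal" size toward itself.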
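The reasoning behind keeping FineReader from compressing images that Acrobat will compress again can be illustrated with a toy model. This is only a sketch, not how JPEG, FineReader or Acrobat actually work: each lossy pass is modeled as rounding 8-bit pixel values to a coarser grid (JPEG does quantize, but it quantizes DCT coefficients rather than raw pixels). The function names and the step sizes 10 and 16 are arbitrary, hypothetical choices.

```python
def quantize(values, step):
    """Toy lossy 'compression': round each 0-255 value to the nearest
    multiple of `step` (halves round up), clamped to 255."""
    return [min(255, step * ((2 * v + step) // (2 * step))) for v in values]


def compress_chain(values, steps):
    """Apply one lossy pass per entry in `steps`, each pass working on the
    previous pass's output (e.g. scanner JPG -> FineReader -> Acrobat)."""
    for step in steps:
        values = quantize(values, step)
    return values


def total_error(original, processed):
    """Sum of absolute per-pixel differences from the original scan."""
    return sum(abs(a - b) for a, b in zip(original, processed))


pixels = list(range(256))                   # every 8-bit gray level once
once = compress_chain(pixels, [16])         # compressed only at the end
twice = compress_chain(pixels, [10, 16])    # compressed earlier as well
print(total_error(pixels, once), total_error(pixels, twice))
```

When the two grids don't line up, the second pass rounds values that were already rounded, so the errors from both passes can pile up instead of the final pass starting from clean data. Compressing only once, at the very end, avoids that.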

This is the quick and dirty method. The quality method would be to proofread in FineReader, export as .docx (as "Formatted Text"), track down the fonts, spend time vectorizing the covers and any other graphics (figures, graphs, charts, cartoon-ish drawings, etc.), do the layout in Word 2010, and proofread the final product. This takes significantly more time, but the output is usually of very high quality. It's a pleasure to read such a book.

Note: Using the "Black and White" output mode in Scan Tailor will produce much smaller files, at the possible expense of OCR accuracy (FineReader already has its own filtering, so too much post-processing could interfere with the recognition process). Depending on how much you want or need the document to be searchable, you could very well go with Black and White. I'd suggest using the settings from step #2 and proofreading the document in FineReader to get the best of both worlds (cleaner fonts and accurate searches).
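To get a feel for why Black and White output is so much smaller, here's the raw (uncompressed) arithmetic for one page. This sketch assumes a hypothetical 6 x 9 inch page scanned at 300 dpi, with 8 bits per pixel for grayscale and 1 bit per pixel for black and white; the function name is made up for illustration. Real TIFF, PNG and PDF files are compressed, so actual file sizes and the ratio between them will differ.

```python
import math


def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of one scanned page in bytes, with each pixel
    row padded up to a whole number of bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = math.ceil(width_px * bits_per_pixel / 8)
    return bytes_per_row * height_px


gray = raw_page_bytes(6, 9, 300, 8)  # Color/Grayscale output, 8-bit gray
bw = raw_page_bytes(6, 9, 300, 1)    # Black and White output, 1-bit
print(f"grayscale: {gray:,} bytes, black and white: {bw:,} bytes")
# grayscale: 4,860,000 bytes, black and white: 607,500 bytes
```

At 1800 x 2700 pixels, that's about 4.6 MiB per grayscale page before compression versus roughly 0.6 MB for black and white, an 8:1 difference in raw data alone.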

Last edited by DSpider; 01-10-2012 at 10:06 AM.