Scanning Basics in FineReader
-This is based upon Daddio's excellent FR 5 tutorial-
I am assuming here that you have a computer, a scanner, and FineReader 5, 6 or 7.0 OCR software. Naturally you can also just follow along to see what is happening. There are slight differences between FineReader 5 and 7 (this guide is based on FR 7), but if you use it as a general guide for an earlier version you should still get the idea.
My real purpose here is to let people know how easy this is to do so we can encourage more "new" material.
The whole purpose of this tutorial is to show how to scan in an entire book as quickly as possible. Speed will depend on the particular scanner and hardware you use. For the record, I use an HP 4300c and an HP Scanjet 5400c, and sometimes my cheapie scanner (a Benq S2w 4300u). I'm about to show you how to set up FineReader to do its part. The rest is up to you and your scanner.

1# Turn on your scanner, start up FineReader and follow the simple examples. The above is a screenshot of how FineReader looks when you first start it up. I have the window shrunk down some for space and viewing considerations. In real life, you might want to expand the window to full screen size, especially when viewing the processed text.
2# From the Tools menu, select "Options..." (Keyboard Shortcut: Ctrl+Shift+O) and then select the "Scan/Open Image" tab. For our purposes here, it is absolutely critical that you select "Use FineReader interface." Next, click "Scanner Settings...".
Note: In Win 98 click the "Select Source" tab and choose your default TWAIN driver. In XP (Home and Pro) make sure you choose your scanner's default TWAIN driver and not the Microsoft WIA TWAIN driver. Only use the Microsoft driver if you are having problems with your scanner's own TWAIN driver. If you are having problems, be sure to check your scanner's web page for updated drivers (I had to update one of my older HP scanner drivers when going to XP).

3# After you click the "Scanner Settings..." button, a box like the one above should appear. You can also get to this window by selecting "Scanner Settings..." from the Tools menu (Keyboard Shortcut: Ctrl+Shift+S). The above picture shows my basic set-up in the Scanner Settings tab.
The settings above are mostly the defaults. It is important, however, to remember to scan in Gray Pictures mode with a Resolution of 300 dpi. Naturally, if you are scanning a book with colour pictures, change the Gray Pictures mode to Colour mode. Click OK when done.
Tip: Instead of scanning a whole book in colour because it contains a few colour pictures, you can scan in Gray Pictures mode (which is faster and takes up less space). When you come across a colour picture, go back to "Scanner Settings", change to Colour mode, scan the page, and then change back to Gray Pictures mode. Remember: this is only for pictures. If the text is in colour it does not need to be scanned in colour mode; when you save your scan, FineReader will automatically recognise coloured text (if you have set it up properly).
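To get a feel for why Gray Pictures mode "takes up less space", here is a rough back-of-the-envelope sketch in Python. It is not anything FineReader does internally. It assumes 8 bits per pixel for gray and 24 bits per pixel for colour, an 8.5 x 6.5 inch scan area (the custom size suggested later in this guide), and it ignores any compression, so the saved files will be smaller than these figures.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw (uncompressed) image size in bytes for a scanned area."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

# Assumed bit depths: 8-bit gray, 24-bit colour.
gray = uncompressed_bytes(8.5, 6.5, 300, 8)
colour = uncompressed_bytes(8.5, 6.5, 300, 24)

print(f"gray:   {gray / 1_000_000:.1f} MB per scan")
print(f"colour: {colour / 1_000_000:.1f} MB per scan")
print(f"colour is {colour / gray:.0f}x larger before compression")
```

Under these assumptions a colour scan carries three times the raw data of a gray one, which is why switching to colour only for the pages that need it is worthwhile.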

4# Go back to the Options box (Keyboard Shortcut: Ctrl+Shift+O) and click the "Formatting" tab. Make sure that the "Retain font and font size" and "keep pictures" options are checked. The rest you can keep as the default settings. When you are more used to the program, experiment a bit with the settings. After you have finished, click the "Format Settings" button.
5# The Format Settings dialogue is where you tell FineReader how (and in what formats) you want to save your scanned books. This is where opinions differ widely; each person has a favourite format for their own reasons. I personally use HTML and text, plus PDF (for an exact backup copy of the book). Daddio (and others) prefer RTF. I will try to explain each briefly; in another tutorial I will explain each format in a bit more depth.
Text Settings

Text: One very common complaint about e-book text files (for example files from Project Gutenberg) is that there is a linefeed (an annoying line break) at the end of each line instead of only after paragraphs where they belong. Let's make sure that doesn't happen with our text file. In the "Format Settings" window select the TXT tab.
Tick: "Use blank line as paragraph separator" and "Append to end of file", leave the other options alone. If you tick the "Keep line breaks" option you will have a linefeed (annoying line break) after each line and no indication of where the next paragraph starts. There will be no easy way to correct this later on. With things marked as they are in this window, you will end up with squared off paragraphs with a double linefeed between them. Later, if you wish, you can alter the paragraph style with Textify or with a search and replace in your favorite word processor.
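If you prefer a script to a word processor's search and replace, the paragraph restyling mentioned above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, it assumes the text file uses a blank line between paragraphs (as set up above), and it switches to first-line-indented paragraphs, which is just one possible style.

```python
import re

def unwrap_paragraphs(text):
    """Split on blank lines and join any wrapped lines inside each paragraph."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return [" ".join(p.split()) for p in paragraphs]

def indent_style(text, indent="    "):
    """Re-style blank-line-separated paragraphs as first-line-indented ones."""
    return "\n".join(indent + p for p in unwrap_paragraphs(text))

sample = "First paragraph, line one\nline two.\n\nSecond paragraph."
print(indent_style(sample))
```

Note that this only works because the paragraphs are marked by blank lines. With "Keep line breaks" ticked there would be no marker to split on, which is the problem described above.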
Positive Points: This format is universal; any computer with any Operating System (OS), old or new, can read it.
Negative Points: You cannot include pictures or coloured text, and you lose italics, bold text and so on.
HTML Settings

HTML: In the "Format Settings" window select the HTML tab.
Tick: "Retain text color" and "Simple (compatible with all browsers)". These are your basic settings. It has been my experience that the "Simple" option for HTML is the easiest to work with when it comes to editing or converting to other formats later on.
Notice that the above example has "Reduce picture resolution" set to 300 and "JPEG quality" set to 90. This is not the default setting and is far higher than what is recommended. I do this because I re-edit the pictures later in a graphics program to optimize them and reduce their size (with as little loss of quality as possible). This setting is really up to you.
Positive Points: This format is universal, and it can retain special formatting (bold, italics, coloured text etc.) and pictures.
Negative Points: Depending on your HTML editor, it can be tricky at times to get it the way you want.
PDF Settings

PDF: In the "Format Settings" window select the PDF tab.
Tick: "Text under the page image". This is an important option because it copies the book layout exactly as it is and also makes the file fully searchable: the top layer is the page image and underneath it is any text that FineReader has recognised.
The other options (under "Save Mode") result in smaller files but do not look as good. For "Reduce picture resolution to" and "JPEG quality" the above example shows 150 and 50 respectively. I use 300 and 90 instead, which increases the file size even more but also looks the best. I use the PDF option only as an exact "copy backup" of the book I am doing.
Tip: If you save your PDF file with the settings of 300 / 90, you can use FineReader to open the PDF file and it will re-read it as a normal batch file. This is handy if you want to re-do the book or extract a chapter from it without having to re-scan. More on this handy feature in another tutorial.
Positive Points: An exact copy (layout) of the book is made and it is searchable.
Negative Points: File size is huge, and it is hard to manipulate or change and annoying to convert to other formats.
RTF / DOC / Word XML Settings

RTF / DOC / Word XML: In the "Format Settings" window select the RTF/DOC/Word XML tab.
Copy the basic settings as seen above, except for "Highlight uncertain characters" if you do not wish to use this function. Saving pictures in PNG or JPEG format is really a matter of personal choice, as is the picture resolution. MS Word is quite a powerful editor which can do a whole host of things; however, if you do not like MS Word, the file can be imported into Star Writer and/or Open Office.
Note: RTF seems to make larger files than the MS DOC format. I also noticed that if you save as RTF in FineReader and then open the file in MS Word (Word 2002) and convert it to DOC, a much smaller file results without any loss of quality.
Positive Points: It can retain special formatting (bold, italics, coloured text etc.) and pictures.
Negative Points: If pictures are included the file size can be quite big.

6# From the Tools menu select "Options..." and then the Recognition tab. Copy my settings above. Maybe the most important is the "Autodetect layout" which does a fine job of differentiating between two pages in the same scan. Sometimes it will read across the top of the page and put both page headers next to each other. Besides that minor detail, I have never had any other problems with FineReader differentiating between the facing pages.

7# This step is optional and really depends on how you intend to scan and the type of book you will be scanning. In the Process menu select "Start Background Recognition." You need to select this at the start of every new session, as this setting is not retained like all the others we've discussed. With this selected, the text of previous pages will be interpreted while you continue to scan the next pages. This can save the 20 or more minutes you would otherwise spend waiting while FineReader "reads" the entire book afterwards. However, if you want to select specific zones or read some areas in a certain order, you will not want background recognition.
Time to put a book on the scanner

8# Grab an average-sized paperback book, one that will fit on the scanner open in portrait mode. If you are able to do this (providing the book is small enough) it is supposed to make the OCR a bit easier. If the book is larger then turn it around so that both pages fit onto the scanner.
Note: You can set the height setting for the scanner, which will save some time when scanning smaller books. Otherwise the lamp moves all the way down the bed, past the book, to the bottom. The extra space at the bottom is wasted and naturally takes extra time to scan. If you want to do this (it's up to you), see step 3# (in the options dialogue) and make sure that the Height is at least 6.5 and the Width 8.5 (inches), which is about the size of an average paperback book. Experiment a bit to get the size right for your own book. Also note that to go along with this special height you need to select "Custom" for Paper size.
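As a rough illustration of what the custom size buys you, the sketch below works out the pixel dimensions of an 8.5 x 6.5 inch scan at 300 dpi and the fraction of the bed the lamp has to cover. The 11.7 inch bed length is an assumption (a typical A4-length flatbed), not a figure from my scanners, and real scan time will not scale exactly with distance.

```python
def scan_area_pixels(width_in, height_in, dpi=300):
    """Pixel dimensions (width, height) of a scan area given in inches."""
    return round(width_in * dpi), round(height_in * dpi)

def travel_fraction(height_in, bed_length_in=11.7):
    """Fraction of the bed length the lamp covers (bed length is assumed)."""
    return height_in / bed_length_in

width_px, height_px = scan_area_pixels(8.5, 6.5)
print(f"scan area: {width_px} x {height_px} pixels")
print(f"lamp travel: about {travel_fraction(6.5):.0%} of the full bed")
```

With these assumed numbers the lamp covers a little over half the bed on each pass, which adds up over a few hundred pages.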
9# Click the (1) "Scan" button and now the scanner will start imaging your first page of the book.
10# Click the (2) "Read" button. Now you should be watching the page being read.
When it is done, the text will appear in a column on the right side of the screen. Adjust the center pane until you can see all the text. Ignore any highlighted characters for the moment. Now scan a few more test pages.
Look over some of the pages if you like. Click on the page icons on the left and the main window will update with the picture of the text (at the bottom of the main FineReader window) and the processed text (to the far right of the main FineReader Window).
If you quit FineReader and then start it right back up, everything should still be there. The batches are saved automatically. This shows you how easily you can stop and pick up later where you left off.
After you have experimented a bit with the scanning and reading phases of FineReader it is time to do some spell checking.
11# Click the (3) "Check Spell" button. Spell checking is much the same as in any ordinary word processor. The one advantage of doing a first spell check in FineReader is that when a suspicious word comes up you can check it directly against the image pane at the bottom of the main FR window.
12# Click the (4) "Save" button. A box like the one on the right will appear.

13# Save wherever you want and name the file whatever you desire, but make sure your settings are similar to mine. The above example is a scan being saved as an HTML file. If you have set up your format settings correctly (step 5#), all your settings (for text, RTF, PDF etc.) will be saved.
Note: You can click the "Format Settings" button if you want to make a last-minute change (if you set up step 5# correctly there is no need to do so). The "Save as type" option can also be used to change formats quickly or to save in multiple formats.
And that is all there is. We'll be looking forward to your first "new" book post soon!
© 2003 http://ebook.23ae.com/