Pattern Files
If you have ever scanned a book (usually an older scholastic work) in which foreign characters are used along with double letters (in italics or a slightly different font), you will know how frustrating it is when the OCR simply refuses to recognise the letters correctly no matter what you do. Adding foreign characters to the Language Editor can help immensely, but even that has its limits. For more on how to add foreign characters to the Language Editor, [Click Here]. Pattern files are a great tool when used in conjunction with the Language Editor, or even on their own.
A Pattern file is simply a way to teach your OCR to recognise extremely unusual characters, such as decorative fonts, mathematical symbols and so on. Here are a few examples of characters for which a Pattern file might be needed:
| Single Letter | Combined Letter |
| "d" | "dâ" |
| | "ft" |
| "k" and "e" | "ke" |
| "s" | |
| "t" | |
Without a pattern file, characters like those above make the OCR process difficult.
When to use a Pattern file:
When you have strange characters like those in the example above
When you have consistent errors with the same letters
When special or decorative fonts are used throughout the book and are not being recognised
When special symbols are used throughout the book (and are not being recognised)
When you are scanning 150 pages or more (anything less is not worth the effort)
When you have a series or set of volumes that use the same (or a similar) font
When the book or printout is of extremely poor quality
When some characters are so close together that the OCR is unable to tell them apart. This is especially true when a non-standard font is used in which part of one letter almost touches the next.
When not to use a Pattern file:
When the book is shorter than 150 pages
When it is an average book (with only the common OCR errors)
A Pattern file can be extremely important when scanning a difficult book; on ordinary books and smaller works, however, it is not worth the extra effort, even if they give you some trouble. In general, each pattern file is specific to one book (or one series or set of volumes), so you will have to make a separate pattern file for each book (or set of books). Also remember that using a pattern file slows down recognition.
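The checklist above boils down to two questions: is there a specific recognition problem, and is there enough material (one book, or a whole set of volumes sharing one pattern file) to justify the effort? As a rough sketch only, here is one reading of that rule of thumb in Python. The `ScanJob` fields and the `pattern_file_worthwhile` function are hypothetical names made up for illustration; they are not part of FineReader.

```python
from dataclasses import dataclass

MIN_PAGES = 150  # the tutorial's rule of thumb for one book or one set of volumes


@dataclass
class ScanJob:
    """Hypothetical description of a scanning job; not a FineReader object."""
    total_pages: int                    # one book, or all volumes sharing the pattern
    unusual_characters: bool = False    # foreign letters, ligatures, special symbols
    consistent_errors: bool = False     # the same letters misread again and again
    decorative_fonts: bool = False      # special fonts used throughout the book
    very_poor_quality: bool = False     # extremely poor print or scan quality


def pattern_file_worthwhile(job: ScanJob) -> bool:
    """One reading of the checklist: a real problem AND enough pages to pay off."""
    has_problem = (
        job.unusual_characters
        or job.consistent_errors
        or job.decorative_fonts
        or job.very_poor_quality
    )
    # An average book with only the usual OCR errors has no specific problem,
    # and anything under 150 pages is not worth the extra effort.
    return has_problem and job.total_pages >= MIN_PAGES


print(pattern_file_worthwhile(ScanJob(total_pages=400, unusual_characters=True)))  # True
print(pattern_file_worthwhile(ScanJob(total_pages=80, unusual_characters=True)))   # False
print(pattern_file_worthwhile(ScanJob(total_pages=300)))                           # False
```

Counting the pages of the whole set, rather than of a single volume, reflects the advice that one pattern file can serve a series of books printed in the same font.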

1#: Start FineReader 7 Professional, go to the Tools menu and click Pattern Editor. You can also use the keyboard shortcut Ctrl+Shift+A (while FineReader is open) to open the Pattern Editor dialogue.

2#: In this dialogue, click New. Note: the example above shows two pattern files I am using; your list will naturally not contain them.

3#: In this box, type a name for your new pattern file. Use a name that will remind you which book or set of volumes you are working with.

4#: Click the Set Active button and then Close.
5#: In the main FineReader window, click the Text button shown above.

6#: With a text zone (as above), zone the paragraph, page or pages where you are having OCR difficulties, but not the whole book. Remember that you do not have to zone the entire page; a paragraph may be enough. Just be sure to zone the area containing the troublesome characters.

7#: Go to Tools, scroll down and click Options. You can also use the keyboard shortcut Ctrl+Shift+O to open the Options dialogue while in FineReader.

8#: Click the Recognition tab. Under the Training section, select "Train user pattern", make sure "Use built-in patterns" is also ticked, and then click OK.

9#: Click the Read button. Note: if your button says "Read All", click the little down arrow on the right side of the button and select "Read". This ensures that only the current page or selection (and not the whole book) is read.

10#: After you click the Read button, a dialogue similar to the one above will appear. This is where you teach the OCR how to read and interpret unusual characters. In the example above I have selected "Ri" to be recognised by the OCR as a single unit (called a ligature). In the white box (next to "Train"), enter the letter or letters that the OCR should output whenever this shape comes up.
The left and right buttons decrease or increase the green capture area surrounding the "Ri", so you can train either single letters or double letters (ligatures). Make sure you completely enclose the letter(s) you want recognised within the green box. You can also expand or shrink the green capture area by dragging it with the mouse in the direction you want.
Click "Train" when you have entered the right character(s).
Effects: Under the "Effects" heading you can select bold, italic, etc., though I strongly advise against using this function, as it can cause recognition problems later on.
Back (Undo): Use this to undo a mistake. Warning: this only goes back one step (the last word), so do not rely on it if you have gone too far ahead. Later on (in the Advanced Tutorial) I will show you how to remove mistakes from the entire pattern file, along with some other handy tricks.

If you add double letters, a window like the one above will pop up. Just click Yes and continue as before.
When reading has finished, a window like the one below will pop up. Note: you can click Close at any time; you do not need to wait until the whole selection has been read if you do not want to.

11#: Click Yes to save your pattern file for future use.
Important note: each pattern file cannot contain more than 1000 new characters.
Now that you have trained your own Pattern file to recognise the single and double characters that have been causing you trouble, all that remains is to use it.
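Conceptually, training builds a lookup from the shape of a glyph to the text it should be read as. The toy Python model below is purely an illustration under stated assumptions: glyphs are identified by hypothetical string IDs, a ligature such as "Ri" counts as one entry towards the 1000 limit, and `back()` restores only the last trained item, like the Back (Undo) button. The class and method names are invented for this sketch; it does not describe FineReader's real pattern file format or image matching.

```python
from typing import Dict, List, Optional, Tuple

MAX_NEW_CHARACTERS = 1000  # per-file limit quoted in this tutorial


class ToyUserPattern:
    """Toy model of a user pattern: glyph ID -> text it should be read as.

    Glyph IDs are hypothetical strings standing in for scanned shapes; this
    does not model FineReader's real file format or image matching.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self._entries: Dict[str, str] = {}
        # Only the single most recent change is remembered, like "Back (Undo)".
        self._last: Optional[Tuple[str, Optional[str]]] = None

    def train(self, glyph_id: str, text: str) -> None:
        """Map a glyph to one letter or, for a ligature, several letters."""
        if not text:
            raise ValueError("enter the character(s) before clicking Train")
        is_new = glyph_id not in self._entries
        if is_new and len(self._entries) >= MAX_NEW_CHARACTERS:
            raise OverflowError(f"pattern {self.name!r} is full")
        # Save the previous state of this glyph so one step can be undone.
        self._last = (glyph_id, self._entries.get(glyph_id))
        self._entries[glyph_id] = text

    def back(self) -> None:
        """Undo the last train() call only; a second call does nothing."""
        if self._last is None:
            return
        glyph_id, previous = self._last
        if previous is None:
            del self._entries[glyph_id]
        else:
            self._entries[glyph_id] = previous
        self._last = None

    def read(self, glyph_ids: List[str]) -> str:
        """Join the trained text for each glyph; untrained glyphs become '?'."""
        return "".join(self._entries.get(g, "?") for g in glyph_ids)


pattern = ToyUserPattern("scholastic-set")
pattern.train("glyph-Ri", "Ri")   # ligature trained as one unit
pattern.train("glyph-ft", "ft")
pattern.train("glyph-ft", "st")   # a mistake...
pattern.back()                    # ...undone; only this one step can be undone
print(pattern.read(["glyph-Ri", "glyph-ft", "glyph-x"]))  # Rift?
```

Keeping one such object per book (or set of volumes) mirrors the advice to make a separate pattern file for each book or set of books.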

12#: Go to Tools, scroll down and click Options. You can also use the keyboard shortcut Ctrl+Shift+O to open the Options dialogue while in FineReader.

13#: In Options -> Recognition, under the "Training" heading, select "Use user pattern" and then click OK. Note: make sure that your user pattern is set to active (see step 4# again if you have forgotten how to do this).
And that is all there is to it. Remember to select "Do not use user patterns" when you are scanning a regular book that does not need your new pattern file.
© 2003 http://ebook.23ae.com/