Pattern Files

If you have ever scanned a book (usually an older scholastic work) in which foreign characters are used along with double letters (in italics or a slightly different font), you will know how frustrating it is when the OCR simply refuses to recognise the letters correctly no matter what you do. Adding foreign characters to the Language Editor can help immensely, but even that has its limits. For more on how to add foreign characters to the Language Editor, [Click Here]. Pattern files are a great tool when used in conjunction with the Language Editor, or even on their own.


What is a Pattern File?

A Pattern file is simply a way to teach your OCR to recognise letters that are extremely unusual, such as decorative fonts, mathematical symbols, and so on. Here are a few examples of when a Pattern file might be needed:

Book       Single Letter    Combined Letter

[image]    "d"              "dâ"
[image]                     "ft"
[image]    "k" and "e"      "ke"
[image]    "s"
[image]    "t"

Without a pattern file, examples like those above make the OCR process difficult.


When to use Pattern Files

Some characters may be so close together that the OCR is unable to tell them apart. This happens most often with a non-standard font, where part of one letter almost touches the next letter.
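To see why touching letters cause trouble, here is a small toy sketch. It assumes a segmenter that splits a line into characters by looking for connected blobs of ink. That is only an assumption for illustration; I do not know how FineReader segments text internally. The bitmaps and the function name are made up for the example.

```python
from collections import deque


def count_components(bitmap):
    """Count 4-connected groups of ink pixels ('#') in a list of equal-length strings."""
    rows, cols = len(bitmap), len(bitmap[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] != "#" or (r, c) in seen:
                continue
            # Found ink that belongs to no known blob: flood-fill the whole blob.
            count += 1
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and bitmap[ny][nx] == "#" and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
    return count


# Two vertical strokes with a clear gap between them (two "letters").
SEPARATE = [
    "#..#",
    "#..#",
    "#..#",
]

# The same strokes joined by a thin bridge of ink (letters that touch).
TOUCHING = [
    "#..#",
    "####",
    "#..#",
]

print(count_components(SEPARATE))  # two blobs
print(count_components(TOUCHING))  # one blob
```

Once the two strokes touch, a blob-based segmenter sees a single shape, so it has to be taught that shape as a unit.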


When not to use Pattern Files

A Pattern file is an extremely important tool when scanning a difficult book; however, on ordinary books and smaller works (even difficult ones) it is not worth the extra effort. In general, each pattern file is book specific (or series / volume set specific), so you would have to make a separate pattern file per book (or set of books). Another thing to remember is that using a pattern file slows down the recognition.


Setting up a pattern file

1#: Start FineReader 7 Professional, go to the Tools menu and click Pattern Editor. You can also use the keyboard shortcut Ctrl+Shift+A (when FineReader is open) to open the Pattern Editor dialogue.

2#: In this box, click New. Note: the example above shows two pattern files I am using; your box will naturally not have these files.

3#: In this box, type in a name for your new pattern file. Use a name that will remind you which book or set of volumes you are working with.

4#: Click the Set Active button and then Close.


Training the Pattern File

5#: In the main FineReader window click the text button shown above.

6#: Zone the paragraph, page or pages (with a text zone like above), but not the whole book, where you are having the OCR difficulties. Remember, you do not have to zone the entire page; a paragraph may be enough. Just be sure to zone the area where the trouble characters are.

7#: Go to the Tools menu and click Options. You can also use the keyboard shortcut Ctrl+Shift+O to open the Options dialogue while in FineReader.

8#: Click the Recognition tab. Under the Training section, click "Train user pattern", make sure "Use built-in patterns" is also ticked, and then click OK.

9#: Click the Read button. Note: if your button says "Read All", click the little down arrow on the right side of the button and select "Read". This ensures that only the current page or selection (and not the whole book) is read.

10#: After you click the Read button, a dialogue similar to the one above will appear. This is where you teach the OCR how to read and interpret unusual characters. In the example above I have selected the "Ri" to be recognised by the OCR as one set (called a ligature). In the white box (next to "Train"), enter the letter or letters that the OCR is supposed to see when this shape comes up.

The left and right buttons decrease / increase the green capture area surrounding the "Ri". With them you can capture single letters or double letters (ligatures). Make sure you completely enclose the letter(s) you want recognised within the green box. You can also expand or shrink the green capture area by dragging it with the mouse in the direction you want.

Click "Train" when you have entered the right character(s).

Effects: Under the "Effects" heading you can select bold, italic, etc., though I seriously do not recommend using this function, as it can cause future recognition problems.

Back (Undo): Use this to undo a mistake. Warning: this only goes back one step (the last word), so do not rely on it if you have gone too far ahead. Later on (in the Advanced Tutorial) I will show you how to edit out mistakes from the entire pattern file, and also add other handy things.

If you add double letters, a window like the one above will pop up. Just click Yes and continue on as before.

When you have finished your reading, a window like the one below will pop up. Note: you can click Close at any time; you do not need to wait until the end of reading your selection if you do not want to.

11#: Click Yes, which saves your pattern file for future use.

Important note: a pattern file cannot contain more than 1000 new characters.
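If it helps to picture what the training steps above build up, you can think of a pattern file as a lookup table from a captured shape to the letter(s) you typed in the white box. The sketch below is only a mental model, not FineReader's actual file format: the class name, method names and the way a shape is stored (a few rows of text) are all my own assumptions. The only detail taken from this tutorial is the limit of 1000 entries.

```python
MAX_PATTERNS = 1000  # limit stated in the tutorial


class PatternFile:
    """Toy model of a pattern file: captured glyph shape -> text to output."""

    def __init__(self, name):
        self.name = name      # e.g. the book or volume set it belongs to
        self.patterns = {}    # tuple of bitmap rows -> recognised text

    def train(self, glyph_rows, text):
        """Store one trained shape; text may be a single letter or a ligature."""
        key = tuple(glyph_rows)
        # Retraining an existing shape is allowed; only new shapes count
        # towards the limit.
        if key not in self.patterns and len(self.patterns) >= MAX_PATTERNS:
            raise ValueError("pattern file is full")
        self.patterns[key] = text

    def lookup(self, glyph_rows):
        """Return the trained text for a shape, or None if it was never trained."""
        return self.patterns.get(tuple(glyph_rows))


# Hypothetical usage: train one ligature shape and read it back.
book = PatternFile("old-grammar-vol-1")
book.train(["##.#", "#..#", "#..#"], "Ri")
print(book.lookup(["##.#", "#..#", "#..#"]))
```

A real OCR engine matches shapes approximately rather than exactly, so treat the exact-match dictionary here as a simplification.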


Using the Pattern File

Now that you have trained your own personal Pattern File to recognise the single and double characters that have been causing you trouble, all that remains is for you to use it.

12#: Go to the Tools menu and click Options. You can also use the keyboard shortcut Ctrl+Shift+O to open the Options dialogue while in FineReader.

13#: In Options -> Recognition, under the "Training" heading, select "Use user pattern" and then click OK. Note: make sure that your user pattern is set to active (see step #4 again if you forgot how to do this).

And that is all there is to it. Remember to select "Do not use user patterns" when you are scanning a regular book that does not require your new pattern file.
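As a rough picture of what switching the user pattern on and off does, here is a minimal sketch. It assumes the recogniser checks your user pattern first and falls back to the built-in patterns; that order is my assumption, not something documented here, and the glyph labels and function name are invented for the example.

```python
# Hypothetical glyph labels standing in for shapes found on the page.
BUILTIN = {"glyph-k": "k", "glyph-e": "e"}
USER = {"glyph-ke": "ke"}  # a ligature trained for one particular book


def recognise(glyph, builtin_patterns, user_patterns=None):
    """Return the text for a glyph, or '?' if no pattern knows it.

    Passing user_patterns=None models "Do not use user patterns".
    """
    if user_patterns is not None and glyph in user_patterns:
        return user_patterns[glyph]
    return builtin_patterns.get(glyph, "?")


print(recognise("glyph-ke", BUILTIN))        # user pattern off
print(recognise("glyph-ke", BUILTIN, USER))  # user pattern on
```

With the user pattern off, the joined shape is unknown; with it on, the trained ligature is returned while ordinary letters still come from the built-in set.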


© 2003 http://ebook.23ae.com/