06-20-2014, 04:20 PM   #4
Tex2002ans
In FineReader, go into "Tools - Language Editor":

[Screenshot: Step1ToolsLanguageEditor.png]

Select "New...":

[Screenshot: Step2LanguageEditor.png]

Create a new language based on an existing language:

[Screenshot: Step3NewLanguage.png]

Under Alphabet, add the accented characters you see throughout your book to the very end of the list:

Śśāīṛṣ

[Screenshot: Step4LanguageProperties.png]

I typically just copy/paste the characters from these Wikipedia pages (they are well organized there and easy to pick out):

https://en.wikipedia.org/wiki/Macron
https://en.wikipedia.org/wiki/Grave_accent
https://en.wikipedia.org/wiki/Acute_accent
https://en.wikipedia.org/wiki/Diaeresis_%28diacritic%29
https://en.wikipedia.org/wiki/Circumflex
https://en.wikipedia.org/wiki/Caron
https://en.wikipedia.org/wiki/Dot_%28diacritic%29
https://en.wikipedia.org/wiki/Tilde
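
If you happen to have some of the book's text in digital form already (a sample chapter, a glossary, anything typed by hand), a small script can list which accented letters actually occur, so you are not guessing. This is only a rough sketch: the function names and the sample string are made up for illustration, and nothing here talks to FineReader. You still paste the result into the Alphabet box yourself.

```python
import unicodedata
from collections import Counter


def accented_letters(text):
    """Count every non-ASCII letter in text, after NFC normalization."""
    # NFC folds "a" + combining macron into the single character "ā",
    # so both forms are counted as the same letter.
    text = unicodedata.normalize("NFC", text)
    return Counter(ch for ch in text if ch.isalpha() and ord(ch) > 127)


def alphabet_string(text):
    """Return the distinct non-ASCII letters, most frequent first."""
    return "".join(ch for ch, _ in accented_letters(text).most_common())


if __name__ == "__main__":
    # Hypothetical sample text, not taken from any particular book.
    sample = "\u015aiva, \u015b\u0101stra, \u1e5b\u1e63i, g\u012bt\u0101"
    for ch, count in accented_letters(sample).most_common():
        print(ch, count, unicodedata.name(ch, "UNKNOWN"))
    print(alphabet_string(sample))
```

The counts also give a rough idea of which characters are common enough to be worth adding and which are rare enough to fix by hand.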

You definitely want to iron out any language/alphabet choices BEFORE you start mass OCRing the book. If you get halfway through and only then notice that FineReader is missing every single ā, it might be extremely painful to go back and fix all of those manually, depending on how many times that character occurs in the book.

If you swap languages halfway through, FineReader will complain and want to re-OCR the entire book under the new settings.

Side Note: I have never actually run across a book with so many (odd) accents, so I have never tackled an OCR using this method. The books I convert just have the usual English, German, French, and Spanish accents.

I would probably err on the side of caution and insert AS FEW of these odd characters as possible. The OCR might become highly inaccurate if you add too many. (For example, the bottom of the letter 'g' quite often swings close to the letter on the line below. The OCR MAY mistake that for a different character with a caron/macron above it, etc.)

As to the accuracy of characters with dots above/below, I don't know; I have never run across them in a book I had to OCR. The only one I can recall is one person's name with a capital I with a dot above it, 'İ' (I believe it is used in Turkish?). I just manually inserted those whenever his name was mentioned.
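
For a one-off case like that name, fixing the exported text afterwards can be scripted instead of done by hand. A minimal sketch, assuming the OCR consistently reads the dotted 'İ' as a plain 'I'; the name "Ismail" below is a made-up placeholder, not the name from the book:

```python
import re


def fix_name(text, ocr_spelling, correct_spelling):
    """Replace whole-word occurrences of ocr_spelling.

    Returns the corrected text and the number of replacements made.
    """
    pattern = re.compile(r"\b" + re.escape(ocr_spelling) + r"\b")
    # A function is used as the replacement so that correct_spelling is
    # inserted literally, with no backslash or group-reference handling.
    return pattern.subn(lambda match: correct_spelling, text)


if __name__ == "__main__":
    # Hypothetical OCR output; "Ismailov" must be left alone.
    ocr_text = "Ismail met Ismailov."
    fixed, count = fix_name(ocr_text, "Ismail", "\u0130smail")
    print(fixed, count)
```

The word-boundary match keeps longer words that merely start with the same letters untouched, and the returned count is a quick sanity check against how often you expect the name to appear.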