MobileRead Forums - View Single Post - Abbyy Finereader 15 gothic/Fraktur Altdeutsch/Oldgerman

famfam · 12-29-2020, 04:59 PM

Quote:

Originally Posted by Tex2002ans

1. Under Document Language, you want to select the dropdown, then "More Languages...".

2. Choose "Specify Languages Manually", then check the checkboxes for which languages you want to detect:

For example, I use this:

Code:

English; German; French;

This allows FineReader to detect ç and other accented characters.

Note: Don't go overboard with languages, though. FineReader uses this list to look up dictionary words and to add certain letters to the alphabet it recognizes. The more languages you add, the more likely false positives become.

For example, "der" is a German word but not an English one, so an OCR error in English text such as "un der" (for "under") will be accepted, because FineReader will treat "der" as German.
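The false-positive effect described in the quote can be illustrated with a toy sketch. This is not how FineReader works internally. The word lists and the `suspicious_tokens` helper are made up for illustration, and the only assumption is that a token counts as "okay" if it appears in any selected language's word list.

```python
# Tiny, hypothetical word lists; real OCR dictionaries are far larger.
WORDLISTS = {
    "English": {"under", "the", "table"},
    "German": {"der", "unter", "tisch"},
    "French": {"un", "sous", "la", "table"},
}

def suspicious_tokens(text, languages):
    """Return the tokens that are in none of the selected word lists."""
    # Merge the word lists of all selected languages into one lookup set.
    known = set().union(*(WORDLISTS[lang] for lang in languages))
    return [tok for tok in text.lower().split() if tok not in known]

# "un der" is a broken "under" in an English sentence.
print(suspicious_tokens("un der the table", ["English"]))
print(suspicious_tokens("un der the table", ["English", "German", "French"]))
```

With only English selected, both "un" and "der" are flagged for review. With German and French added, "der" matches the German list and "un" matches the French list, so the error passes unnoticed.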

My thinking was this: if the main text of the book is in Old German, I select Old German as the OCR language. If other languages and scripts also appear in the text, I simply add those languages to the OCR language list as well. Then I start the recognition process for the entire book, volume by volume. With 4 volumes you can easily reach 2000 pages. The fact that this doesn't work is surely a weakness of FR 15, isn't it? I don't understand what the obstacle is to getting FR 15 to perform at this level.

It really ought to be possible to write a program that reads the text word by word, automatically recognizes the language and script of each word, and assigns the right dictionary to it. The program would then write its detection results to a list, either one for the whole book or one for each page. In the final OCR run, the program would only need to convert the text according to the list or lists written at the start, since they record which dictionary is responsible for which word. Is all of this really so much more complicated than I imagine?
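As a rough sketch only, the two-pass idea described above might look like this in Python. It is not how FR 15 works. It assumes the words are already available as text, which a real OCR engine does not have before recognition. The word lists, the language names, and the functions `detect_language`, `first_pass` and `final_pass` are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Tiny, hypothetical word lists standing in for real dictionaries.
WORDLISTS: Dict[str, Set[str]] = {
    "OldGerman": {"der", "und", "das", "buch"},
    "Latin": {"et", "in", "anno", "domini"},
    "French": {"le", "la", "et", "un"},
}

Tagged = Tuple[str, Optional[str]]

def detect_language(word: str, priority: List[str]) -> Optional[str]:
    """Return the first language in `priority` whose list contains the word."""
    for lang in priority:
        if word.lower() in WORDLISTS[lang]:
            return lang
    return None  # word is in no dictionary

def first_pass(pages: List[List[str]], priority: List[str]) -> List[List[Tagged]]:
    """Pass 1: build one (word, language) list per page."""
    return [[(w, detect_language(w, priority)) for w in page] for page in pages]

def final_pass(tagged: List[List[Tagged]], fallback: str) -> List[List[Tuple[str, str]]]:
    """Pass 2: read the lists back; untagged words get the main language."""
    return [[(w, lang or fallback) for w, lang in page] for page in tagged]

pages = [["Das", "Buch", "anno", "domini"], ["et", "xyz"]]
lists = first_pass(pages, ["OldGerman", "Latin", "French"])
print(final_pass(lists, fallback="OldGerman"))
```

Even in this toy version, "et" is in both the Latin and the French list, so the result depends entirely on the priority order. A word that is in no list at all ("xyz") can only fall back to the main language.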