Quote:
Originally Posted by ichnilatis
So, do I have to make this correction?
-- document languages for OCR
DKOPTREADER_CONFIG_DOC_LANGS_TEXT = {"English", "Ancient Greek"}
DKOPTREADER_CONFIG_DOC_LANGS_CODE = {"eng", "grc"} -- language code, make sure you have corresponding training data
DKOPTREADER_CONFIG_DOC_DEFAULT_LANG_CODE = "eng" -- that have filenames starting with the language codes
|
Something like that, yes. If you want to keep the change across updates, make sure to put it in defaults.persistent.lua rather than editing defaults.lua directly.
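For illustration, a minimal sketch of what that override file could contain. It assumes the file accepts the same plain assignments as the snippet quoted above, that the two lists are matched up by position, and that training data whose filenames start with grc is already installed next to the eng data; none of that is checked here.

```lua
-- Sketch of OCR language overrides, using the same names as the quoted snippet.
-- Assumption: display names and language codes are listed in the same order.
DKOPTREADER_CONFIG_DOC_LANGS_TEXT = {"English", "Ancient Greek"}
-- Assumption: training data files starting with "eng" and "grc" are installed.
DKOPTREADER_CONFIG_DOC_LANGS_CODE = {"eng", "grc"}
-- Language used when a document has no OCR language set of its own.
DKOPTREADER_CONFIG_DOC_DEFAULT_LANG_CODE = "eng"
```

If most of your documents are in Ancient Greek, changing the last line to "grc" would presumably save you from switching the language per document.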
Quote:
From the screenshot you sent I conclude that the breathings (᾿ ῾), the circumflex (῀) and the grave accent (`) are not recognized... and some letters 
Can this problem be solved?
|
Not really, unless you have a slightly higher-DPI original document. It's probably much less of a problem in non-italic text, though, and a newer version of Tesseract might also do slightly better.