MobileRead Forums - View Single Post

Frenzie · 12-30-2020, 06:02 AM

Quote:

Originally Posted by ichnilatis

So, do I have to make this correction?

-- document languages for OCR
DKOPTREADER_CONFIG_DOC_LANGS_TEXT = {"English", "Ancient Greek"}
DKOPTREADER_CONFIG_DOC_LANGS_CODE = {"eng", "grc"} -- language code, make sure you have corresponding training data
DKOPTREADER_CONFIG_DOC_DEFAULT_LANG_CODE = "eng" -- that have filenames starting with the language codes

Something like that, yes. If you want the change to survive updates, make sure to put it in defaults.persistent.lua rather than editing defaults.lua directly.
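As the comment in the quoted config says, each language code needs matching training data whose filename starts with that code. A quick way to check is sketched below in plain Lua. This is only an illustration: the `tessdata_dir` path and the `has_traineddata` helper are assumptions, not part of KOReader, and the `<code>.traineddata` filename follows Tesseract's usual naming. Adjust the path to wherever your install keeps its training data.

```lua
-- Sketch: report whether training data exists for each configured OCR language.
-- ASSUMPTION: tessdata_dir is a guess; point it at your actual tessdata folder.
local tessdata_dir = "data/tessdata"
local codes = {"eng", "grc"}

-- Hypothetical helper: true if <dir>/<code>.traineddata can be opened for reading.
local function has_traineddata(dir, code)
    local f = io.open(dir .. "/" .. code .. ".traineddata", "rb")
    if f then
        f:close()
        return true
    end
    return false
end

for _, code in ipairs(codes) do
    local status = has_traineddata(tessdata_dir, code) and "found" or "MISSING"
    print(code, status)
end
```

If "grc" shows up as MISSING, OCR for Ancient Greek won't work until the corresponding training data file is in place.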

Quote:

From the screenshot you sent I conclude that the breathings (᾿ ῾), the circumflex (῀) and the grave accent (`) are not recognized... and some letters

Can this problem be solved?

Not really, unless you have an original document with a somewhat higher DPI. It's probably much less of a problem in non-italic text, and a newer version of Tesseract might also do slightly better.