MobileRead Forums - View Single Post

Markismus · 08-21-2025, 07:53 AM

@cryperonia There is a control variable, $isConvertImagesUsingOCR, which can be set to 0 to disable OCR and Tesseract. I've added it to the module DicControls.pm to make it more accessible. You can get the changed script on GitHub.
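Disabling OCR then comes down to a one-line change in DicControls.pm. The snippet below is only a sketch. It assumes the variable is declared as a package variable with `our`, so check the module for its actual declaration.

```perl
# DicControls.pm (sketch): the exact declaration in the real module may differ.
# 0 = never run OCR/Tesseract on embedded images; 1 = try OCR on them.
our $isConvertImagesUsingOCR = 0;
```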

For those who actually want to make OCR work: you need to install both the Perl module Image::OCR::Tesseract and Tesseract itself on your system, and configure them if they don't work out of the box. Many dictionaries converted in the past contain embedded images that are nothing more than unrecognized symbols. The subroutine convertIMG2Text does what it says on the box.
Another control variable for this function, $isManualValidation, lets you toggle between manually checking whether Tesseract got it right (and correcting it if needed) and simply accepting whatever Tesseract generates.
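To make the two switches concrete, here is a minimal Perl sketch of how such an OCR step could be wired up. It is not the actual convertIMG2Text from the script. The subroutine names img2text_sketch, tesseract_ocr and ask_on_terminal are hypothetical, as are the default values of the two variables. The only call into Image::OCR::Tesseract is the get_ocr function from that module's documentation. The OCR backend and the validation prompt are passed in as code references, so the module is only loaded once OCR is switched on.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the control variables in DicControls.pm;
# the real module may declare them and set their defaults differently.
our $isConvertImagesUsingOCR = 1;
our $isManualValidation      = 0;

# Hypothetical image-to-text step (NOT the script's convertIMG2Text).
# $ocr: code ref taking an image path and returning the recognized text.
# $ask: code ref taking (path, guess) and returning the accepted/corrected text.
sub img2text_sketch {
    my ( $image_path, $ocr, $ask ) = @_;

    # OCR disabled: return undef so the caller can keep the image as-is.
    return unless $isConvertImagesUsingOCR;

    my $text = $ocr->($image_path);
    $text = '' unless defined $text;
    $text =~ s/^\s+|\s+$//g;    # strip surrounding whitespace/newlines from OCR output

    # Manual validation: let the user confirm or correct Tesseract's guess.
    $text = $ask->( $image_path, $text ) if $isManualValidation;
    return $text;
}

# Default OCR backend: loads Image::OCR::Tesseract only when actually called,
# so the module need not be installed while OCR is switched off.
sub tesseract_ocr {
    my ($image_path) = @_;
    require Image::OCR::Tesseract;
    return Image::OCR::Tesseract::get_ocr($image_path);
}

# Default validator: show the guess, read a correction from STDIN.
# An empty answer (or end of input) keeps the guess.
sub ask_on_terminal {
    my ( $image_path, $guess ) = @_;
    local $| = 1;    # flush the prompt before waiting for input
    print "OCR for $image_path: '$guess'\nPress Enter to accept or type a correction: ";
    my $answer = <STDIN>;
    return $guess unless defined $answer;
    chomp $answer;
    return length $answer ? $answer : $guess;
}
```

With both switches on, a call such as `img2text_sketch('symbol.png', \&tesseract_ocr, \&ask_on_terminal)` would run Tesseract on a (hypothetical) extracted image and then prompt for a correction. With $isManualValidation at 0 it returns Tesseract's output unchecked.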

Post #363