Old 08-21-2025, 06:53 AM   #363
Markismus
@cryperonia There is a control variable, $isConvertImagesUsingOCR, which can be set to 0 to disable OCR and Tesseract. I've added it to the module DicControls.pm to make it more accessible. You can get the changed script on GitHub.

For those who actually want to make OCR work: install both the Perl library Image::OCR::Tesseract and Tesseract itself on your system, and configure them if they don't work out of the box. A lot of dictionary conversions in the past contained embedded images that are nothing more than unrecognized symbols. The subroutine convertIMG2Text does what it says on the box.
Another control variable for this function, $isManualValidation, lets you toggle between manually checking (and correcting) what Tesseract recognized and simply going along with whatever Tesseract generates.
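
To make the interplay of the two control variables concrete, here is a rough sketch of how such a switch could look. It is not the actual code of convertIMG2Text: the helper name image_to_text_sketch, the prompt text and the example values are made up for illustration. The only library call used is get_ocr from Image::OCR::Tesseract, and the sketch assumes that both the module and Tesseract are installed whenever OCR is switched on.

```perl
use strict;
use warnings;

# Control variables named in the post; the values here are only examples.
my $isConvertImagesUsingOCR = 1;   # 0 disables OCR and Tesseract entirely
my $isManualValidation      = 0;   # 1 asks you to confirm or correct each result

# Hypothetical helper for illustration, NOT the script's convertIMG2Text.
sub image_to_text_sketch {
    my ($image_path) = @_;

    # With OCR disabled, nothing is loaded and no text is produced.
    return unless $isConvertImagesUsingOCR;

    # Load the module only when it is needed, so the script still
    # runs on systems without Image::OCR::Tesseract.
    require Image::OCR::Tesseract;
    my $text = Image::OCR::Tesseract::get_ocr($image_path);
    $text = '' unless defined $text;
    $text =~ s/^\s+|\s+$//g;       # trim surrounding whitespace

    if ($isManualValidation) {
        # Show what Tesseract read and allow a manual correction.
        print "Tesseract read '$text' from $image_path.\n";
        print "Press Enter to accept, or type a correction: ";
        chomp( my $answer = <STDIN> // '' );
        $text = $answer if length $answer;
    }

    return $text;
}
```

With $isConvertImagesUsingOCR set to 0 the helper returns nothing and never touches Tesseract, which is the behaviour you want if OCR is giving you trouble. With $isManualValidation set to 0 the output of Tesseract is used as-is.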