MobileRead Forums - Text selection in PDF is not working for Sanskrit

jonnyl · 02-04-2025, 02:34 PM

This is a known and open bug, as reported here:

https://github.com/koreader/koreader/issues/12738

It occurs when the PDF has no text layer and KOReader attempts to do OCR but can't find the tessdata files. You can prevent the crashes by copying the 'eng.traineddata' file to your koreader/data/tessdata folder. It is available, for example, here: https://github.com/tesseract-ocr/tessdata

If you actually want to use OCR on your Sanskrit book, you will also need to get the 'san.traineddata' file and copy it to the same folder. In addition, you will need to add the "san" language code to the DKOPTREADER_CONFIG_DOC_LANGS_CODE setting in your koreader/defaults.custom.lua configuration file. (You will of course also need to have a Sanskrit dictionary installed, and I'm not sure how accurate or useful the OCR will be.)
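If you prefer to script the file copying, here is a minimal Python sketch. It assumes you have already downloaded 'eng.traineddata' and 'san.traineddata' from the tessdata repository into a local folder, and that you know where the koreader folder is on your device (this varies by device). The function names install_traineddata and missing_traineddata are hypothetical helpers, not part of KOReader. The script only copies files; adding "san" to DKOPTREADER_CONFIG_DOC_LANGS_CODE still has to be done by hand.

```python
import shutil
from pathlib import Path


def install_traineddata(downloads: Path, koreader_root: Path,
                        langs=("eng", "san")) -> list:
    """Copy <lang>.traineddata files into <koreader_root>/data/tessdata.

    Hypothetical helper. Assumes the files were already downloaded
    into `downloads` (e.g. from the tesseract-ocr/tessdata repository).
    """
    tessdata = koreader_root / "data" / "tessdata"
    # Create koreader/data/tessdata if it does not exist yet.
    tessdata.mkdir(parents=True, exist_ok=True)
    installed = []
    for lang in langs:
        src = downloads / f"{lang}.traineddata"
        if not src.is_file():
            raise FileNotFoundError(f"missing {src}; download it first")
        dest = tessdata / src.name
        shutil.copy2(src, dest)  # copy the file, keeping its timestamps
        installed.append(dest)
    return installed


def missing_traineddata(koreader_root: Path, langs=("eng", "san")) -> list:
    """Return the language codes whose .traineddata file is not in tessdata."""
    tessdata = koreader_root / "data" / "tessdata"
    return [lang for lang in langs
            if not (tessdata / f"{lang}.traineddata").is_file()]
```

For example, install_traineddata(Path("Downloads"), Path("koreader"), langs=("eng",)) copies only the English file, which is enough to stop the crashes; missing_traineddata(Path("koreader")) then reports whether 'san' is still needed for Sanskrit OCR.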

You can find more information here: https://github.com/koreader/koreader...ionary-support
Section: "Dictionary lookups in scanned pages"
