Pocketbook dictionary format revisited - Page 25

Moonbase59 · 07-14-2025, 01:07 PM

Pity it’s not open source, or at least supported by Pyglossary…

crypteronia · 08-19-2025, 05:41 PM

Hi Markismus!

I tried my best to convert a StarDict to .dic using your script, though i'm very ignorant of coding and such!

For some reason (though i did turn off the variable in the script) it won't launch because i don't have Tesseract... then i try to install tesseract, why not, but then because i'm on mac i have to go through HomeBrew, and that crashes for some different reason, and i don't need to do OCR in the first place, so the whole thing just feels extra silly =P

So, if you have the time, i would love a .dic version of a big, modern Spanish-> English dictionary, such as a wiktionary-based StarDict, like this one:

https://github.com/doozan/spanish_data/releases

or this one ("es-en" in the list):

https://download.wikdict.com/dictionaries/stardict/

(I'm not sure if there are any significant differences, they're both based on wiktionary)

thank you so much for your amazing work <3

Markismus · 08-21-2025, 07:53 AM

@crypteronia There is a control variable $isConvertImagesUsingOCR which can be set to 0 to disable OCR and tesseract. I've added it to the module DicControls.pm to make it more accessible. You can get the changed script on GitHub.

For those who actually want to make OCR work: install both the Perl library Image::OCR::Tesseract and tesseract itself on your system, and configure them if they don't work out of the box. In a lot of past dictionary conversions there were embedded images that were nothing more than unrecognized symbols. The subroutine convertIMG2Text does what it says on the box.
Another control variable for this function, $isManualValidation, lets you toggle between manually checking (and correcting) what Tesseract recognized and simply going along with whatever Tesseract generates.
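
A minimal sketch of the toggle idea, written in Python rather than the script's Perl and not taken from the script itself: the optional OCR module is only imported when the flag is on, so a disabled flag never requires the dependency. The function name `load_ocr_backend` and the default module name `pytesseract` are assumptions made for illustration.

```python
import importlib

# Mirrors the role of $isConvertImagesUsingOCR (hypothetical Python counterpart).
IS_CONVERT_IMAGES_USING_OCR = False


def load_ocr_backend(enabled, module_name="pytesseract"):
    """Import the OCR module only when OCR is enabled.

    Returns None when OCR is disabled, so a missing OCR install
    cannot stop the rest of the conversion from starting.
    """
    if not enabled:
        return None
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"OCR is enabled but the module '{module_name}' is not installed"
        ) from exc


if __name__ == "__main__":
    backend = load_ocr_backend(IS_CONVERT_IMAGES_USING_OCR)
    print("OCR backend loaded" if backend else "OCR disabled, nothing imported")
```

With the flag off, the import is never attempted, which is the behaviour one would expect from setting the control variable to 0.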

Markismus · 08-21-2025, 01:33 PM

I've uploaded the converted dictionary files to pCloud in the SPA-ENG folder. Keep in mind, though, that the synonyms are not converted. So for StarDict users the original file will be more powerful.

crypteronia · 08-21-2025, 05:31 PM

oh wow thank you so so much @Markismus!! that's so lovely of you.

I admit I was pulling my hair trying to get your script to work haha.

For the record, first I converted the StarDict file to CSV using pyglossary. then i ran into the issue i mentioned, with the $isConvertImagesUsingOCR variable (your fix didn't help, I did set the variable to 0 in the DicControls script as well but pocketbookdic.pl was still clamouring for the tesseract module... So i went into DicConversion and savagely deleted the entire "sub" responsible for OCR conversion lmao. that worked)

but then i ran into a new bug involving $isRemoveBreakTag in DicControls.pm. Apparently it needed to be declared? So i added a line at the top of the file to declare it. that seemed to fix it.

then, finally it seemed the script was starting to run; but something jammed again. fwiw i got these 3 messages:

Code:

DicConversion.pm line 930 in function Dic2Screen::die2
DicPrepare.pm line 277 in function DicConversion::convertCVStoXDXF
pocketbookdic.pl line 110 in function DicPrepare::loadXDXF

at that point i decided, ok, i'll let your program handle the whole conversion. but that meant i now had to install stardict-tools.

suffice it to say, after several hours and many open tabs, i was about to give up... and then i saw you'd posted the dic file <3 <3 and, it works!

so.. yeah! thanks a million =)

RomanP. · 08-22-2025, 04:24 AM

Hi,
Could anyone convert this KOReader dictionary into *.dic format for PocketBook e-reader, please?

Thank you in advance!

Markismus · 08-22-2025, 04:00 PM

I've included the synonyms now as new entries pointing to the original form, e.g.

Code:

<ar>
<head><k>-adora</k></head><def>⟶ -ador</def>
</ar>

I also added a toggle, $addSynonyms, to switch this on or off.

The Pocketbook binary dictionary increased from 4MB to 16MB. Not so odd if you realize that the uncompressed StarDict dictionary is 30MB and the synonym file is 45MB. Still, I had hoped for around 10MB.
The Wiktionary 2025 ES-EN dictionaries now have 2.5M entries, due to all the synonyms.
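
The synonym handling described in this post can be sketched roughly as follows. This is a Python illustration, not the Perl code from the script: the function name `synonym_entries`, its `add_synonyms` parameter (standing in for $addSynonyms) and the dict-based input are assumptions. It emits one XDXF `<ar>` entry per synonym, in the same shape as the `-adora` example.

```python
from xml.sax.saxutils import escape


def synonym_entries(synonyms, add_synonyms=True):
    """Build one XDXF <ar> block per synonym, pointing to its original form.

    `synonyms` maps each synonym to the headword it refers to.
    When `add_synonyms` is False, no extra entries are produced.
    """
    if not add_synonyms:
        return []
    entries = []
    for synonym, headword in sorted(synonyms.items()):
        entries.append(
            "<ar>\n"
            f"<head><k>{escape(synonym)}</k></head>"
            f"<def>⟶ {escape(headword)}</def>\n"
            "</ar>"
        )
    return entries


if __name__ == "__main__":
    print("\n".join(synonym_entries({"-adora": "-ador"})))
```

Because every synonym becomes a full entry of its own, the entry count and the file size grow with the size of the synonym file, which fits the increase reported above.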

Markismus · 08-22-2025, 04:17 PM

@RomanP. The dic-files are in the ENG-SLO and SLO-SLO directory.

RomanP. · 08-22-2025, 06:26 PM

Thanks, and thank you so much for your amazing work. <3

ichnilatis · 08-30-2025, 10:07 AM

Dear @Markismus, I have uploaded some dictionaries to the link below. Could they please be converted to dict format for KOReader?

Thank you in advance!

https://u.pcloud.link/publink/show?c...6r3MjFyBzMLLik

Markismus · 09-14-2025, 06:27 AM

@ichnilatis No. They have the dsl-extension. You can convert them with ilius' PyGlossary.

ichnilatis · 09-15-2025, 07:46 AM

Never mind. Thank you!
But since I haven't the knowledge to use this program I will ask in another thread if anyone could do me this favour.

Markismus · 09-15-2025, 10:45 AM

You could also read the documentation.

Ghostcat · 10-28-2025, 04:58 AM

Quote:

Originally Posted by rkomar

Official dictionaries are stored in a separate filesystem and mounted under "/mnt/secure". The default user account does not have access to that filesystem. You would need to be root or user "sreader" to be able to read and back up files in there.

Exactly how do you mount the "secure" filesystem? I have run fdisk and param on /dev/sda (for an InkPad 4) and there appears to be only one partition.

That said, I know there has to be a hidden filesystem, as things like the application binaries don't appear in the mounted fs.

rkomar · 10-28-2025, 02:50 PM

I don't know about the latest firmwares, but the older ones all had a similar partition layout that was visible when you ran fdisk on the storage device. If it is changed on your device, then my older comment doesn't apply anymore. I am a bit surprised if that's true, though, because firmware updates used to wipe out the static partitions but left the dynamic partitions with the user data (mounted under /mnt/secure and /mnt/ext1) alone. Perhaps the device location changed in newer versions? Maybe /dev/mmcblk? You should run the "mount" command to see which partitions or files are being used and where the mount points are.
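
A small sketch of the "run mount and look at the mount points" advice, assuming Python is available where the output is inspected. It parses text in the `/proc/mounts` format (device, mount point, filesystem type, ...) and keeps the entries under `/mnt/`. The sample data is hypothetical: the partition numbers and filesystem types are made up for illustration, and only the mount points `/mnt/secure` and `/mnt/ext1` come from the post above.

```python
def parse_mounts(text):
    """Parse /proc/mounts-style text into (device, mount_point, fs_type) tuples."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mounts.append((fields[0], fields[1], fields[2]))
    return mounts


def mounts_under(mounts, prefix="/mnt/"):
    """Keep only the entries whose mount point starts with `prefix`."""
    return [m for m in mounts if m[1].startswith(prefix)]


# Hypothetical sample; real device names and filesystem types will differ.
SAMPLE = """\
/dev/mmcblk0p1 / ext4 rw 0 0
/dev/mmcblk0p8 /mnt/secure ext4 rw 0 0
/dev/mmcblk0p9 /mnt/ext1 vfat rw 0 0
"""

if __name__ == "__main__":
    try:
        with open("/proc/mounts") as fh:
            text = fh.read()
    except OSError:
        text = SAMPLE  # fall back to the sample when /proc/mounts is unavailable
    for device, mount_point, fs_type in mounts_under(parse_mounts(text)):
        print(f"{mount_point}: {device} ({fs_type})")
```

The device column shows which partition or file backs each mount point, which is what matters for finding where `/mnt/secure` actually lives on a given firmware.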


