#361 |
Addict
Posts: 228
Karma: 1000244
Join Date: Oct 2021
Location: Germany
Device: Tolino Vision 5, Tolino Tab 8", Pocketbook Era (16GB)
|
Pity it’s not open source, or at least supported by Pyglossary…
|
#362 |
Junior Member
Posts: 2
Karma: 10
Join Date: Aug 2025
Device: vivlio touch lux 5 (pocketbook)
|
Hi Markismus!
I tried my best to convert a StarDict to .dic using your script, though I'm very ignorant of coding and such! For some reason (though I did turn off the variable in the script) it won't launch because I don't have Tesseract. Then I tried to install Tesseract, why not, but because I'm on a Mac I have to go through Homebrew, and that crashes for some different reason. I don't need to do OCR in the first place, so the whole thing just feels extra silly =P
So, if you have the time, I would love a .dic version of a big, modern Spanish->English dictionary, such as a Wiktionary-based StarDict, like this one: https://github.com/doozan/spanish_data/releases or this one ("es-en" in the list): https://download.wikdict.com/dictionaries/stardict/ (I'm not sure if there are any significant differences; they're both based on Wiktionary.)
Thank you so much for your amazing work <3 |
#363 |
Guru
Posts: 959
Karma: 149907
Join Date: Jul 2013
Location: Rotterdam
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
|
@cryperonia There is a control variable $isConvertImagesUsingOCR which can be set to 0 to disable OCR and Tesseract. I've added it to the module DicControls.pm to make it more accessible. You can get the changed script on GitHub.
For those who actually want to make OCR work: install both the Perl library Image::OCR::Tesseract and Tesseract itself on your system, and configure them if they don't work out of the box. In a lot of past dictionary conversions there were embedded images that are nothing more than unrecognized symbols. The subroutine convertIMG2Text does what it says on the box. Another control variable for this function, $isManualValidation, lets you toggle between manually checking and correcting what Tesseract recognized, or just going along with whatever Tesseract generates. |
#364 |
Guru
Posts: 959
Karma: 149907
Join Date: Jul 2013
Location: Rotterdam
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
|
I've uploaded the converted dictionary files to pCloud in the SPA-ENG folder. Keep in mind, though, that the synonyms are not converted. So for Stardict users the original file will be more powerful.
|
#365 |
Junior Member
Posts: 2
Karma: 10
Join Date: Aug 2025
Device: vivlio touch lux 5 (pocketbook)
|
Oh wow, thank you so so much @Markismus!! That's so lovely of you.
I admit I was pulling my hair out trying to get your script to work, haha. For the record, first I converted the StarDict file to CSV using PyGlossary. Then I ran into the issue I mentioned with the $isConvertImagesUsingOCR variable. Your fix didn't help: I did set the variable to 0 in the DicControls script as well, but pocketbookdic.pl was still clamouring for the Tesseract module. So I went into DicConversion and savagely deleted the entire "sub" responsible for OCR conversion, lmao. That worked.
But then I ran into a new bug involving $isRemoveBreakTag in DicControls.pm. Apparently it needed to be declared? So I added a line at the top of the file to declare it, and that seemed to fix it. Then, finally, it seemed the script was starting to run, but something jammed again. FWIW, I got these 3 messages:
Code:
DicConversion.pm line 930 in function Dic2Screen::die2
DicPrepare.pm line 277 in function DicConversion::convertCVStoXDXF
pocketbookdic.pl line 110 in function DicPrepare::loadXDXF
Suffice it to say, after several hours and many open tabs, I was about to give up... and then I saw you'd posted the dic file <3 <3 And it works! So, yeah! Thanks a million =) |
#366 |
Junior Member
Posts: 2
Karma: 10
Join Date: Aug 2025
Device: PocketBook
|
Hi,
Could anyone convert this KOReader dictionary into *.dic format for PocketBook e-reader, please? Thank you in advance! |
#367 |
Guru
Posts: 959
Karma: 149907
Join Date: Jul 2013
Location: Rotterdam
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
|
I've included the synonyms now as new entries pointing to the original form, e.g.
Code:
<ar>
<head><k>-adora</k></head><def>⟶ -ador</def>
</ar>
The PocketBook binary dictionary increased from 4MB to 16MB. That is not so odd if you realize that the uncompressed StarDict dictionary is 30MB and the synonym file is 45MB. Still, I had hoped for around 10MB. The Wiktionary 2025 ES-EN dictionaries now have 2.5M entries, due to all the synonyms. |
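For readers who want to see the idea of "synonyms as new entries" in isolation, here is a minimal sketch in Python. It is not the actual Perl code in the conversion script. The function name `synonym_entries` and the list of (synonym, headword) pairs are hypothetical, and reading the StarDict synonym file itself is left out. It only shows how each synonym could become its own XDXF `<ar>` entry whose definition points back to the original form, which is also why the entry count and file size grow so much.

```python
from xml.sax.saxutils import escape


def synonym_entries(pairs):
    """Yield one XDXF <ar> redirect entry per (synonym, headword) pair.

    `pairs` is a hypothetical in-memory list; parsing a real synonym
    file is assumed to have happened elsewhere.
    """
    for synonym, headword in pairs:
        if synonym == headword:
            # A redirect to itself would only add a useless entry.
            continue
        yield (
            "<ar>\n"
            f"<head><k>{escape(synonym)}</k></head>"
            f"<def>⟶ {escape(headword)}</def>\n"
            "</ar>"
        )


# Illustrative input only: one real-looking pair and one self-reference.
pairs = [("-adora", "-ador"), ("-ador", "-ador")]
entries = list(synonym_entries(pairs))
print("\n".join(entries))
```

Every synonym turns into a full entry with its own headword, so a synonym file larger than the dictionary itself will dominate the size of the converted result.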
#368 |
Guru
Posts: 959
Karma: 149907
Join Date: Jul 2013
Location: Rotterdam
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
|
@RomanP. The dic files are in the ENG-SLO and SLO-SLO directories.
|
#369 |
Junior Member
Posts: 2
Karma: 10
Join Date: Aug 2025
Device: PocketBook
|
Thanks, and thank you so much for your amazing work. <3
|