Old 03-07-2026, 06:06 AM   #391
nhedgehog
Guru
Posts: 833
Karma: 628976
Join Date: Sep 2013
Device: EnergySistemEreaderPro, Nook STG, Pocketbook 622, Bookeen Cybooks ...
Very cool, thanks for the effort and sharing it!
Old 03-07-2026, 10:37 AM   #392
rkomar
Wizard
Posts: 3,111
Karma: 18944169
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
Nice work, @datyoma! I'm a Linux guy, and it was a pain running the PocketBook dictionary converter in a Windows VM. It will be nice to have a native tool for that.
Old 03-07-2026, 11:54 AM   #393
Markismus
Guru
Posts: 971
Karma: 149907
Join Date: Jul 2013
Location: Rotterdam
Device: HiSenseA5ProCC, OnyxNotePro, Note5, Kobo Glo, Aura
Nice!
Old 03-07-2026, 04:19 PM   #394
datyoma
Junior Member
Posts: 4
Karma: 94894
Join Date: Mar 2026
Location: Berlin
Device: PocketBook InkPad 4
@rkomar So am I. The only relatively sane way I found to run converter.exe was to build a Docker image with wine32 in it.

Code:
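# 32-bit Wine on a slim Debian base, for running the Windows-only converter.exe
FROM debian:bookworm-slim

# wine32 comes from the i386 architecture, so enable it before installing
RUN dpkg --add-architecture i386 && \
    apt-get update && apt-get install -y --no-install-recommends wine wine32 && \
    rm -rf /var/lib/apt/lists/*

# converter.exe is expected in the directory mounted at /data
WORKDIR /data
ENTRYPOINT ["wine", "converter.exe"]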
So in case there's an issue, the debugging workflow is as follows:
- run the original converter (docker run -v "$(pwd)":/data pb-converter input.xdxf meta/) and rename the output file to input.orig.dic
- run the new converter (./pbdt convert input.xdxf --meta-dir=meta/)
- run ./pbdt show sdic index on both files
- run ./pbdt show sdic block <file.dic> <offset> on two similar blocks (it lists the keys) and feed the output into vimdiff or the like to compare block contents
- finally, there's the ./pbdt lookup --debug command for checking raw definition bytes

The converted files are very similar, but usually not byte-for-byte identical, due to:
- unstable sorting of words with the same collated key in the original converter
- minor differences in selecting block boundaries
- treatment of special XML characters (I didn't pay much attention to this)
- bugs not yet discovered
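
When diffing key lists from the two converters, the tie-order noise can be filtered out before comparing. This is a minimal sketch and not part of pbdt: collation_key is a hypothetical stand-in (plain case folding) for whatever collation the converters really use, and the word lists are made up.

```python
from itertools import groupby


def collation_key(word: str) -> str:
    # Assumption: case folding stands in for the real collation rules.
    return word.casefold()


def normalize_ties(words):
    """Re-sort each run of equal collation keys so tie order no longer matters."""
    out = []
    for _, run in groupby(words, key=collation_key):
        out.extend(sorted(run))
    return out


# Hypothetical key dumps from the original and the new converter.
original = ["apple", "Bank", "bank", "cat"]
rewritten = ["apple", "bank", "Bank", "cat"]

same_raw = original == rewritten
same_normalized = normalize_ties(original) == normalize_ties(rewritten)
print(same_raw, same_normalized)
```

If the lists still differ after normalization, the difference is something other than tie ordering.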
Old 03-09-2026, 06:10 AM   #395
pLEX
Junior Member
Posts: 3
Karma: 10
Join Date: Mar 2026
Device: Pocketbook Verse Pro
Quote:
Originally Posted by Markismus View Post
@pLEX I ran the 3 mobi files through the script and none was parsed correctly.
Hi
Thank you for your efforts!

I'll try lurking in internal communities; maybe someone has a UA-UA dictionary for PocketBook.
Old Today, 05:59 AM   #396
datyoma
I used some of my vacation time to dive further into this topic, now with the help of Copilot Pro and the Ghidra MCP server. (GPT 5.4 is my new favorite for reverse engineering.) These are the highlights:

HTML support is quite good. Dictionary definitions are rendered using Qt 6 text widgets. The supported tags and inline styles are documented here: https://doc.qt.io/qt-6/richtext-html-subset.html. Even images can be rendered, it turns out. There's one caveat: for backward compatibility, the e-reader converts literal newlines (\n) to <br>, so it's crucial to remove them when building the dictionary to avoid extra whitespace.
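
A minimal sketch of that clean-up step, assuming the definitions are available as HTML strings before they are written out. The function name is hypothetical, and replacing each newline run with a single space (rather than deleting it outright) is my own choice so that words separated only by a line break don't get glued together.

```python
import re


def strip_literal_newlines(html: str) -> str:
    """Collapse any whitespace run containing a line break into one space."""
    return re.sub(r"\s*[\r\n]\s*", " ", html).strip()


definition = "<b>cat</b>\n  <i>noun</i><br>\na small animal\n"
cleaned = strip_literal_newlines(definition)
print(cleaned)
```

Intentional line breaks should already be written as <br> in the source, since that is the only form that survives this step.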

The morphems section (morphems.txt) is completely ignored. Instead, there are several stemming engines:
- Hunspell is the preferred one, if the language package is installed
- if not, https://github.com/Blake-Madden/OleanderStemmingLibrary is used for supported languages (see its readme for the list)
- there's also this obscure library, mostly for Eastern European languages: https://github.com/izacus/SlovenianLemmatizer (see /ebrmain/config/lemmagen)

Now you might be wondering how the source language is determined. First, there are dictionaries that are downloaded from PB servers; their metadata is stored in /mnt/ext1/system/pbdicts/pbdicts.db (SQLite). Second, there's an optional JSON metadata section, which is practically never used. Finally, there's a fallback that looks at the first line of keyboard.txt, finds the two letters before ':' and treats them as the locale, e.g. "EN: English" -> EN. So it's mighty important to bake in the correct keyboard.txt if you want morphology to work properly.
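
The keyboard.txt fallback can be approximated in a few lines, which is handy for checking a file before baking it in. This is a sketch of the rule as described above, not the firmware's actual parser: the function name is hypothetical, and details such as case handling or whitespace before the colon are not covered.

```python
import re
from typing import Optional


def locale_from_keyboard(text: str) -> Optional[str]:
    """Return the two letters right before the first ':' on the first line."""
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    match = re.search(r"([A-Za-z]{2}):", first_line)
    return match.group(1) if match else None


print(locale_from_keyboard("EN: English\nqwertyuiop\n"))
print(locale_from_keyboard("no locale marker here"))
```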

The optional JSON metadata section allows fancy rendering of dictionary info: it has fields "name", "localeFrom", "localeTo", "description", "provider" (aka publisher/issuer, e.g. Wiktionary), "category" (e.g. universal), and a few more fields that don't seem to be used anywhere ("version", "specialProject", "set").
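
For illustration, a metadata object with the fields that appear to be used could be built like this. The field names come from the list above; every value is made up, and the exact value formats the firmware expects (for example the case of the locale codes) are an assumption. How the section is embedded in the .dic file is not shown.

```python
import json

# Field names as listed above; all values are hypothetical examples.
metadata = {
    "name": "Example English-German Dictionary",
    "localeFrom": "en",
    "localeTo": "de",
    "description": "A sample dictionary used for illustration only.",
    "provider": "Wiktionary",
    "category": "universal",
}

encoded = json.dumps(metadata, ensure_ascii=False, indent=2)
print(encoded)
```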
____________________________________

The CLI tool and the WASM UI (https://datyoma.codeberg.page/pbdt/v1/) now support:
- merging multiple input dictionaries when converting to .dic
- reading and writing JSON metadata section

I also removed the legacy conversion of <i></i> and <b></b> tags to binary that converter.exe performs. Literal newlines are now stripped when writing and converted to <br> when reading (see above), so input XDXF must use <br> for newlines.

Apart from that, PyGlossary now has a PocketBook output plugin (also HTML-native, so to say): https://github.com/ilius/pyglossary/pull/708. PyGlossary supports lots of input formats, and is much easier to discover than this thread.
Old Today, 03:26 PM   #397
rkomar
What do you mean when you say that morphems.txt is ignored? Ignored by the converter, ignored by the PB dictionary code on the device, ...?
Old Today, 05:44 PM   #398
datyoma
It's stored in the dictionary, but ignored by the PB dictionary-reading code on the device.

Code:
$ readelf -CsW cramfs/lib/libdictionary.so | grep comparatorForQueryWord
334: ... pocketbook::morphology::OlenderStemSearcher::comparatorForQueryWord(...) const
392: ... pocketbook::dictionary::ChainingSearcher::comparatorForQueryWord(...) const
487: ... pocketbook::dictionary::CombinedSearcher::comparatorForQueryWord(...) const
611: ... pocketbook::dictionary::ExactSearcher::comparatorForQueryWord(...) const
614: ... pocketbook::morphology::HunspellSearcher::comparatorForQueryWord(...) const
920: ... pocketbook::morphology::LemmaGenSearcher::comparatorForQueryWord(...) const
The chaining searcher tries the Hunspell, OlenderStem, LemmaGen and exact searchers, in that order.
The combined searcher seems to be dead code: it is supposed to dispatch to other searchers based on locale, but nothing calls its constructor and nothing creates that mapping.
There's no trace of morphems.txt being used anywhere, at least in modern firmware.
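
To make the chaining idea concrete, here is a toy model of a searcher chain. It is only a sketch of the pattern, not the firmware's code: the two stemmers are trivial stand-ins for the real engines, all names are hypothetical, and stopping at the first hit is an assumption about how the chain behaves.

```python
from typing import Callable, Dict, List, Optional

Searcher = Callable[[str], Optional[str]]
Stemmer = Callable[[str], Optional[str]]


def exact_searcher(index: Dict[str, str]) -> Searcher:
    return lambda word: index.get(word)


def stemming_searcher(index: Dict[str, str], stem: Stemmer) -> Searcher:
    def search(word: str) -> Optional[str]:
        base = stem(word)  # None means this engine has no stem to offer
        return index.get(base) if base is not None else None
    return search


def chain(searchers: List[Searcher]) -> Searcher:
    def search(word: str) -> Optional[str]:
        for searcher in searchers:  # try each engine in order
            hit = searcher(word)
            if hit is not None:
                return hit  # assumption: the first hit wins
        return None
    return search


index = {"mouse": "a small rodent", "cat": "a small feline"}
table_stem: Stemmer = {"mice": "mouse"}.get  # toy dictionary-based engine
suffix_stem: Stemmer = lambda w: w[:-1] if w.endswith("s") else None  # toy suffix stripper

lookup = chain([
    stemming_searcher(index, table_stem),
    stemming_searcher(index, suffix_stem),
    exact_searcher(index),
])
print(lookup("mice"), "|", lookup("cats"), "|", lookup("cat"), "|", lookup("dogs"))
```

Putting the exact searcher last mirrors the order listed above: inflected forms get resolved through a stemmer first, and a plain headword lookup is the final fallback.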
Old Today, 05:49 PM   #399
rkomar
Okay, thanks for the explanation. Perhaps it is only used on old devices.

Edit: I'm glad you found this information out. I had in the back of my mind a project to produce morphems.txt rules from Hunspell rules, but I see now that it would be a waste of time on modern devices.

Last edited by rkomar; Today at 05:51 PM.
All times are GMT -4. The time now is 06:46 PM.

