#391

Guru
Posts: 833
Karma: 628976
Join Date: Sep 2013
Device: EnergySistemEreaderPro, Nook STG, Pocketbook 622, Bookeen Cybooks ...

Very cool, thanks for the effort and sharing it!

#392

Wizard
Posts: 3,111
Karma: 18944169
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633

Nice work, @datyoma! I'm a Linux guy, and running the PocketBook dictionary converter in a Windows VM was a pain. It will be nice to have a native tool for that.

#393

Guru
Posts: 971
Karma: 149907
Join Date: Jul 2013
Location: Rotterdam
Device: HiSenseA5ProCC, OnyxNotePro, Note5, Kobo Glo, Aura

Nice!

#394

Junior Member
Posts: 4
Karma: 94894
Join Date: Mar 2026
Location: Berlin
Device: PocketBook InkPad 4

@rkomar So am I. The only relatively sane way I found to run converter.exe was to build a Docker image with wine32 in it.
Code:
```
# Debian with 32-bit Wine so that the Windows-only converter.exe can run on Linux
FROM debian:bookworm-slim
RUN dpkg --add-architecture i386 && \
    apt-get update && \
    apt-get install -y --no-install-recommends wine wine32 && \
    rm -rf /var/lib/apt/lists/*
# converter.exe and the input files are expected in the directory mounted at /data
WORKDIR /data
ENTRYPOINT ["wine", "converter.exe"]
```
To compare the outputs of the two converters, I:
- build the image (docker build -t pb-converter .)
- run the original converter (docker run -v "$(pwd)":/data pb-converter input.xdxf meta/) and rename the output file to input.orig.dic
- run the new converter (./pbdt convert input.xdxf --meta-dir=meta/)
- run ./pbdt show sdic index on both files
- run ./pbdt show sdic block <file.dic> <offset> on two similar blocks (it lists the keys) and feed the output into vimdiff or the like to compare block contents
- finally, use the ./pbdt lookup --debug command to check the raw definition bytes

The converted files are very similar, but usually not byte-for-byte identical due to:
- unstable sorting of words with the same collated key in the original converter
- minor differences in selecting block boundaries
- treatment of special XML characters (I didn't pay much attention to this)
- not yet discovered bugs
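Before diffing individual blocks with pbdt, it can help to know where two .dic files first diverge at all. Below is a minimal Python sketch that does only that: it does not parse the .dic format, it just reports sizes and the first differing byte offset with a bit of hex context. The script name dicdiff.py and the output name input.dic for the new converter are assumptions for illustration.

```python
from __future__ import annotations

import sys
from pathlib import Path


def first_difference(a: bytes, b: bytes) -> int | None:
    """Return the offset of the first differing byte, or None if a == b.

    If one input is a prefix of the other, the offset is the shorter length.
    """
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def main(argv: list[str]) -> int:
    if len(argv) != 3:
        print(f"usage: {argv[0]} ORIG.dic NEW.dic", file=sys.stderr)
        return 2
    a = Path(argv[1]).read_bytes()
    b = Path(argv[2]).read_bytes()
    print(f"sizes: {len(a)} vs {len(b)} bytes")
    offset = first_difference(a, b)
    if offset is None:
        print("files are byte-for-byte identical")
        return 0
    print(f"first difference at offset {offset} (0x{offset:x})")
    # Show up to 8 bytes of context on either side of the first mismatch.
    start = max(0, offset - 8)
    print("orig:", a[start:offset + 8].hex(" "))
    print("new: ", b[start:offset + 8].hex(" "))
    return 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Usage: python3 dicdiff.py input.orig.dic input.dic. The offset can then be matched against the block offsets from ./pbdt show sdic index to pick which block to inspect. Comparing byte by byte in pure Python is slow for very large files, but simple enough to trust.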

#395

Junior Member
Posts: 3
Karma: 10
Join Date: Mar 2026
Device: Pocketbook Verse Pro

#396

Junior Member
Posts: 4
Karma: 94894
Join Date: Mar 2026
Location: Berlin
Device: PocketBook InkPad 4

I used some of my vacation time to dive further into this topic, this time with the help of Copilot Pro and a Ghidra MCP server. (GPT 5.4 is my new favorite for reverse engineering.) These are the highlights:

HTML support is quite good. Dictionary definitions are rendered using Qt 6 text widgets; the supported tags and inline styles are documented here: https://doc.qt.io/qt-6/richtext-html-subset.html. Even images can be rendered, it turns out. There's one caveat: for backwards-compatibility reasons, the e-reader converts literal newlines (\n) to <br>, so it's crucial to remove them when building the dictionary to avoid extra whitespace.

The morphems section (morphems.txt) is completely ignored.

There are several stemming engines:
- Hunspell is the preferred one, if the language package is installed
- if not, https://github.com/Blake-Madden/OleanderStemmingLibrary is used for supported languages (see its readme for the list)
- there's also this obscure library, mostly for Eastern European languages: https://github.com/izacus/SlovenianLemmatizer (see /ebrmain/config/lemmagen)

Now you might be wondering how the source language is determined. First, there are dictionaries that are downloaded from PB servers; the metadata for them is stored in /mnt/ext1/system/pbdicts/pbdicts.db (SQLite). Second, there's an optional JSON metadata section, which is practically never used. Finally, there's a fallback which looks at the first line of keyboard.txt, finds the two letters before ':' and treats them as the locale, e.g. "EN: English" -> EN. So it's mighty important to bake in the correct keyboard.txt if you want morphology to work properly.

The optional JSON metadata section allows fancy rendering of dictionary info. It has the fields "name", "localeFrom", "localeTo", "description", "provider" (aka publisher/issuer, e.g. Wiktionary) and "category" (e.g. universal), plus a few more that don't seem to be used anywhere ("version", "specialProject", "set").
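Two of the points above are easy to check before building a dictionary: what locale the keyboard.txt fallback would derive, and what the optional JSON metadata would contain. A minimal Python sketch, assuming keyboard.txt is UTF-8 (a BOM is tolerated) and all metadata values are plain strings; the helper names are hypothetical, the locale code format in the example is an assumption, and how the JSON is actually embedded into the .dic is left to pbdt or PyGlossary.

```python
from __future__ import annotations

import json
from pathlib import Path

# Fields used for rendering dictionary info, per the post above.
DISPLAY_FIELDS = ("name", "localeFrom", "localeTo", "description", "provider", "category")
# Fields that reportedly don't seem to be used anywhere.
EXTRA_FIELDS = ("version", "specialProject", "set")


def locale_from_keyboard_line(first_line: str) -> str | None:
    """Mimic the described fallback: take the two characters before the first ':'.

    "EN: English" -> "EN". Returns None if there aren't two characters before a ':'.
    """
    colon = first_line.find(":")
    if colon < 2:
        return None
    return first_line[colon - 2:colon]


def locale_from_keyboard_txt(path: Path) -> str | None:
    # utf-8-sig drops a leading BOM if present (assumption: the file is UTF-8).
    with path.open(encoding="utf-8-sig") as f:
        return locale_from_keyboard_line(f.readline())


def build_metadata(**fields: str) -> str:
    """Serialize the metadata as JSON, rejecting unknown keys to catch typos."""
    unknown = set(fields) - set(DISPLAY_FIELDS) - set(EXTRA_FIELDS)
    if unknown:
        raise ValueError(f"unknown metadata fields: {sorted(unknown)}")
    return json.dumps(fields, ensure_ascii=False)


if __name__ == "__main__":
    print(locale_from_keyboard_line("EN: English"))  # EN
    # The locale code format ("en" vs "EN") is an assumption here.
    print(build_metadata(name="Example EN-DE", localeFrom="en", localeTo="de",
                         provider="Wiktionary", category="universal"))
```

Rejecting unknown keys is a deliberate choice: a misspelled field name would otherwise be silently ignored by the reader.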
____________________________________

The CLI tool and the WASM UI (https://datyoma.codeberg.page/pbdt/v1/) now support:
- merging multiple input dictionaries when converting to .dic
- reading and writing the JSON metadata section

I also removed the legacy conversion of <i></i> and <b></b> tags to binary codes, which converter.exe performs. Literal newlines are also stripped when writing and converted to <br> when reading (see above), so input XDXF must use <br> for line breaks.

Apart from that, PyGlossary now has a PocketBook output plugin (also HTML-native, so to speak): https://github.com/ilius/pyglossary/pull/708. PyGlossary supports lots of input formats and is much easier to discover than this thread.
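If your source definitions are plain text with one sense per line, one way to satisfy the "<br> instead of newlines" rule is to escape the text and join the lines with <br>; for definitions that are already HTML, the literal newlines just need to go. A minimal Python sketch; both helpers are hypothetical and not part of pbdt or PyGlossary. If your XDXF has to be well-formed XML for a strict parser, you may need the self-closing <br/> form, hence the br parameter.

```python
from __future__ import annotations

import re
from html import escape

# A newline plus surrounding spaces/tabs/CRs (and any further blank lines).
_NEWLINE_RUN = re.compile(r"[ \t\r]*\n[ \t\r\n]*")


def to_xdxf_definition(text: str, br: str = "<br>") -> str:
    """Plain text -> HTML fragment: escape &, < and >, join the lines with br."""
    # quote=False: quotes only need escaping inside attribute values.
    return br.join(escape(line, quote=False) for line in text.splitlines())


def strip_literal_newlines(html_fragment: str) -> str:
    """For definitions that are already HTML: replace each run of literal
    newlines with a single space, so the device has nothing to turn into <br>.

    A space rather than an empty string keeps the words on either side apart,
    which is how HTML treats a newline anyway.
    """
    return _NEWLINE_RUN.sub(" ", html_fragment).strip()


if __name__ == "__main__":
    print(to_xdxf_definition("1. a house\n2. a home"))
    print(strip_literal_newlines("<b>house</b>\n<i>noun</i>"))
```

Note that strip_literal_newlines doesn't special-case <pre> blocks; newlines inside them are flattened too.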

#397

Wizard
Posts: 3,111
Karma: 18944169
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633

What do you mean when you say morphems.txt is ignored? Ignored by the converter, by the PB dictionary code on the device, ...?

#398

Junior Member
Posts: 4
Karma: 94894
Join Date: Mar 2026
Location: Berlin
Device: PocketBook InkPad 4

It's stored in the dictionary, but ignored by the PB dictionary-reading code on the device.
Code:
```
$ readelf -CsW cramfs/lib/libdictionary.so | grep comparatorForQueryWord
334: ... pocketbook::morphology::OlenderStemSearcher::comparatorForQueryWord(...) const
392: ... pocketbook::dictionary::ChainingSearcher::comparatorForQueryWord(...) const
487: ... pocketbook::dictionary::CombinedSearcher::comparatorForQueryWord(...) const
611: ... pocketbook::dictionary::ExactSearcher::comparatorForQueryWord(...) const
614: ... pocketbook::morphology::HunspellSearcher::comparatorForQueryWord(...) const
920: ... pocketbook::morphology::LemmaGenSearcher::comparatorForQueryWord(...) const
```
The combined searcher seems to be dead code: it is supposed to dispatch to other searchers based on locale, but its constructor is never called and that mapping is never created. There's no trace of morphems.txt being used anywhere, in modern firmware at least.
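To repeat the symbol check above on another firmware version, the readelf output can be reduced to the list of classes that define comparatorForQueryWord. A minimal Python sketch (list_searchers.py is a hypothetical name); it only looks at demangled symbol names, so it cannot tell whether a class is ever constructed, which is what the Ghidra analysis was needed for.

```python
from __future__ import annotations

import re
import sys

# The demangled "Namespace::Class::comparatorForQueryWord(" part of a readelf -CsW line.
SEARCHER_RE = re.compile(r"((?:\w+::)*\w+)::comparatorForQueryWord\(")


def searcher_classes(readelf_output: str) -> list[str]:
    """Return the fully qualified classes defining comparatorForQueryWord, in order, deduplicated."""
    found: list[str] = []
    for line in readelf_output.splitlines():
        m = SEARCHER_RE.search(line)
        if m and m.group(1) not in found:
            found.append(m.group(1))
    return found


if __name__ == "__main__":
    for name in searcher_classes(sys.stdin.read()):
        print(name)
```

Usage: readelf -CsW cramfs/lib/libdictionary.so | python3 list_searchers.py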

#399

Wizard
Posts: 3,111
Karma: 18944169
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633

Okay, thanks for the explanation. Perhaps it is only used on old devices.
Edit: I'm glad you found this out. I had in the back of my mind a project to produce morphems.txt rules from Hunspell rules, but I see now that that would be a waste of time on modern devices.
Last edited by rkomar; Today at 05:51 PM.

Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Pocketbook dictionary | logan | PocketBook | 324 | 02-13-2026 02:19 PM |
| Dictionary coversion from .mobi to pocketbook format? | doctorat | PocketBook | 16 | 07-01-2020 05:34 PM |
| Webster's 1913 Dictionary in Pocketbook Format | luqmaninbmore | PocketBook | 8 | 05-27-2020 10:41 AM |
| SW>EN Dictionary for Pocketbook | tttrine | PocketBook | 3 | 06-09-2015 06:01 AM |