#16 | |
Evangelist
Posts: 495
Karma: 356531
Join Date: Jul 2016
Location: 'burta, Canada
Device: Kobo Glo HD
|
Quote:
So I don't know. I might have used different source files or done something more to them (I don't think it was just --flatten-synonyms; it didn't make the file sizes grow enough). This was all last year, and I didn't take notes and deleted my build directories to save space. My bash history doesn't go back that far either. Oh, well. At least you have my end result. Let me know what you think.
|
#17 |
Wizard
Posts: 2,805
Karma: 7423683
Join Date: May 2016
Location: Ontario, Canada
Device: Kobo Mini, Aura Edition 2 v1, Clara HD
|
Note that dictutil is not intended to be as comprehensive as Penelope. It will just be for generating, unpacking, packing, testing (maybe), and installing Kobo dictionaries. Converting between formats (other than dicthtml and dictutil's own format) is out of scope.
For people using Penelope, be aware that prefix generation is incorrect in some cases, such as when words have special characters after the first two characters, or for some accented letters.
Last edited by geek1011; 01-23-2020 at 09:02 PM.
#18 |
Evangelist
Posts: 495
Karma: 356531
Join Date: Jul 2016
Location: 'burta, Canada
Device: Kobo Glo HD
|
How do you work around/fix that, if it's even possible? Last year, I was trying to create or combine some Japanese dictionaries, and while individual dictionaries would work, combining them sometimes would and sometimes wouldn't depending on the combination and I couldn't find a pattern. I think I wasted a month trying to figure it out, lol. But since there are so many Japanese characters to keep track of, a prefix problem sounds like a plausible cause. I know there are some penelope command line options that deal with prefix, but there aren't any usage examples in the help text so I don't know how to use them.
|
#19 | |
Wizard
Posts: 2,805
Karma: 7423683
Join Date: May 2016
Location: Ontario, Canada
Device: Kobo Mini, Aura Edition 2 v1, Clara HD
|
Quote:
I'll publish my notes on my other findings around the same time I finish dictutil. |
|
#20 | ||||||
Evangelist
Posts: 495
Karma: 356531
Join Date: Jul 2016
Location: 'burta, Canada
Device: Kobo Glo HD
|
Quote:
Quote:
Quote:
Code:
    if is_ok:
        prefix = headword[0:length]
Code:
    if is_ok:
        prefix = headword[0:2]
Quote:
Code:
    def is_allowed(character):
        # all non-ascii (x > 127) are ok
        # all ASCII lowercase letters (97 <= x <= 122) are ok
        # everything else is not ok
        try:
            code = ord(character)
            return (code > 127) or ((code >= 97) and (code <= 122))
        except:
            pass
        return True
Quote:
Quote:
Also, does dictutil handle Kobo dictionary synonyms properly? Penelope doesn't even touch them, and the Japanese dictionaries rely heavily on synonyms because the language uses three scripts (kanji, hiragana and katakana) and a complex word can be spelt with any combination (e.g. all kanji, all hiragana, all katakana, a mix of kanji and hiragana, or even in Latin letters via romaji). Synonyms mean there aren't separate/duplicate entries for each different way to spell a word. While many of the stock Kobo dictionaries are encrypted, the Japanese ones currently aren't, if you'd like to take a peek at how they use synonyms.

I actually managed to convert the Progressive EN-JA Kobo dictionary to Stardict XML format. Considering I don't know regular expressions or XSLT at all (I did it with a lot of find/replace in Notepad++, lol), I was amazed that it even worked, and I immediately sought to merge it with entries from the open-source JMDict project, albeit an older version (using --flatten-synonyms, of course). In fact, part of my struggle was in creating an updated JMDict version with 2019 data, because my Kobo wouldn't recognize what I made no matter what I did, which in hindsight might be due to " " appearing in some headwords. So yeah, it'd be nice to fix penelope if possible.
Last edited by rtiangha; 01-24-2020 at 05:36 AM.
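For anyone wondering what "flattening" synonyms amounts to, here is a rough Python sketch. It is only an illustration under my own assumptions: the function name, the `entries` layout and the sample word are all made up, and this is not how Penelope or the Kobo dictionaries actually store things. The idea is simply that every alternative spelling becomes a headword of its own, pointing at the same definition.

```python
# Hypothetical illustration only; this is not Penelope's or Kobo's code.
def flatten_synonyms(entries):
    """Return a {headword: definition} mapping in which every synonym
    becomes a headword of its own, sharing the original definition."""
    flat = {}
    for headword, (definition, synonyms) in entries.items():
        for spelling in [headword, *synonyms]:
            # Keep the first definition seen if two entries share a spelling.
            flat.setdefault(spelling, definition)
    return flat


entries = {
    # headword: (definition, alternative spellings) -- made-up sample data
    "猫": ("cat", ["ねこ", "ネコ", "neko"]),
}
flat = flatten_synonyms(entries)
print(sorted(flat))
```

The obvious cost is that the definition is repeated once per spelling, which would be consistent with flattened dictionaries coming out larger.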
||||||
#21 | |||
Wizard
Posts: 2,805
Karma: 7423683
Join Date: May 2016
Location: Ontario, Canada
Device: Kobo Mini, Aura Edition 2 v1, Clara HD
|
Quote:
And to check if it is a Unicode letter, use isalpha. No, that last part is incorrect, as only the first two characters are considered. But something like "a" would go into "aa.html". You might also need to change the order of some of the checks to match. I'll give more details once I finish dictutil.
Quote:
Quote:
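A minimal Python sketch of the rules mentioned in this post (only the first two characters count, short words are padded so that "a" lands in "aa.html", and letters are detected with isalpha). Everything else is my assumption and is not confirmed in this thread: the function name `kobo_prefix`, the strip/lowercase step, the order of the checks, and the "11" fallback for words that don't start with two letters. Treat it as a starting point until the real details are published.

```python
def kobo_prefix(headword):
    """Guess the two-character prefix that selects the dicthtml file.

    Assumptions (not confirmed in this thread): the word is stripped and
    lowercased first, and anything that does not start with two letters
    falls back to "11".
    """
    word = headword.strip().lower()
    prefix = word[:2]               # only the first two characters matter
    prefix = prefix.ljust(2, "a")   # pad short words: "a" -> "aa"
    # str.isalpha() is True only if every character is a Unicode letter,
    # so accented letters and kanji/kana pass, digits and spaces do not.
    if not prefix.isalpha():
        return "11"
    return prefix


for word in ["a", "Hello", "étude", "日本語", "1st"]:
    print(word, "->", kobo_prefix(word) + ".html")
```

Unlike the ASCII-range test in Penelope's is_allowed, isalpha rejects non-ASCII characters that are not letters (punctuation, symbols), which may be where the two approaches disagree.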
|
|||
#22 | |||
Evangelist
Posts: 495
Karma: 356531
Join Date: Jul 2016
Location: 'burta, Canada
Device: Kobo Glo HD
|
Quote:
Quote:
Quote:
But for just going Stardict->Kobo, I almost think it'd be easier to decompile the dictionary into Stardict XML format and then use XSLT to transform the definitions to Kobo format. And I think you'd be able to go in the opposite direction using that method as well. Or maybe something like that ends up being a totally different utility instead.
Last edited by rtiangha; 01-24-2020 at 10:13 AM.
|||
#23 | |||
Wizard
Posts: 2,805
Karma: 7423683
Join Date: May 2016
Location: Ontario, Canada
Device: Kobo Mini, Aura Edition 2 v1, Clara HD
|
Quote:
Quote:
Quote:
|
|||
#24 |
Member
Posts: 12
Karma: 10
Join Date: Nov 2019
Device: Kobo Libra
|
Thanks for the detailed answers, guys. So I am seeing two viable candidates here:
rtiangha's 90 MB version and Owl's 55 MB version. Going off file sizes, is rtiangha's the most thorough? I'll add both for the time being and report back on their performance.
#25 |
Guru
Posts: 914
Karma: 275656
Join Date: Jun 2016
Device: Kobo
|
If you combine dictionaries, there will be a lot of repetition (identical words and similar definitions). So a larger dictionary is not necessarily a better one. You need to check the result: browse it and look up some words.
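As a rough way to see how much overlap there is before merging, something like the Python sketch below could help. It assumes each dictionary has already been loaded into a plain {headword: definition} mapping; the function name and the sample data are made up for illustration.

```python
from collections import Counter


def duplicate_headwords(*dictionaries):
    """Return the headwords that occur in more than one dictionary."""
    counts = Counter()
    for entries in dictionaries:
        counts.update(set(entries))  # count each dictionary at most once
    return sorted(word for word, n in counts.items() if n > 1)


# Made-up sample data standing in for two source dictionaries.
dict_a = {"cat": "a small domesticated feline", "dog": "a domesticated canine"}
dict_b = {"cat": "feline kept as a pet", "owl": "a nocturnal bird of prey"}
print(duplicate_headwords(dict_a, dict_b))
```

A high share of duplicated headwords would suggest the extra megabytes are mostly repetition rather than extra coverage.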
|
#26 | |
Wizard
Posts: 1,198
Karma: 4027538
Join Date: May 2014
Device: Kobo Aura, Mini, Touch, Amazon Kindle.
|
Quote:
best wishes koboy
|
#27 |
Connoisseur
Posts: 76
Karma: 10742
Join Date: Jul 2017
Location: Serbia
Device: Kobo Aura One
|
I'm a bit late to the party, but I'll chime in, as I'm a bit of a dictionary nerd and quite like fiddling with them on my Kobo.
I believe that Kindles use the ODE (Oxford Dictionary of English), which is not the OED (Oxford English Dictionary), nor is it based on that venerable enterprise. It's a modern single-volume dictionary that OUP produced twenty-odd years ago. The dictionary that is based on the OED is called the Shorter Oxford English Dictionary (SOED) and, unlike the OED, is much more frequently updated in print form and is currently in its sixth edition.
That being said, it is possible to get the (more or less) complete OED on a Kobo, and it works just fine speed-wise; the size and scope don't seem to be an issue. That's certainly the most comprehensive dictionary available, though it's not exactly the most user-friendly. I removed the quotations from the one I converted, and it can still get a bit prickly to navigate through the data. Some references can still have dozens of Kobo popup pages, which is far from ideal. Perhaps it's due to the way in which I made it, but the SOED is a much more practical solution in most situations. I haven't used the ODE for a while, but I remember the SOED being similar to it, yet more comprehensive due to the OED parentage.
Another middle ground would be the equally venerable Webster's Third International. It's not as comprehensive as the OED, but the entries can be a good deal shorter. For all its advantages and somewhat legendary cultural status, Webster's Third suffers from occasional stodginess. If you'd like a chuckle, look up Webster's definition of "door". Though most of its quirks were expunged from the modern, online-only edition, the spirit of Webster's remains in those defining moments.
Outside of these, I should probably mention the American Heritage Dictionary and Random House Webster's Unabridged (not related to Webster's Third). I have a habit of not using them, though they are very robust, modern dictionaries with no-nonsense definitions, usually well-formulated and clear, if a bit short on examples.
For the literary-minded, a good choice is also Webster's 1912 dictionary, though it took me a while to track down a version that didn't omit the best bits: the extensive quotations. For a quick look-up, the best path is probably something like Roget's 21st Century Thesaurus, which simply lists synonyms rather intelligently. Most of these I have on me, either in a format that Kobo will read or in a digital form that might be converted for Kobo, so feel free to hit me up for anything.
#28 |
BLAM!
Posts: 13,506
Karma: 26047202
Join Date: Jun 2010
Location: Paris, France
Device: Kindle 2i, 3g, 4, 5w, PW, PW2, PW5; Kobo H2O, Forma, Elipsa, Sage, C2E
|
@Alanon: I am mildly intrigued
#29 |
Wizard
Posts: 2,805
Karma: 7423683
Join Date: May 2016
Location: Ontario, Canada
Device: Kobo Mini, Aura Edition 2 v1, Clara HD
|
For anyone interested, I've made a mingw-w64 build of marisa-trie for Windows. I've attached it to this post.
P.S. dictutil is almost ready, I just haven't been working on it for the last week and a bit. |
#30 | |
Grand Sorcerer
Posts: 6,252
Karma: 16544692
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
|
Quote:
Last edited by jackie_w; 02-01-2020 at 12:51 PM.
|