Quote:
Originally Posted by geek1011
Here's all the logic used to generate the prefixes. I've tested it against libnickel, and it's also based on the disassembly of DictionaryParser::htmlForWord (it was slightly annoying that most of the Qt stuff was inlined). I've simplified it and improved the performance, and also left in the original code for reference.
I'll write some proper documentation and make a thread when I finish dictutil later.
Here is the code: https://sourcegraph.com/github.com/g...util.go#L30-86
And here are some useful notes about v1/v2 dictionaries (this hasn't ever been discussed, or even noticed before AFAIK): https://pgaskin.net/dictutil/dicthtml/v1v2
Wow, good work. I knew of the difference in dictionary versions, but wasn't sure what exactly made them different.
Looking forward to seeing your documentation on the dictionary format and the definitions of the various tags. I'm also wondering whether there are any obscure tags I have yet to encounter.
I'm curious about kanji: will dictutil be able to handle them properly? I ask because the code treats kanji as a special case.
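To make the question concrete, here's a rough Go sketch of where a kanji check might sit in a prefix function. It is not dictutil's or libnickel's actual logic. I'm assuming the prefix picks which HTML file inside the dictionary zip holds a word, and the rules below (lowercase, take the first two characters, pad a one-letter word with 'a', fall back to "11" for non-letters) are my guesses. `wordPrefix`, `isKanji`, and `fallbackPrefix` are hypothetical names. The sketch only flags kanji-initial words instead of guessing at whatever the real special case does.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// fallbackPrefix is an ASSUMED bucket for words whose leading characters
// are not letters (hypothetical; check dictutil's source for the real rule).
const fallbackPrefix = "11"

// isKanji reports whether r belongs to the Unicode Han script.
func isKanji(r rune) bool {
	return unicode.Is(unicode.Han, r)
}

// wordPrefix is a hypothetical, simplified prefix rule: trim and lowercase,
// take the first two runes, pad a one-rune word with 'a', and fall back to
// fallbackPrefix if either rune is not a letter. The bool result flags
// kanji-initial words; this sketch does not reproduce the real special case.
func wordPrefix(word string) (string, bool) {
	runes := []rune(strings.ToLower(strings.TrimSpace(word)))
	if len(runes) == 0 {
		return fallbackPrefix, false
	}
	kanji := isKanji(runes[0])
	if len(runes) == 1 {
		runes = append(runes, 'a') // assumed padding character
	}
	pfx := runes[:2]
	for _, r := range pfx {
		if !unicode.IsLetter(r) {
			return fallbackPrefix, kanji
		}
	}
	return string(pfx), kanji
}

func main() {
	for _, w := range []string{"Apple", "a", "3d", "漢字"} {
		p, k := wordPrefix(w)
		fmt.Printf("%s -> %s (kanji: %v)\n", w, p, k)
	}
}
```

Han characters count as letters in Go's `unicode` package, so under these assumptions a kanji word would get a two-kanji prefix just like a Latin word would, which is exactly why I'd like to know what the real special case does differently.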
One of my main interests is making bilingual Japanese word lists, for example from my Anki flash card deck or one of my Japanese textbooks (in, say, CSV or tab-separated format). That would help simplify definitions to my reading level and keep them consistent with what I'm learning and may be tested on later. Japanese in particular has given me the most problems, though: dictionaries made with Penelope sometimes work and sometimes don't. I suppose I could write my own version by hand and tag and sort it into the various files manually, but I'd like to avoid that if possible.
I also believe tshering discovered that kanji lookup only really works properly when one of the built-in Japanese dictionaries (jaxxdis, en-ja, or en-ja-pgs) is selected, especially if you're not using the Japanese locale, because Japanese lookup may go through a different function than the other languages. (I guess the locale matters for bringing up the Japanese keyboard. I'm not sure how this works for Chinese now that it's a supported language, but word highlighting and lookup still seem fine regardless of the OS language.) Is that still the case? If so, would it be possible (maybe through a patch?) to make kanji lookup work regardless of which dictionary is selected, for example so that more than three Japanese-related dictionaries could be installed? At the very least, I want to create a ja-en dictionary. I'm currently using norbusan's utility to enhance the built-in jaxxdis dictionary, but I'd really love to create an updated one based on JMdict or this random Kenkyuusha one that somehow made its way into my possession (cough).