Quote:
Originally Posted by annoporci
The file is quite large at about 70MB. I wonder if there's anything I could do to reduce its size. Any suggestions?
|
By default, KindleGen appends the source files to the output file, which inflates its size. Use the -dont_append_source parameter to change this behavior.
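In case it helps, here is a minimal sketch of how that could be scripted in Python. The file names (dictionary.opf, dictionary.mobi) and the helper name build_kindlegen_command are placeholders of mine, and it assumes the kindlegen binary is on your PATH; only the -dont_append_source and -o options come from KindleGen itself.

```python
import shutil
import subprocess


def build_kindlegen_command(opf_path, output_name=None, append_source=False):
    """Return the argument list for a KindleGen run.

    Hypothetical helper: by default it adds -dont_append_source so the
    source files are not attached to the generated file.
    """
    cmd = ["kindlegen", opf_path]
    if not append_source:
        cmd.append("-dont_append_source")
    if output_name:
        # -o sets the output file name.
        cmd += ["-o", output_name]
    return cmd


def run_kindlegen(opf_path, output_name=None):
    """Run KindleGen if it is installed; return its exit code, else None."""
    if shutil.which("kindlegen") is None:
        return None
    cmd = build_kindlegen_command(opf_path, output_name)
    # check=False: a non-zero exit code is handed back to the caller
    # instead of raising an exception.
    return subprocess.run(cmd, check=False).returncode


# Placeholder file names, shown only to illustrate the resulting command line.
print(" ".join(build_kindlegen_command("dictionary.opf", "dictionary.mobi")))
```

Comparing the size of the output with and without the flag should show how much of the 70MB is just the attached source.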
Quote:
Originally Posted by annoporci
Is there an open source mono-lingual look-up dictionary in html/xhtml format that I could look at?
|
AFAIK, very few open-source dictionaries contain inflections. If you manage to DeDRM the free Merriam-Webster dictionary (B00OLDL0BA) that eInk Kindle owners can download, you could use the KindleUnpack calibre plugin to unpack it and look at its markup.
Also, many of the older Mobipocket .prc dictionaries contain inflections. (The dictionary format hasn't changed that much.)
Quote:
Originally Posted by annoporci
It turns out that "ca" and "cat" are both valid codes for "Catalan".
|
AFAIK, KindleGen will only use the first two letters of the language code.
Quote:
Originally Posted by annoporci
I still need to properly code "inflections" and clean a few things up, but that may have to wait the upcoming second covid lockdown.
|
Try searching for open-source Catalan POS (part-of-speech) taggers. There might be one whose data files (which typically map inflected forms to their lemmas) you could reformat and use to add inflections.
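A rough sketch of what that reformatting could look like, in Python. It assumes the tagger's data file has one whitespace-separated "form lemma tag" triple per line; that layout, the sample lines, and the function names are my assumptions, so adjust them to whatever the tagger you find actually ships. The idx:orth / idx:infl / idx:iform elements are the ones the Kindle dictionary format uses for headwords and their inflected forms.

```python
from collections import defaultdict
from xml.sax.saxutils import escape, quoteattr


def group_forms_by_lemma(lines):
    """Map each lemma to the set of inflected forms that differ from it.

    Assumed input: lines of 'form lemma tag' separated by whitespace.
    """
    forms = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        form, lemma = parts[0], parts[1]
        if form != lemma:
            forms[lemma].add(form)
    return forms


def render_entry(headword, definition, inflected_forms):
    """Render one dictionary entry with its inflections as idx markup."""
    iforms = "".join(
        "<idx:iform value=%s/>" % quoteattr(form)
        for form in sorted(inflected_forms)
    )
    infl = "<idx:infl>%s</idx:infl>" % iforms if iforms else ""
    return (
        "<idx:entry>"
        "<idx:orth value=%s>%s%s</idx:orth>"
        "<p>%s</p>"
        "</idx:entry>"
    ) % (quoteattr(headword), escape(headword), infl, escape(definition))


# Made-up sample lines; the tags are illustrative only.
sample = ["gat gat NCMS000", "gats gat NCMP000", "canto cantar VMIP1S0"]
forms = group_forms_by_lemma(sample)
print(render_entry("gat", "Mamífer felí domèstic.", forms["gat"]))
```

You would then merge the generated idx:infl blocks into the entries you already have, keyed on the headword.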