Old 07-12-2020, 07:55 AM   #20
Doitsu
Grand Sorcerer
Quote:
Originally Posted by annoporci
The file is quite large at about 70MB. I wonder if there's anything I could do to reduce its size. Any suggestions?
By default, KindleGen appends the source files to the output file. Use the -dont_append_source parameter to change this behavior.
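
If you drive KindleGen from a script, the call could look roughly like the sketch below. It assumes that the kindlegen binary is on your PATH; the function names and the dict.opf file name are made up for illustration.

```python
import subprocess


def build_kindlegen_cmd(source, output_name=None, append_source=False):
    """Return the argument list for a KindleGen run.

    Assumption: the executable is called "kindlegen" and is on the PATH.
    """
    cmd = ["kindlegen", str(source)]
    if not append_source:
        # Skip embedding the source files in the output file.
        cmd.append("-dont_append_source")
    if output_name:
        # -o takes an output file name.
        cmd += ["-o", output_name]
    return cmd


def run_kindlegen(source, output_name=None):
    """Run KindleGen and hand back the CompletedProcess for inspection."""
    cmd = build_kindlegen_cmd(source, output_name)
    return subprocess.run(cmd, capture_output=True, text=True)
```

Calling run_kindlegen("dict.opf", "dict.mobi") and comparing the file size with a default build should show how much of the 70MB was source files.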

Quote:
Originally Posted by annoporci
Is there an open source mono-lingual look-up dictionary in html/xhtml format that I could look at?
AFAIK, very few open source dictionaries contain inflections. If you manage to DeDRM the free Merriam-Webster dictionary (B00OLDL0BA) that eInk Kindle owners can download, you could use the KindleUnpack Calibre plugin to unpack it and look at its source.
Also, many of the older Mobipocket .prc dictionaries contain inflections. (The dictionary format hasn't changed that much.)

Quote:
Originally Posted by annoporci
It turns out that "ca" and "cat" are both valid codes for "Catalan".
AFAIK, KindleGen will only use the first two letters of the language code.

Quote:
Originally Posted by annoporci
I still need to properly code "inflections" and clean a few things up, but that may have to wait the upcoming second covid lockdown.
Try googling for open source Catalan POS (part-of-speech) taggers. There might be one whose data files you could reformat and use to add inflections.
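
As a rough idea of what the reformatting could look like, here is a minimal sketch. It assumes that the tagger data is a plain-text list with one "form lemma tag" entry per line; that layout is only a guess, so check the actual files. The function names and the sample tags are made up. The output uses the idx:orth, idx:infl and idx:iform elements of the Kindle dictionary markup.

```python
from collections import defaultdict
from html import escape


def group_forms(lines):
    """Collect the inflected forms of each lemma.

    Assumption: each line is "form lemma tag", separated by whitespace.
    """
    forms = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        form, lemma = parts[0], parts[1]
        bucket = forms[lemma]  # also registers lemmas without inflections
        if form != lemma and form not in bucket:
            bucket.append(form)
    return forms


def orth_markup(lemma, inflected):
    """Build the idx:orth element for one headword."""
    head = escape(lemma, quote=True)
    if not inflected:
        return f'<idx:orth value="{head}"/>'
    iforms = "".join(
        f'<idx:iform value="{escape(form, quote=True)}"/>' for form in inflected
    )
    return f'<idx:orth value="{head}"><idx:infl>{iforms}</idx:infl></idx:orth>'


# Hypothetical sample data for the Catalan word "gat" (cat).
sample = ["gats gat NCMP000", "gat gat NCMS000", "gata gat NCFS000"]
for lemma, inflected in group_forms(sample).items():
    print(orth_markup(lemma, inflected))
```

The generated element would replace the plain idx:orth tag of the matching entry in your dictionary source.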