01-19-2022, 02:17 PM   #17
KevinH
Sigil Developer
Posts: 8,948
Karma: 6361444
Join Date: Nov 2009
Device: many
I have built a tentative new en_US .dic and .aff pair that is meant to rival what Pages and Word accept. It merges the unmunched hunspell dictionary, the unmunched MySpell dictionary, and Ashjuk's new additions for en_US.
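For anyone who wants to try the same kind of merge, here is a minimal sketch, assuming each input is an unmunched list with one word per line. The function names and the latin-1 default are my own assumptions for illustration, not the actual build that produced the files above. The only hunspell-specific detail is that a .dic file starts with a word-count line.

```python
def merge_wordlists(*paths, encoding="latin-1"):
    """Union several one-word-per-line lists, dropping blanks and duplicates."""
    words = set()
    for path in paths:
        with open(path, encoding=encoding) as fh:
            for line in fh:
                word = line.strip()
                if word:
                    words.add(word)
    return sorted(words)


def write_dic(words, path, encoding="latin-1"):
    """Write a flat .dic file: a word-count line, then one word per line."""
    with open(path, "w", encoding=encoding) as fh:
        fh.write(f"{len(words)}\n")
        fh.writelines(f"{word}\n" for word in words)
```

The result is a flat list with no affix flags, so it would still need to go through munch (or be paired with a suitable .aff) before it is a compact dictionary.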

I will then build various SCOWL-based wordlists (sizes 60, 70, 80) so we can compare them.

I must say KevinA's SCOWL repo build process is not well designed, to say the least. It uses symlinks everywhere, which is a major no-no, and then strips out all accents so that he can just rename the .aff and .dic to utf-8, when the data is really latin-1 based and the accented characters could have been properly converted and kept. I do not want an "eclair" in the wordlist!
So even SCOWL has its drawbacks. A 6-line Python program could have done the conversion from one encoding to the other. He has to keep the latin-1 encoded files for munch and unmunch to work (they need the 1 byte = 1 char rule for munch speed). This was the design to reduce dictionary sizes, as many languages are based around a single 8-bit encoding: latin-1, latin-2, etc.
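To show how little is needed, here is a rough sketch of such a converter. It assumes the input really is latin-1, and the function name is hypothetical; it is not part of the SCOWL build. The one extra step is that a hunspell .aff file declares its encoding on a SET line, so that line gets updated for .aff files.

```python
import re
from pathlib import Path


def latin1_to_utf8(src, dst):
    """Re-encode a latin-1 file as UTF-8, keeping accented characters."""
    text = Path(src).read_bytes().decode("latin-1")
    if str(src).endswith(".aff"):
        # Point the .aff encoding declaration at the new encoding.
        text = re.sub(r"(?m)^SET[ \t]+\S+", "SET UTF-8", text, count=1)
    Path(dst).write_bytes(text.encode("utf-8"))
```

Working on bytes keeps line endings untouched, and the latin-1 originals stay in place for munch and unmunch.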

Last edited by KevinH; 01-20-2022 at 12:00 PM.