Quote:
Originally Posted by Ashjuk
Just a thought, but is there any chance of bundling a more comprehensive English dictionary with Sigil?
My default user dictionary now contains nearly 2000 words (all checked as being valid and correct English against two on-line dictionaries) that are not included in the dictionary that comes with Sigil.
|
Ehh... I'd be very careful with that.
Isn't Hunspell's English dictionary already based on SCOWL?
I wrote about that in detail in a recent LibreOffice thread on Reddit:
The person was complaining about "how abysmal" LibreOffice's default dictionary is... and I had to explain why it wasn't.
What might work better for you is to generate a larger custom dictionary from SCOWL using their online tool. (Or download the "en_US-large" dictionary instead.)
Default spellcheck dictionaries use the "size 60" lists. These are common words found in most dictionaries.
Going up to "size 70" adds rarer words, but also increases the chance of missing actual typos.
"size 80" includes incredibly rare, but still valid, English words.
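To make the size idea concrete, here is a minimal Python sketch of how a dictionary of a given size is assembled, assuming the sizes nest (a larger size includes everything in the smaller ones). `SIZE_LISTS` and `words_up_to` are hypothetical names, and the toy data just reuses the example words from this post; it is not real SCOWL data, and the real size of each word may differ.

```python
# Toy stand-in for SCOWL's per-size lists: each size holds only the words
# *added* at that size. Sizes and words are illustrative, not real SCOWL data.
SIZE_LISTS: dict[int, set[str]] = {
    60: {"cherish", "cheese", "wood", "class", "fellow",
         "clothes", "club", "root", "pollution"},
    80: {"cherishable", "cheesewood", "classfellow",
         "clotes", "clubroot", "pollusion"},
}


def words_up_to(max_size: int, lists: dict[int, set[str]] = SIZE_LISTS) -> set[str]:
    """Build a dictionary of the given size by merging every list at or below it."""
    words: set[str] = set()
    for size, batch in lists.items():
        if size <= max_size:
            words |= batch
    return words


for size in (60, 80):
    # At size 60 the typo "clotes" is flagged; at size 80 it is silently accepted.
    print(size, "clotes" in words_up_to(size))
```

This prints `60 False` and then `80 True`: the bigger list "knows" more words, which is exactly why it lets more typos through.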
For example, I gave these rare words in the post above:
- cherishable
- cheesewood
- classfellow
- clotes
- clubroot
- pollusion
Most people typing these probably meant:
- cherish + able
- cheese + wood
- class + fellow
- clothes
- club + root
- pollution
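The trade-off can be sketched as a heuristic: a word that only appears in the larger (rarer) list is suspicious if it splits into two common words, or sits one edit away from a common word. This is a hypothetical illustration, not how Hunspell, Sigil, or SCOWL actually decide anything; `COMMON`, `RARE`, and `suspicious_reason` are made-up names holding toy data.

```python
import string
from typing import Optional

# Toy data: COMMON stands in for a size-60 list, RARE for the extra size-80 words.
COMMON = {"cherish", "able", "cheese", "wood", "class", "fellow",
          "clothes", "club", "root", "pollution"}
RARE = {"cherishable", "cheesewood", "classfellow", "clotes", "clubroot", "pollusion"}


def one_edit_variants(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution, or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return deletes | transposes | replaces | inserts


def suspicious_reason(word: str) -> Optional[str]:
    """Explain why a rare-but-valid word may really be a typo, else return None."""
    if word in COMMON or word not in RARE:
        return None
    # Case 1: the rare word is two common words run together.
    for i in range(1, len(word)):
        if word[:i] in COMMON and word[i:] in COMMON:
            return f"split: {word[:i]} + {word[i:]}"
    # Case 2: the rare word is one keystroke away from a common word.
    near = sorted(one_edit_variants(word) & COMMON)
    if near:
        return f"near: {near[0]}"
    return None


for w in sorted(RARE):
    print(w, "->", suspicious_reason(w))
```

For instance, `suspicious_reason("clotes")` returns `"near: clothes"` and `suspicious_reason("clubroot")` returns `"split: club + root"`. A real checker would weigh word frequencies rather than treat every match as a likely typo.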
Quote:
Originally Posted by Ashjuk
I realise that you just pick up the Hunspell ones, but I was wondering if there was a better alternative that updated their dictionaries on a more regular basis.
|
You can go straight to the source.
Want to handle both American and British variants? Generate a list.
Want to handle technical computer and "hacker" words? Generate a list.
Want -ize British endings instead of -ise? Generate a list!
Words actually missing from all the lists? Submit them to SCOWL's GitHub. (Like I did with the latest chemical elements, such as element 118, "Oganesson".)
Quote:
Originally Posted by KevinH
If you are sure that your 2000 long wordlist are all valid en_US words then please zip up the list, post it someplace
|
I'd be interested in it too. Last year, I spent a few months scraping dictionaries for valid English words.
I submitted a huge batch of "missing words" to SCOWL for them to research.
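If you want to check a list like that yourself before submitting it, here is a rough Python sketch that compares a plain one-word-per-line user list against the stems in a Hunspell `.dic` file. It assumes the standard `.dic` layout (first line is an approximate entry count, then entries like `word/FLAGS`); `read_dic_stems` and `missing_words` are hypothetical helpers. It does NOT apply the `.aff` affix rules, so inflected forms can show up as "missing" even when the dictionary would accept them.

```python
def read_dic_stems(dic_lines: list[str]) -> set[str]:
    """Collect bare stems from Hunspell .dic lines.

    The first line of a .dic file is an approximate entry count; each
    following entry looks like "word/FLAGS" (flags and morphology optional).
    """
    stems: set[str] = set()
    for line in dic_lines[1:]:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        stems.add(fields[0].split("/", 1)[0])  # drop the "/FLAGS" part
    return stems


def missing_words(user_words: list[str], dic_lines: list[str]) -> list[str]:
    """Return user-dictionary words that are not stems in the .dic file."""
    stems = read_dic_stems(dic_lines)
    user = {w.strip() for w in user_words if w.strip()}
    return sorted(user - stems)


sample_dic = ["3", "cheese/MS", "crowdsource/DSG", "wood"]
sample_user = ["cheese", "crowdsource", "crowdsourced", "clubroot", "wood"]
print(missing_words(sample_user, sample_dic))
```

This prints `['clubroot', 'crowdsourced']`. "crowdsourced" is a false alarm if the affix file's `D` flag adds -ed, which is why a stricter check would pipe the list through Hunspell itself (e.g. `hunspell -l`, which lists words it rejects). For real files, read them with the encoding declared by the `.aff` file's `SET` line.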
Sometimes common words end up in the rarer lists by accident, and can be moved down to a smaller size too. Language usage is always changing.
Take this newer word: "crowdsource" + "crowdsourced" + "crowdsourcing". I just got those fixed in LanguageTool! When someone accidentally types "crowd source", they probably meant "crowdsource"!
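The "crowd source" case can be sketched as a simple check: flag adjacent words whose concatenation is itself a dictionary word. This is a hypothetical heuristic with a toy word list, not LanguageTool's actual rule; `DICTIONARY` and `join_suggestions` are made-up names.

```python
import re

# Toy dictionary; a real one would come from a SCOWL-generated list.
DICTIONARY = {"crowd", "source", "crowdsource", "crowdsourced", "crowdsourcing"}


def join_suggestions(text: str, dictionary: set[str] = DICTIONARY) -> list[tuple[str, str]]:
    """Return (original pair, joined word) for adjacent words that form a known word."""
    words = re.findall(r"[A-Za-z]+", text)
    suggestions = []
    for first, second in zip(words, words[1:]):
        joined = (first + second).lower()
        if joined in dictionary:
            suggestions.append((f"{first} {second}", joined))
    return suggestions


print(join_suggestions("We crowd sourced it"))
```

This prints `[('crowd sourced', 'crowdsourced')]`. Blindly joining every pair would misfire on phrases like "a part" vs. "apart", so in practice a rule like this works better with a curated list of known compounds.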