01-05-2022, 09:13 PM   #80
Tex2002ans
Wizard
Quote:
Originally Posted by Ashjuk
Just a thought, but is there any chance of bundling a more comprehensive English dictionary with Sigil?

My default user dictionary now contains nearly 2000 words (all checked as being valid and correct English against two on-line dictionaries) that are not included in the dictionary that comes with Sigil.

Ehh... I'd be very careful with that.

Isn't hunspell's English dictionary already based off of SCOWL?

I wrote about that in detail in a recent LibreOffice thread on Reddit.

The person there was complaining about "how abysmal" LibreOffice's default dictionary is... and I had to explain why it isn't.

What would probably work better for you is generating a larger custom dictionary from SCOWL using their online tool. (Or downloading the "en_US-large" dictionary instead.)

Default spellcheck dictionaries use the "size 60" lists. These are common words that are found in most dictionaries.

If you go up to "size 70", you get rarer words, but also a greater potential to miss actual typos.

"size 80" includes incredibly rare, but still valid, English words.

For example, I gave these rare words as examples in that post:
  • cherishable
  • cheesewood
  • classfellow
  • clotes
  • clubroot
  • pollusion

Most people typing those probably meant:
  • cherish + able
  • cheese + wood
  • class + fellow
  • clothes
  • club + root
  • pollution
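
To make that trade-off concrete, here is a minimal Python sketch. The two word sets are tiny, made-up stand-ins for a "size 60" and a "size 80" list (they are not real SCOWL data), and `flagged` is a hypothetical helper, not anything from hunspell or Sigil:

```python
# Toy stand-ins for two dictionary sizes. Real SCOWL lists are far larger;
# these contents are invented purely for illustration.
SIZE_60 = {"pollution", "ruined", "my", "clothes"}
# Pretend the larger list also accepts the rare words from the post.
SIZE_80 = SIZE_60 | {"cherishable", "cheesewood", "classfellow",
                     "clotes", "clubroot", "pollusion"}

def flagged(text, wordlist):
    """Return the words a spellchecker using `wordlist` would underline."""
    return [word for word in text.lower().split() if word not in wordlist]

sentence = "pollusion ruined my clotes"
print(flagged(sentence, SIZE_60))  # both typos are caught
print(flagged(sentence, SIZE_80))  # nothing is caught: the typos are "valid" rare words
```

With the smaller list, both typos get underlined. With the larger one, they slip through silently.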

Quote:
Originally Posted by Ashjuk
I realise that you just pick up the Hunspell ones, but I was wondering if there was a better alternative that updated their dictionaries on a more regular basis.

You can go straight to the source.

Want to handle both American and British variants? Generate a list.

Want to handle technical computer+"hacker" words? Generate a list.

Want -ize British endings instead of -ise? Generate a list!

Words actually missing from all the lists? Submit them on GitHub. (Like I submitted the latest atomic elements... such as 118 = "Oganesson".)
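
Before submitting anything, it helps to check which of your user-dictionary words are really absent from every list, not just the default one. A rough Python sketch, assuming the lists are already loaded as sets of lowercase words (`missing_from_all` and the sample data are hypothetical, not part of SCOWL or Sigil):

```python
def missing_from_all(user_words, *wordlists):
    """Return user words that appear in none of the given word lists."""
    known = set().union(*wordlists)  # every word found in any list
    return sorted(w for w in set(user_words) if w.lower() not in known)

# Made-up sample data, only to show the idea.
default_list = {"crowdsource"}
large_list = {"crowdsource", "clubroot"}
user_dictionary = ["oganesson", "clubroot", "crowdsource"]

print(missing_from_all(user_dictionary, default_list, large_list))
```

Anything that survives this check against the largest generated list is a candidate worth reporting. Anything that doesn't was just sitting in a rarer list.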

Quote:
Originally Posted by KevinH
If you are sure that your 2000 long wordlist are all valid en_US words then please zip up the list, post it someplace

I'd be interested in it too. Last year, I spent a few months scraping all dictionaries for all valid English words.

I submitted a huge load of "missing words" to SCOWL for them to research.

Sometimes common words are accidentally in the rarer lists, and can be adjusted downwards too. Language usage is always changing.

Like this newer word: "crowdsource" + "crowdsourced" + "crowdsourcing". I just got those fixed up in LanguageTool! When someone accidentally types "crowd source", they probably meant "crowdsource"!
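
The general idea behind that kind of check can be sketched in a few lines of Python. This is only an illustration of the concept, not how LanguageTool implements its rules. The `WORDS` set and the function name are made up for the example:

```python
import re

# Hypothetical mini-dictionary; a real checker would load a full word list.
WORDS = {"crowd", "source", "crowdsource", "crowdsourced", "crowdsourcing",
         "we", "the", "work"}

def split_compound_suggestions(text, words=WORDS):
    """Find adjacent word pairs whose joined form is itself a known word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    suggestions = []
    for left, right in zip(tokens, tokens[1:]):
        joined = left + right
        if joined in words:
            suggestions.append((f"{left} {right}", joined))
    return suggestions

print(split_compound_suggestions("We crowd source the work"))
```

A plain spellchecker never flags "crowd source", because both halves are valid words. A check like this only works once the joined form is actually in the word list, which is why getting new words added matters.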

Last edited by Tex2002ans; 01-05-2022 at 09:28 PM.