Quote:
Originally Posted by KevinH
And "bestseller" "automobile" and "airplane" are missing from the en_GB one.
So both dictionaries Sigil embeds are a bit out of date but the en_GB one seems a tad extreme 
|
Hmmm... Yep, just tested those words in Sigil too and they worked for me.
Definitely looked like a British dictionary thing. (Which is why everyone should be using the superior American stuff!!!)
Side Note: "airplane" is marked as a level 2 variant in SCOWL's en_GB. This means "uncommon variant".
Quote:
Originally Posted by KevinH
Please do send me your lists and I will incorporate the fixes.
|
Would probably be better to grab the latest "size 60" en_GB right from SCOWL. (As of today, it's 2020.12.07.)
* * *
Another good list to look at is LanguageTool's exception file:
These are usually user-submitted reports that aren't in hunspell's lists (at the time). BUT, big but, many of these are:
- Acronyms
- Company/Celebrity/Town names
- Specific/obscure technical terms
  - Computing ("stylesheet" or "endianness")
  - Biology ("zooxanthella")
  - Chemistry ("oxaloacetate")
- Legal/Latin terms
  - "ad nauseam" (to a sickening or excessive degree)
    - "nauseam" isn't an English word.
  - "ab ovo" (from the beginning)
    - "ovo" isn't an English word.
- Words from much rarer dictionaries.
  - Medical/Legal dictionaries.
Adding these, willy nilly, to a general spellcheck list is not the greatest idea.
This is why you have personal "Add to User Dictionary" or "Ignore All".
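To make that concrete, here's a rough sketch with made-up word lists (neither set reflects the real hunspell or LanguageTool data, and `misspelled` is just a hypothetical helper). The idea is that a personal dictionary accepts a rare term for the one person who needs it, without loosening the shared list for everyone else:

```python
# Hypothetical, tiny word lists for illustration only.
GENERAL_WORDS = {"the", "coral", "hosts", "a", "single"}
user_dictionary = set()  # what "Add to User Dictionary" would grow over time


def misspelled(text, extra_words=frozenset()):
    """Return the words in text found in neither the general list nor extra_words."""
    known = GENERAL_WORDS | set(extra_words)
    return [word for word in text.lower().split() if word not in known]


sentence = "The coral hosts a single zooxanthella"

# Flagged at first, because the general list stays conservative.
before = misspelled(sentence, user_dictionary)

# The biologist adds the word to a personal list only.
user_dictionary.add("zooxanthella")
after = misspelled(sentence, user_dictionary)

print(before)
print(after)
```

Anyone checking without that personal list still gets the squiggly, which is the point.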
Side Note: And LanguageTool can be a little more lenient with their ignored red squigglies, because they're checking grammar. So they mostly don't want the spelling squigglies interfering with their grammarchecking squigglies!
Quote:
Originally Posted by Ashjuk
Egyptologist
|
I looked in SCOWL.
size 50 (this means EXTREMELY common words)
- Egypt
- Egyptian
- Egyptology
size 70 (this means rarer than normal)
- Egyptologies
size 80 (this means incredibly rare)
- Egyptological
- Egyptologist
From Google n-grams of "Egyptology" vs. "Egyptologist" vs. "Egyptologies"...
I'd say Egyptologist can probably be moved down to 70 + Egyptologies can be moved to 80! (So this was probably an error.)
I'll submit a bug report.
Edit: Just submitted it. It's Issue #341.
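For anyone wondering how the size numbers play out, here's a minimal sketch. The sizes are the ones from my lookup above; the dict and the `words_up_to` helper are hypothetical stand-ins (SCOWL ships word-list files, not a Python dict), and I'm assuming a list built at a given size includes every word at that size or lower:

```python
# Sizes copied from the lookup above; the dict is a stand-in for SCOWL's files.
SCOWL_SIZE = {
    "Egypt": 50,
    "Egyptian": 50,
    "Egyptology": 50,
    "Egyptologies": 70,
    "Egyptological": 80,
    "Egyptologist": 80,
}


def words_up_to(size, sizes=SCOWL_SIZE):
    """Words a list built at `size` would contain (assumes cumulative sizes)."""
    return sorted(word for word, level in sizes.items() if level <= size)


size_60 = words_up_to(60)
size_70 = words_up_to(70)

# The swap suggested above: Egyptologist to 70, Egyptologies to 80.
proposed = dict(SCOWL_SIZE, Egyptologist=70, Egyptologies=80)
proposed_70 = words_up_to(70, proposed)

print(size_60)
print(size_70)
print(proposed_70)
```

Under that assumption, a "size 60" list would still leave out "Egyptologist" even after the swap; it would only show up in lists built at 70 or higher.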
Quote:
Originally Posted by Ashjuk
bestseller
|
size 40
- bestseller
This word is so common it exists in every dictionary ever.
Definitely something is odd. I think KevinH nailed it, the en-GB included in Sigil is way out of whack.
Quote:
Originally Posted by BetterRed
Outside the US/Canada 'bestseller' would more often be 'best-seller' or, IMO the better yet, 'best seller'.
|
Looks like "bestseller" became the most popular in:
- ~1975 British
- ~1996 American
  - "bestseller" and "best-seller" were used ~50/50, but the hyphen usage dropped dramatically.
All 3 variants seem to still be in popular usage, with "best seller" + "best-seller" now in the minority.
Quote:
Originally Posted by BetterRed
Maybe what's needed is a universal English dictionary and exclusion lists that can be applied selectively depending on context. When I'm wearing my copy editors guise I'd prefer exclusion checking be separate from spell checking.
|
That's why there are variants for:
- US English for American spellings.
- GB English for British spellings.
- AU English for Australian terms/words.
Trying to smush that all into a single, monolithic English spellcheck list would cause way more errors.
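Here's a toy sketch of why, using hypothetical mini word lists (real dictionaries are far larger, and the function names are made up for illustration). A per-variant check catches a mixed-up spelling, while the monolithic list lets it slide:

```python
# Hypothetical mini word lists for illustration only.
SHARED = {"the", "of", "my", "car", "is", "red"}
VARIANTS = {
    "en_US": {"color", "tire"},
    "en_GB": {"colour", "tyre"},
}


def flagged(text, variant):
    """Words not in the shared list or the chosen variant's list."""
    known = SHARED | VARIANTS[variant]
    return [word for word in text.lower().split() if word not in known]


def flagged_monolithic(text):
    """Same check against every variant merged into one big list."""
    known = SHARED.union(*VARIANTS.values())
    return [word for word in text.lower().split() if word not in known]


mixed = "The colour of my tire is red"

print(flagged(mixed, "en_GB"))    # British check catches the American spelling
print(flagged(mixed, "en_US"))    # American check catches the British spelling
print(flagged_monolithic(mixed))  # merged list accepts both, so nothing is caught
```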
Then you have another layer, grammarchecking, for things like multi-word or context-level corrections.
Like your example of "motor car" is valid, but a grammarchecker might ping it if set to US English... and tell you:
"Are you sure you meant this? This is a British term."
Similar with Indian English ("Hinglish")... there are so many words/terms/phrases that make absolutely no sense to normal English-speakers. Like this "famous" saying:
- "Do the needful"
  - Do what's needed.
  - Do what needs to be done.
  - Do what's required.
Quote:
Originally Posted by Ashjuk
stylesheet ... (one you would expect to be included in any dictionary bundled with Sigil)
|
Ehhh, I'd still argue against this one.
Again, the red squiggly spellchecking is meant to check text in ebooks.
In almost all books, besides ones talking about HTML/CSS/XML, "stylesheet" isn't a valid word.
Quote:
Originally Posted by KevinH
@Tex2002ans - Thanks for the links. I am aware of scowl and Kevin Atkinson's Aspell from my days at running the OpenOffice lingucomponent project and as my role as creator of MySpell and MyThes way back then. I understand the concept of a working set (corpus) of most commonly used words and the problems of a larger corpus hiding common misspellings.
|

* * *
Anyway, Ashjuk, I'd also be interested in seeing this 2000 word list. You may have caught some words that could definitely use tweaking.
And if we get them tweaked directly in the source lists, that would benefit everyone who relies on these lists (which includes LibreOffice, Firefox, and of course, the great Sigil).