Marco Pinto is the one who takes care of most en_GB lists nowadays:
https://github.com/marcoagpinto/aoo-mozilla-en-dict
From a quick look at his dictionary though, he also tends towards including nearly every word under the sun.
He also seems to be releasing monthly updates. (Compared to SCOWL's much slower, but thoroughly vetted releases.)
Another nice thing is his changelogs show exactly which words were added when:
https://raw.githubusercontent.com/ma...LO_2013%2B.txt
Quote:
Originally Posted by KevinH
@Ashjuk,
One more question. It seems the en_GB can be built to support "ise" endings (ala The Times), or "ize" endings (ala the OED), or both.
Which would be best for general purpose use in Sigil?
|
From everything I've seen over the years:
- -ise is the default. (en-GB)
- -ize (Oxford's endings) is a valid alternative though. (en-GB-oed OR en-GB-oxendict)
- Typically optional. May or may not exist in most programs.
- Note: Often, both -ise + -ize get smashed together into a single "British" dictionary. This is subpar + would lead to many more missed typos. See details below.
See:
especially the specs for:
This was out of BCP47:
Quote:
Grandfathered tags that do not match the 'langtag' production in the ABNF and would otherwise be invalid are considered 'irregular' grandfathered tags. With the exception of "en-GB-oed", which is a variant of "en-GB", each of them, in its entirety, represents a language.
Many of the grandfathered tags have been superseded by the subsequent addition of new subtags: each superseded record contains a 'Preferred-Value' field that ought to be used to form language tags representing that value. For example, the tag "art-lojban" is superseded by the primary language subtag 'jbo'.
|
and this was out of the IETF Language Subtag Registry:
Quote:
Type: grandfathered
Tag: en-GB-oed
Description: English, Oxford English Dictionary spelling
Added: 2003-07-09
Deprecated: 2015-04-17
Preferred-Value: en-GB-oxendict
|
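As a rough sketch of what that registry record means for a program: a deprecated tag gets mapped to its Preferred-Value before any dictionary lookup. This is Python, and the table and the `canonical_tag` name are made up for illustration (only the two mappings quoted above are included). It is not anything Sigil actually ships.

```python
# Hypothetical lookup table. Only these two mappings come from the
# BCP 47 text and the registry record quoted above.
PREFERRED_VALUES = {
    "en-gb-oed": "en-GB-oxendict",
    "art-lojban": "jbo",
}


def canonical_tag(tag: str) -> str:
    """Return the Preferred-Value for a deprecated tag, else the tag unchanged.

    Language tags are case-insensitive, so the lookup key is lowercased.
    """
    return PREFERRED_VALUES.get(tag.lower(), tag)


if __name__ == "__main__":
    for tag in ("en-GB-oed", "en-GB", "art-lojban"):
        print(tag, "->", canonical_tag(tag))
```

A real implementation would load the full registry instead of a hand-written table.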
Note: As for marking that in the HTML lang within an EPUB... I don't know how well subtags beyond the region (like oxendict) are supported on actual devices. (I'm not aware of anyone testing them thoroughly, but I doubt they actually work well.)
- - -
From everything that I can recall, what typically happens across programs/apps is...
When you select your language, you'd have the big 2 choices:
1. English (American)
2. English (British)
-- -ise
Beyond that point, programs might include many of the main variants (Australian, Canadian, etc.).
... and then (very rarely included by default):
- English (Oxford/OED)
-- -ize
- - -
LibreOffice has theirs listed as:
- English (US)
- English (UK)
- [... All the other country variants...]
- English, OED Spelling (UK)
- - -
Word 2016 only has:
- English (United States)
- English (United Kingdom)
- [... All the other country variants...]
No Oxford by default.
(No clue if this has changed in newer versions. I believe if you wanted Oxford dict, you'd have to grab third party dictionaries.)
From a quick test, it looks like "British" Word may accept all -ise + -ize endings. (But I think that's a poor idea. Again, see SCOWL with popularity+usage+levels-of-accepted-variants.)
- - -
Antidote, when you're selecting between English, gives 4 options:
- American English
- British English (-ise)
- British English (Oxford: -ize)
- Canadian English
- (They're a Canadian-based company + they originally started as a French grammar checker, and have since expanded into French+English.)
Note: I agree strongly with this separation. When you're spellchecking/proofing actual texts, keep in mind that books typically stick with a single spelling variant throughout (based on author/publisher location + Style Guide).
Mashing all endings together will cause you to MISS inconsistencies within a single text.
So Sigil, if deciding to go with the big 2 + Oxford, should accept only one spelling of each pair per dictionary:
- analyze / analyse
-- en-US: analyze
-- en-GB: analyse
-- Oxford: analyse (Oxford keeps -yse)
- organization / organisation
-- en-US: organization
-- en-GB: organisation
-- Oxford: organization
- realize / realise
-- en-US: realize
-- en-GB: realise
-- Oxford: realize
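To make the "mashing hides inconsistencies" point concrete, here's a tiny Python sketch. The word sets are hand-picked for illustration (a real build would load the actual Hunspell dictionaries), and the names `VARIANTS`, `MERGED_BRITISH` and `flagged` are hypothetical. The Oxford row assumes the usual Oxford spelling: -ize endings, but still "analyse".

```python
import re

# Tiny illustrative word lists, one per spelling variant (not real dictionaries).
VARIANTS = {
    "en-US": {"analyze", "organization", "realize"},
    "en-GB": {"analyse", "organisation", "realise"},
    "en-GB-oxendict": {"analyse", "organization", "realize"},
}

# What a "smashed together" British dictionary effectively accepts.
MERGED_BRITISH = VARIANTS["en-GB"] | VARIANTS["en-GB-oxendict"]

# Every spelling that appears in at least one list.
TRACKED = set().union(*VARIANTS.values())


def flagged(text, wordlist):
    """Return tracked words in `text` that `wordlist` does not accept, in order."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w in TRACKED and w not in wordlist]


if __name__ == "__main__":
    sample = "We realise the organization must analyse this."
    print("en-GB:", flagged(sample, VARIANTS["en-GB"]))
    print("Oxford:", flagged(sample, VARIANTS["en-GB-oxendict"]))
    print("merged:", flagged(sample, MERGED_BRITISH))
```

With the sample sentence, the strict en-GB list flags "organization" and the Oxford list flags "realise", while the merged list flags nothing, so the mixed spelling slips through.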
Quote:
Originally Posted by Ashjuk
[...] from just the 'A's alone there are so many glaring errors that I really don't think it is fit for purpose. I kid you not there are entries like annefrank which surely should be Anne Frank.
[...]
I've no idea who edits the LibreOffice dictionaries but if this is an example of what they are like then I would not advise anyone to use them.
|
Woof. I never took a closer look.
And that's also why it's important to... thoroughly double-check against real-life popularity/usage.
(Not like that guy in the Reddit post who said "Why not just accept everything from Wiktionary?" !!!)
Definitely report many of those errors to Marco's GitHub and get those fixed!