01-21-2022, 04:09 AM | #31 | |
Fanatic
Posts: 500
Karma: 3498633
Join Date: May 2011
Location: Surrey, UK
Device: Kobo Aura One, Sony PRS 600/650
|
Quote:
|
|
01-21-2022, 10:38 AM | #32 |
Sigil Developer
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
|
@Ashjuk,
Does the current Sigil en_GB dictionary support "ise" or "ize" or both? I will do what the current Sigil en_GB dictionary does in that regard, following the rule of least surprise. Thanks! |
01-21-2022, 12:08 PM | #33 |
Fanatic
Posts: 500
Karma: 3498633
Join Date: May 2011
Location: Surrey, UK
Device: Kobo Aura One, Sony PRS 600/650
|
Kevin,
As far as I am aware, 'ise' is the default for the current en_GB dictionary. When I do a spellcheck with the book set to English - Great Britain in the metadata, 'ize' is normally picked up as misspelled.
I have now checked my UK list of words against the Google GB dictionary and have uploaded a new file to my Google drive of those that are still missing. I have also uploaded a complete list of the words in the Google file, in alphabetical order as plain text, if that is of any use.
Checked file - https://drive.google.com/file/d/18C8...ew?usp=sharing
Full list - https://drive.google.com/file/d/1bmK...ew?usp=sharing |
01-21-2022, 12:13 PM | #34 |
Sigil Developer
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
|
In your opinion, is the Google en_GB dictionary suitable as a starting point for Sigil (unlike the libreoffice/openoffice ones)?
|
01-21-2022, 12:43 PM | #35 |
Fanatic
Posts: 500
Karma: 3498633
Join Date: May 2011
Location: Surrey, UK
Device: Kobo Aura One, Sony PRS 600/650
|
From what I have seen of it I would tentatively say yes. I would hazard a guess that less than 5% of the words it contains were flagged as misspelled by Word, and probably a lot of those are OK being real names and new words.
There are a few problems that I spotted early on that could possibly be addressed. One being the inclusion of Gray. Whilst this is probably meant to be a person's name it is also the US spelling of grey. So if one were to start a sentence with the words "Gray clouds covered the sky" and what you meant to write was "Grey clouds covered the sky" it would not be flagged as misspelt. Also there is one huge error - Scotchman/Scotchwoman. Scotch is a drink! If you were to call a Scotsman a Scotchman I doubt you would be standing long. Hopefully I can find the time to have a better look to see if I can spot any other words that might cause a problem. |
01-21-2022, 01:07 PM | #36 |
Bibliophagist
Posts: 34,557
Karma: 144552660
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Actually, scotchman is correct but it's an old variant of scotsman or scot and can be considered insulting today. Does the dictionary you were looking at also mention the nautical use for scotchman?
|
01-21-2022, 03:02 PM | #37 |
Sigil Developer
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
|
It seems starting with old dictionaries is fraught with danger one way or the other. Based on all of this, and based on scowl being the only one being vetted in a consistent manner, and based on Tex2002ans's comments, I think we should probably stick with scowl plus some obvious additions.
So I am going to create dictionaries based on scowl 60 and 70 with proper accents, with the addition of the checked words Ashjuk found specific to US and UK, and with the new words as well. Once I have those I can post them here and people can evaluate them. If they appear to be a clear improvement over what we have now, I will push them to master. How does that sound to everyone?
Last edited by KevinH; 01-21-2022 at 05:08 PM. |
01-21-2022, 04:25 PM | #38 |
Sigil Developer
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
|
new en_* Test Dictionaries
Hi All,
Attached to this post are two zip archives which contain the latest scowl-based "en" hunspell dictionaries, extended to cover both some verified words and some common proper company product names (iPhone, etc.).
The en_scowl_size_60.zip has the en_US, en_CA, en_AU, en_GB, and en_GB-oed .aff and .dic files based on scowl size 60. Similarly, the en_scowl_size_70.zip has the en_US, en_CA, en_AU, en_GB, and en_GB-oed .aff and .dic files based on scowl size 70.
If you have a chance, please give these a try and let me know of any issues you run into.
Special thanks to Ashjuk for checking so many words for both GB and US dictionaries and posting them so we could improve our internal Sigil hunspell dictionaries.
Here are the number of "root word" entries for all of these dictionaries, followed by the "total words" covered when counting every unique string.
size_60
--------
Code:
- en_AU: 51106, 125043 + 78 no suggest words
- en_CA: 50999, 124839 + 78 no suggest words
- en_GB-oed: 50930, 124368 + 78 no suggest words
- en_GB: 51527, 125264 + 78 no suggest words
- en_US: 51412, 125475 + 78 no suggest words
size_70
--------
Code:
- en_AU: 81065, 168300 + 78 no suggest words
- en_CA: 80888, 168061 + 78 no suggest words
- en_GB-oed: 80752, 167543 + 78 no suggest words
- en_GB: 81159, 168128 + 78 no suggest words
- en_US: 81121, 168592 + 78 no suggest words
After restarting Sigil, the dictionaries there will take precedence over the Sigil-installed ones with the same name until you delete them.
Edit: Removed the now outdated zipped dictionaries. See later posts in this thread for updated versions.
Last edited by KevinH; 01-24-2022 at 11:40 AM. Reason: remove now outdated zipped dictionaries |
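For anyone who wants to sanity-check the "root word" figures against a .dic file, here is a minimal Python sketch. It assumes the usual hunspell .dic layout (an approximate entry count on the first line, then one `word/FLAGS` entry per line), single-character flags, and `!` as the no suggest flag, as in these dictionaries. The function name `summarize_dic` is made up for illustration. It does not expand affixes, so it cannot reproduce the "total words" column, and it does not handle escaped slashes inside words.

```python
def summarize_dic(dic_text: str, nosuggest_flag: str = "!") -> dict:
    """Count root entries and no-suggest entries in hunspell .dic text.

    Assumptions (not taken from the Sigil sources): single-character
    flags, '!' as the no suggest flag, no escaped '/' inside words.
    """
    lines = [ln.strip() for ln in dic_text.splitlines() if ln.strip()]

    # The first line of a .dic file is normally the approximate entry count.
    declared = int(lines[0]) if lines and lines[0].isdigit() else None
    entries = lines[1:] if declared is not None else lines

    roots = 0
    nosuggest = 0
    for entry in entries:
        # Anything after the first whitespace is optional morphological data.
        head = entry.split()[0]
        _word, _, flags = head.partition("/")
        roots += 1
        if nosuggest_flag in flags:
            nosuggest += 1

    return {"declared": declared, "roots": roots, "nosuggest": nosuggest}


sample = "3\nhello\nworld/MS\nScotchman/M!\n"
summary = summarize_dic(sample)
print(summary)
```

Read the real file with `open(path, encoding="utf-8")` and pass its contents in; the path depends on where your dictionaries live.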
01-22-2022, 04:13 AM | #39 | |
Fanatic
Posts: 500
Karma: 3498633
Join Date: May 2011
Location: Surrey, UK
Device: Kobo Aura One, Sony PRS 600/650
|
Quote:
Whilst you are correct in saying that it is a historical name for Scots, it really should not be used these days. Having lived in Scotland for a while, I can assure you they get extremely offended if you refer to them as Scotchmen.
I checked on scotchman and found the nautical reference you mentioned. So perhaps scotchman should be included for that reason, but the Google dictionary had it listed as:
Scotchman
Scotchmen
Scotchwoman
Scotchwomen
So I assume it is referring to the race and not the nautical use. |
|
01-22-2022, 04:28 AM | #40 |
Fanatic
Posts: 500
Karma: 3498633
Join Date: May 2011
Location: Surrey, UK
Device: Kobo Aura One, Sony PRS 600/650
|
@Kevin - Thank you too for all your hard work.
The LibreOffice dictionary is, in my opinion, a complete mess, and the Google one would require a good deal of checking. So basing Sigil's dictionaries on a known vetted source (scowl) is probably the best way forward. I will test out the new dictionaries and report back if I discover any issues. Perhaps we ought to have an annual review where everyone submits a list of verified new words from their user dictionary for inclusion in the next release. |
01-22-2022, 08:52 AM | #41 |
Sigil Developer
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
|
FWIW, we can also remove words from the dictionary or keep them in the dictionary but mark them as "no suggest" if they are now considered offensive.
Everything I have looked at says Scotchman and Scotchwoman (and their variations) are offensive at worst and obsolete at best. Unfortunately, they are part of the current scowl wordlists. In fact, I think the Google dictionaries are probably scowl based. Perhaps someone should open a bug report on the scowl GitHub site and suggest removing those words, or raising them to level 80 (lower frequency) so they are no longer part of most spelling dictionaries.
Given the words are obsolete at best, perhaps we should remove them, or at least set them as no suggest, in our dictionaries before their release? All thoughts welcome.
Last edited by KevinH; 01-22-2022 at 10:45 AM. |
01-22-2022, 03:53 PM | #42 | |
null operator (he/him)
Posts: 20,459
Karma: 26645808
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
Can the end-user set/unset a word to 'no suggest'? If so, how? Then those who write about cricket for public broadcasters etc. could mark 'batsman/men' as 'no suggest' and use 'batter' instead.
And thanks a lot for the OED spelling dictionary.
** a shroud is part of the standing rigging that holds a mast aloft.
BR
Last edited by BetterRed; 01-22-2022 at 04:44 PM. Reason: define shroud |
|
01-22-2022, 06:27 PM | #43 |
Sigil Developer
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
|
There is no easy way for the user to mark something as no suggest.
The only way to do it is to open the .dic file in an editor that accepts utf-8 text with no carriage returns (unix line ends) and add an ! mark to the existing flags for that root word or add /! if no flags exist. That approach will only work with these dictionaries as ! is set as the no suggest flag. |
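For those who would rather script that edit than do it by hand, the following is a minimal Python sketch of the same step. It assumes single-character flags and `!` as the no suggest flag (true only for these dictionaries, as noted above). The names `mark_no_suggest` and `mark_no_suggest_in_file` are made up for illustration, and the code ignores escaped slashes inside words. Back up the .dic file first.

```python
import re


def mark_no_suggest(dic_text: str, root: str, flag: str = "!") -> str:
    """Return .dic text with `flag` appended to the flags of `root`.

    Adds '/!' when the entry has no flags, or appends '!' to the
    existing flags. Line endings and morphological fields are kept.
    """
    lines = dic_text.split("\n")
    # Line 0 is the entry count, so start at line 1.
    for i in range(1, len(lines)):
        m = re.match(r"(\S+)(.*)", lines[i])
        if m is None:
            continue  # blank line
        head, rest = m.groups()
        word, _, flags = head.partition("/")
        if word == root and flag not in flags:
            lines[i] = f"{word}/{flags}{flag}{rest}"
    return "\n".join(lines)


def mark_no_suggest_in_file(path: str, root: str) -> None:
    """Rewrite a .dic file in place as UTF-8 with Unix line ends."""
    with open(path, encoding="utf-8", newline="") as fh:
        text = fh.read()
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        fh.write(mark_no_suggest(text, root))


sample = "3\nScotchman/M\nbatsman\ngrey/S\n"
step1 = mark_no_suggest(sample, "Scotchman")
step2 = mark_no_suggest(step1, "batsman")
print(step2)
```

A word marked this way is still accepted as correctly spelled; hunspell just stops offering it as a suggestion. The entry count on the first line is unchanged because no entries are added or removed.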
01-23-2022, 03:58 AM | #44 |
Fanatic
Posts: 500
Karma: 3498633
Join Date: May 2011
Location: Surrey, UK
Device: Kobo Aura One, Sony PRS 600/650
|
Given that scotchman has meanings other than that of referring to a male of Scots origin perhaps it would be better to leave that in the dictionary. I doubt it will be encountered much, if at all, so I don't think it's going to be an issue.
As for Scotchmen/Scotchwoman/Scotchwomen, personally I think they could be removed. |
01-23-2022, 10:44 AM | #45 | |
Sigil Developer
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
|
A spellchecker dictionary is very different from a regular dictionary. Its role is to help catch common spelling errors, not to define a language or a word. Therefore keeping obsolete or very rarely used words in a spellchecker dictionary is just not appropriate given they help hide spelling errors on more commonly used words.
I will remove both Scotchman and Scotchwoman and their variants, but leave scotchman, in the final release. If people have a historical text that uses those words and they do not want to update them to their modern equivalents, they can easily ignore those words or simply add them to their User dictionary.
This is the same reason a spellchecker dictionary should not be based on scowl size 80 or larger (and many say 70 or larger). Unfortunately, vetting the scowl word lists really requires a team of dedicated people, not just one or two.
Thanks,
KevinH
|
|