01-17-2017, 09:40 PM   #12
KevinH
Sigil Developer
The "we" here is actually just "me", as I am the one who designed and created the MySpell spell checker that later became the basis for hunspell. At its root, a dictionary is simply a list of all of the words in the working set of a language. To keep the list workable in size, you end up needing prefix and suffix compression as well as a limit on the set of letters that can be used in the dictionary. It makes no sense to include every single possessive word in the wordlist twice, once with a plain apostrophe and once with a fancy single quote. So dictionaries standardized on using a normal apostrophe in the wordlist. To spellcheck a word, you make a copy of it and replace any fancy single quotes with an apostrophe so it can be checked efficiently against the dictionary wordlist. I also produced the first en_US dictionary for MySpell from established wordlists to make that rule work. Numerous other dictionaries have followed that rule, and hunspell inherited that behaviour from MySpell.
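
For illustration only, here is a minimal Python sketch of that copy-and-replace step. It is not MySpell or hunspell code: the function names, the tiny `WORDLIST`, and the choice of U+2018 and U+2019 as the "fancy single quotes" are all assumptions made for the example.

```python
# Illustrative sketch only: these names are hypothetical, not hunspell's API.
# Assumption: "fancy single quotes" means U+2018 and U+2019.
FANCY_SINGLE_QUOTES = "\u2018\u2019"

# Translation table mapping each fancy quote to a plain apostrophe.
_TO_APOSTROPHE = str.maketrans({q: "'" for q in FANCY_SINGLE_QUOTES})


def normalize_for_lookup(word: str) -> str:
    """Return a copy of word with fancy single quotes replaced by '."""
    return word.translate(_TO_APOSTROPHE)


def is_known(word: str, wordlist: set) -> bool:
    """Check the normalized copy; the caller's original text is untouched."""
    return normalize_for_lookup(word) in wordlist


# Toy wordlist: possessives are stored once, with a plain apostrophe.
WORDLIST = {"dog", "dog's", "don't"}
```

With this sketch, `is_known("dog\u2019s", WORDLIST)` and `is_known("dog's", WORDLIST)` both hit the same single wordlist entry, which is the point of the rule.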

It really isn't much of a limitation, given the efficient wordlist lookup it buys: a simple regex or find and replace can easily convert any punctuation and apostrophes in the provided suggestions to their smart equivalents, if that is something the end user wants.
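
As a rough idea of what such a regex could look like, here is a small Python sketch. The function name is hypothetical, and the rule it uses (only an apostrophe sitting between two word characters is converted to U+2019) is a simplifying assumption, so trailing possessives such as "dogs'" and real quotation marks are left alone.

```python
import re

# Assumption: only an apostrophe between two word characters counts as
# "inside a word"; anything else is left as typed.
_WORD_APOSTROPHE = re.compile(r"(?<=\w)'(?=\w)")


def smarten_apostrophes(text: str) -> str:
    """Replace in-word plain apostrophes with the right single quote U+2019."""
    return _WORD_APOSTROPHE.sub("\u2019", text)
```

Running a suggestion like `don't` through `smarten_apostrophes` gives the smart-quote form, while a string such as `'quoted'` comes back unchanged.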

As for "correct" dictionaries, in this case it means the user has chosen a dictionary that is encoded in a charset that actually supports the characters he or she uses in the language. The ISO-8859-1 charset does not actually have a smart single quote in it, and that is the encoding the de_DE dictionary uses.
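
The charset point is easy to check for yourself. The following Python sketch uses a hypothetical helper, `charset_supports`, that simply tries to encode a string with the standard library codec and reports whether that worked; it says nothing about how hunspell itself handles encodings.

```python
def charset_supports(text: str, charset: str) -> bool:
    """Return True if every character in text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# U+2019 (right single quotation mark) has no slot in ISO-8859-1,
# while the plain apostrophe and German umlauts do.
smart_quote_ok = charset_supports("\u2019", "iso-8859-1")
apostrophe_ok = charset_supports("'", "iso-8859-1")
umlaut_ok = charset_supports("\u00fc", "iso-8859-1")
```

Here `smart_quote_ok` ends up `False` while the other two are `True`, which is why a dictionary in that encoding cannot contain the smart quote at all.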

KevinH

Last edited by KevinH; 01-17-2017 at 09:48 PM.