#16 |
Wizard
Posts: 4,730
Karma: 15356729
Join Date: Dec 2010
Device: Kindle PW2
|
|
#17 |
null operator
Posts: 14,989
Karma: 12781614
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
↑ ↑ ↑ ✔️
Several years ago I tried, and failed, to edit the Kracked Press en GB hunspell dictionary. I also tried and failed to create a domain-specific dictionary. I was surprised there were no tools specific to the task - no demand, I guess. Today, I could possibly create an epub from scratch with notepad and pkzip - but only because of what I've learnt from using Sigil.
BR
Last edited by BetterRed; 07-09-2019 at 06:50 PM.
#18 |
Wizard
Posts: 3,940
Karma: 2514398
Join Date: Nov 2009
Device: many
|
MySpell 2 or 3 had both munch and unmunch tools that worked for the dictionaries used at that time (including English, German, French, Spanish, etc.). Unfortunately, Hunspell needed compound prefixes, compound suffixes, and compound words to handle Hungarian and other languages. The standard munch and unmunch tools were never really modified for those changes, and nothing was ever documented.
MySpell dictionaries still work in Hunspell and work for most western languages. I can probably dig up a copy of the MySpell-3 source someplace and walk anyone through it.
Last edited by KevinH; 07-09-2019 at 11:56 PM.
#19 | |
Zealot
Posts: 114
Karma: 500
Join Date: Aug 2011
Device: kindle, boox
|
Quote:
Some time ago I created a Spanish Hunspell dictionary. I had to dig around to create a good one, and it is now used with Sigil by a lot of people. Now the idea is to improve it. I also want to improve a real dictionary with definitions. It's hard to find documentation about dictionaries, or a good program to edit them and export to different formats.
|
#20 | |
Wizard
Posts: 3,940
Karma: 2514398
Join Date: Nov 2009
Device: many
|
I will grab a copy of the Spanish Hunspell dictionary and take a look to see what features are being used. If it sticks to things that MySpell groks, we can use the MySpell tools to expand the Spanish dictionary and then remunch it for use in Hunspell. If it uses any of the newer Hunspell features, the older munch and unmunch tools will not be of any help.
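One rough way to do that check is to list which directives an .aff file actually uses. The sketch below is only an illustration: the function names and the sample fragment are made up, and the `ASSUMED_MYSPELL_LEVEL` set is an assumption about what the old tools understand, not a documented list, so adjust it to whatever your copy of munch and unmunch really handles.

```python
from collections import Counter

# ASSUMPTION: a guess at the directives the old MySpell-era tools cope with.
# This is not taken from any documentation; edit it to match your tools.
ASSUMED_MYSPELL_LEVEL = {"SET", "TRY", "PFX", "SFX", "REP", "MAP"}

# Illustrative fragment only, not copied from a real dictionary.
SAMPLE_AFF = """\
# sample affix file
SET UTF-8
TRY aeiou
SFX S Y 1
SFX S 0 s .
COMPOUNDFLAG X
"""

def aff_directives(aff_text):
    """Count the first token of every non-blank, non-comment line."""
    counts = Counter()
    for line in aff_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        counts[line.split()[0]] += 1
    return counts

def beyond_baseline(counts, baseline=ASSUMED_MYSPELL_LEVEL):
    """Return the directives that are not in the assumed baseline set."""
    return sorted(set(counts) - set(baseline))

counts = aff_directives(SAMPLE_AFF)
print(dict(counts))
print("outside the assumed baseline:", beyond_baseline(counts))
```

For a real file you would read it first, for example `aff_directives(open("es.aff", encoding="utf-8").read())`, using whatever encoding the file's SET line declares. An empty result only means nothing fell outside the assumed set; it does not prove the old tools will handle the file.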
KevinH Quote:
|
|
#21 |
Wizard
Posts: 3,940
Karma: 2514398
Join Date: Nov 2009
Device: many
|
Okay, the version of the Spanish dictionary shipped inside Sigil on Windows and Mac is a straight MySpell-level dictionary, and as such the munch and unmunch tools will work.
I found an old copy of MySpell-3 stored on a Google Code archive and was able to easily build and run it on my Mac. This included the munch and unmunch tools as well. So with unmunch, I can take the es.aff file (which describes prefixes and suffixes commonly used in Spanish, along with the rules for when they apply) and the es.dic file and create one long list of every word the dictionary recognizes, in all of its forms. You can then add lots of new words, or even create a new prefix or suffix flag if you know which ones might be missing and the rules for applying them. Once we have that, we can run munch to create the new .dic file.
We can also add charmaps and replacement tables, along with phonetic sound-alike rules, to help improve the suggestions generated. So if this is something you would like to do, I would be happy to help. Once you get into Hunspell-only features, munch and unmunch will no longer work and you are on your own, so to speak.
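To make the unmunch step concrete, here is a minimal Python sketch of the idea. It is not the MySpell tool itself: the function names and the tiny sample affix and word lists are invented for illustration, it assumes single-character flags, and it ignores cross products (combining a prefix and a suffix on the same word) and every Hunspell-only feature.

```python
import re

# Made-up sample data in the usual .aff / .dic layout (not from a real dictionary).
AFF = """\
SFX S Y 2
SFX S y ies [^aeiou]y
SFX S 0 s [^y]
PFX U Y 1
PFX U 0 un .
"""

DIC = """\
3
pony/S
cat/S
happy/U
"""

def parse_aff(aff_text):
    """Collect PFX/SFX rules keyed by flag. Header lines have only 4 fields and are skipped."""
    rules = {}
    for line in aff_text.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] not in ("PFX", "SFX"):
            continue
        kind, flag, strip, add, cond = parts[:5]
        strip = "" if strip == "0" else strip  # "0" means strip nothing
        add = "" if add == "0" else add        # "0" means add nothing
        # The condition is anchored at the end for suffixes, at the start for prefixes.
        pattern = re.compile(cond + "$") if kind == "SFX" else re.compile("^" + cond)
        rules.setdefault(flag, []).append((kind, strip, add, pattern))
    return rules

def unmunch(dic_text, rules):
    """Expand every stem/flags entry into the sorted set of words it stands for."""
    words = set()
    for line in dic_text.splitlines()[1:]:  # the first line is the entry count
        entry = line.strip()
        if not entry:
            continue
        stem, _, flags = entry.partition("/")
        words.add(stem)
        for flag in flags:
            for kind, strip, add, pattern in rules.get(flag, []):
                if not pattern.search(stem):
                    continue
                if kind == "SFX" and stem.endswith(strip):
                    words.add(stem[:len(stem) - len(strip)] + add)
                elif kind == "PFX" and stem.startswith(strip):
                    words.add(add + stem[len(strip):])
    return sorted(words)

print(unmunch(DIC, parse_aff(AFF)))
```

Munch goes the other way: given the long word list and the .aff rules, it looks for stems and flags that regenerate the list, which is a harder search and is left out of this sketch.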
#22 |
Wizard
Posts: 3,940
Karma: 2514398
Join Date: Nov 2009
Device: many
|
Just for laughs, I ran unmunch on the en_US.dic and en_US.aff files, and the 62,074 base words with affix flags expanded to a word list of 152,469 unique words.
I tried the same thing for es.dic and es.aff, and the 58,154 base words with affix flags expanded to a word list of 689,751 unique words. So Spanish must make use of prefixes and suffixes much more than English!
Also, if you look at the working-set vocabulary used by Shakespeare, for example, it was something like 35,000 words. Most average people have working sets of 10,000 to 20,000 words. Any way you look at it, having 689,751 unique words seems to be huge coverage. Has anyone validated the universe of words the Spanish dictionary already covers?
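For a sense of scale, the counts above can be turned into an average number of forms per base word. This is just arithmetic on the figures quoted in this post; the `expansion_ratio` helper is a name made up for the example.

```python
# (base words, expanded unique words) as reported above
counts = {
    "en_US": (62_074, 152_469),
    "es": (58_154, 689_751),
}

def expansion_ratio(base, expanded):
    """Average number of unique word forms generated per base entry."""
    return expanded / base

for name, (base, expanded) in counts.items():
    print(f"{name}: {expansion_ratio(base, expanded):.2f} forms per base word")
```

That works out to roughly 2.5 forms per base word for English against nearly 12 for Spanish.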
#23 |
Wizard
Posts: 3,940
Karma: 2514398
Join Date: Nov 2009
Device: many
|
@elchamaco
If I were to zip up the unmunched Spanish wordlist and post it here, would you be willing to download it and check whether it makes sense? Having over 600,000 unique letter combinations that a spellcheck dictionary would deem correct just seems too big to be true without compound words.
Thanks,
KevinH
#24 |
Zealot
Posts: 114
Karma: 500
Join Date: Aug 2011
Device: kindle, boox
|
The one I created had a base list of nearly 1 million words (980-990k), and the munched list was 234k. I used the .aff from the LibreOffice Spanish dictionary, if I remember correctly.
|
#25 |
Wizard
Posts: 3,940
Karma: 2514398
Join Date: Nov 2009
Device: many
|
The problem is that more words do not necessarily make a spellcheck dictionary better (unlike an online dictionary).
As I tried to explain earlier, a spellcheck dictionary is meant to cover the "working set" of a language. It is not meant to be exhaustive in the way an online or paper dictionary would attempt to be. The reason is that many common mistakes and typos turn out to be actual but very infrequently used "words", and not what the author intended. An exhaustive list also results in words being suggested for replacement that the author would never use. Both lower the effectiveness of the spellchecker. The idea is that rarely used or more esoteric words can and should be looked up in online dictionaries.
One of the nice features of spellcheck dictionaries is that authors can add their own list of more unusual words that they actually use to augment the "working set", making the spellcheck function fine-tuned to that particular person and their writing. That was and continues to be the concept behind the design of spellcheck dictionaries.
Hope something here helps.
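A toy illustration of both points, in Python. Everything here is made up for the example (the `misspelled` function, the tiny word sets, the sample sentences); a real spellchecker does far more than set lookups.

```python
import re

def misspelled(text, dictionary, personal=()):
    """Return the words in text that are in neither the dictionary nor the personal list."""
    known = {w.lower() for w in dictionary} | {w.lower() for w in personal}
    return [w for w in re.findall(r"[A-Za-z']+", text) if w.lower() not in known]

# A tiny "working set" and an "exhaustive" variant that also accepts the rare word "fro".
working_set = {"the", "cat", "went", "for", "a", "walk", "sat", "on", "mat"}
exhaustive = working_set | {"fro"}

typo = "The cat went fro a walk"              # "fro" typed instead of "for"
print(misspelled(typo, working_set))          # the typo is flagged
print(misspelled(typo, exhaustive))           # the typo slips through

# A personal word list covers terms this particular author really uses.
note = "Sigil sat on the mat"
print(misspelled(note, working_set))
print(misspelled(note, working_set, personal=["Sigil"]))
```

With the smaller working set the typo is caught; with the larger list it passes silently because "fro" is a legitimate but rare word. The personal list then adds back only the unusual words a given author actually needs.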