
Go Back   MobileRead Forums > E-Book Software > Sigil

Old 01-26-2022, 04:10 AM   #61
Ashjuk
Fanatic
Posts: 500
Karma: 3498633
Join Date: May 2011
Location: Surrey, UK
Device: Kobo Aura One, Sony PRS 600/650
Your suggestion of removing the '-' from WORDCHARS sounds like the best way forward to me, Kevin.

Adding, and maintaining, another 50k words to the word list sounds like it is just creating unnecessary work if the result can be achieved by other means.

You saying the old dictionary worked this way answers something that was puzzling me. I was sure I had come across self-defence previously without it being flagged as misspelled.
Old 01-26-2022, 08:32 AM   #62
Ashjuk
Fanatic
I am still a little puzzled by how the suggested replacement words work.

Today I right-clicked 'Theater' expecting the first word in the replacement list to be Theatre, but not so. The suggested replacements for Theater are:
Heater
Cheater
T heater
Th eater
The ater
The-ater
Heather
Thatcher

Theatre does not even make it on the list. Yet if I right-click center the first word it offers is centre.

Why is that?

Last edited by Ashjuk; 01-26-2022 at 08:36 AM.
Old 01-26-2022, 10:43 AM   #63
KevinH
Sigil Developer
Posts: 7,643
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi Ashjuk,

No need to remove it: I used the scowl dictionary to spellcheck the 47,000-word list and found fewer than 100 entries that were not already properly covered. That is a list I can manage.

So we should be good to go.


Old 01-26-2022, 11:07 AM   #64
KevinH
Sigil Developer
No spellcheck dictionary can tell what the original author meant. It is not a grammar checker, it does not know parts of speech, and it cannot see the words that surround the misspelled one.

So a spellchecker takes the misspelled word and looks for correct words that are only 1 edit distance away. Then it tries swapping adjacent characters, then inserting a new character at every position (including a space), then it runs through the replacement table provided in the .aff file. If phonetic changes are enabled in the .aff, it tries those, and finally, if still no good words have been found, it uses ngrams to make a suggestion.
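The first few steps above can be sketched roughly as follows. This is only an illustration and not hunspell's actual code: the names `candidates` and `suggest` are made up, the dictionary is a plain Python set, the hyphen break is an assumption based on the hyphenated entries in the suggestion list in this thread, and the replacement table, phonetic and ngram stages are left out.

```python
import string


def candidates(word, alphabet=string.ascii_lowercase):
    """Return every string one small change away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Classic single edits: drop a character or replace one.
    deletes = {a + b[1:] for a, b in splits if b}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    # Swap two adjacent characters.
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    # Insert a new letter at every position.
    inserts = {a + c + b for a, b in splits for c in alphabet}
    # Insert a word break (space, or hyphen as an assumption) inside the word.
    breaks = {a + sep + b for a, b in splits if a and b for sep in (" ", "-")}
    return deletes | replaces | swaps | inserts | breaks


def suggest(word, dictionary):
    """Keep only the candidates whose pieces are all dictionary words."""
    lower = word.lower()
    good = []
    for cand in sorted(candidates(lower)):
        if cand == lower:
            continue
        parts = cand.replace("-", " ").split(" ")
        if all(part in dictionary for part in parts):
            good.append(cand)
    return good


# Hypothetical toy dictionary, lowercased for simplicity.
toy = {"heater", "cheater", "theatre", "treater", "t", "th", "thea", "ter", "eater"}
print(suggest("Theater", toy))
```

With short fragments such as "th", "thea" and "ter" in the dictionary, the word-break step alone produces several extra suggestions, which is the pollution being discussed in this thread.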


So the word Theater, which is not spelled correctly under en-GB, is modified to look for "close" words that the original author could have meant.

In this case we get the following list:

Heater
Cheater
T heater
Th eater
Th-eater
Thea ter
Thea-ter
Theatre
Treater
Heather

which are all only single-character edits, swaps, or insertions.

"Thea" is being generated as a proper name for someone and "ter" is a known abbreviation for "Total Expense Ratio", etc. Having things like "ter" and "th" be considered "words" is generally not a good idea, but scowl obviously included them at some point, most likely to make things like 105th work.

Here is where those pieces come from in scowl:

english-upper.50:Th
english-abbreviations.70:ter
english-proper-names.50:Thea

The spellchecker has no way to know what you meant by Theater, and based on small changes - it could be any of these valid combinations.

When suggesting, the case is changed to match that of the misspelled word, which makes Thea (a woman's proper name) quite likely as it would need no case change.

If you try the lowercase version "theater" you will get a much smaller list of suggestions as its case rules out proper first names.
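A toy model of that case rule, purely as an assumption about how it could work and not hunspell's real logic. The helper names `case_compatible` and `match_case` and the three-entry word list are hypothetical.

```python
def case_compatible(misspelled, entry):
    """Can dictionary `entry` be offered for `misspelled`, given its case?"""
    if entry.islower():
        # Ordinary lowercase words can always be recased to match.
        return True
    # Capitalised entries (e.g. proper names) are only offered when the
    # misspelled word itself starts with an uppercase letter.
    return misspelled[:1].isupper()


def match_case(misspelled, entry):
    """Recase a lowercase entry to follow a capitalised misspelling."""
    if misspelled[:1].isupper() and entry.islower():
        return entry[:1].upper() + entry[1:]
    return entry


entries = ["theatre", "heater", "Thea"]  # hypothetical dictionary entries
for word in ("Theater", "theater"):
    offered = [match_case(word, e) for e in entries if case_compatible(word, e)]
    print(word, offered)
```

In this sketch "Thea" survives for "Theater" because no case change is needed, and drops out for "theater".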

Hope this explains things a bit better.

This is a great illustration of why adding proper first names to the spellchecker and bunches of abbreviations is not the best idea.

You might want to try the size 60 en-GB dictionaries to see if you like those better, as they should have fewer proper names and fewer abbreviations without periods, which should prevent those from being offered as valid suggestions.

And if a misspelled word begins with an uppercase letter, be prepared for proper first names to be part of the suggestions.

People complain when a first name is marked as not correctly spelled and they put pressure on the spellchecker to include it, but such entries really make no sense in a main dictionary. One reason for the pressure is that some programs that use spellcheckers do not allow user word lists to be kept, edited, and used, which in turn leads to main dictionary bloat.

Hope this helps.


Last edited by KevinH; 01-26-2022 at 12:10 PM.
Old 01-26-2022, 12:10 PM   #65
Ashjuk
Fanatic
Thanks for the explanation, Kevin.

It's not an issue as I can easily correct it with a simple edit, but I was a little puzzled that what worked for center/centre seemed not to for Theater.

As you say, if it had been theater instead of Theater then the first suggestion is theatre. I will remember in future to look out for capitalised words.

Good news about the hyphenated words - hopefully we now have a definitive dictionary.
Old 01-26-2022, 12:22 PM   #66
KevinH
Sigil Developer
But I want to fix scowl's handling of abbreviations first. They stick in lots of abbreviations without the proper use of ".", which turns things like "ter" into a word, which is absurd.

The scowl authors do not seem to understand the meaning or use of WORDCHARS in the .aff file, as they do not include - or . in it, which makes no sense at all.

I will endeavour to fix that before any final release.
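To show why the choice of word characters matters, here is a hypothetical tokenizer sketch. It is not hunspell's or Sigil's parser; the function `tokenize` and its `wordchars` parameter are invented for illustration, and only letters plus the extra characters count as part of a word.

```python
def tokenize(text, wordchars=""):
    """Split text into words; `wordchars` lists extra in-word characters."""
    words, current = [], []
    for ch in text:
        if ch.isalpha() or ch in wordchars:
            current.append(ch)
        elif current:
            # Any other character ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


print(tokenize("self-defence etc."))                  # hyphen and period split off
print(tokenize("self-defence etc.", wordchars="-."))  # both kept inside the words
```

Without "-" the two halves of a hyphenated word are checked separately, and without "." an abbreviation reaches the dictionary lookup stripped of its period.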
Old 01-26-2022, 01:19 PM   #67
KevinH
Sigil Developer
Unfortunately, scowl includes over 700 abbreviations (even after ignoring the ones in all caps, which are acronyms). None of them has an ending period that would in any way indicate that the word is abbreviated.

The problem is that people have started dropping the "." from the most common abbreviations like cm, mm, ft, in, Mrs, Mr, Dr, PhD, etc., which just confuses things even more.

I have attached the list of over 700 abbreviations that should probably either have an ending period added or be removed, as they hide common spelling errors and end up polluting suggestions with nonsense.

Feedback welcome as to how best to treat these "words".
Attached Files
File Type: txt abbreviations.txt (3.2 KB, 65 views)
Old 01-27-2022, 04:21 AM   #68
Ashjuk
Fanatic
I have had a quick look at the list and agree with you: there are a lot of entries that should have a period at the end.

I will try to find some time today to go through it and pull out the ones I think definitely should be included as a proper abbreviation.

Sadly, this is a sign of the times. I now see people writing texts and comments online where there is no punctuation used at all, and often sentences starting with lower case letters.
Old 01-27-2022, 08:54 AM   #69
KevinH
Sigil Developer
That would help. Thanks.
Old 01-28-2022, 10:16 AM   #70
Ashjuk
Fanatic
I have now worked my way through the abbreviations file and broken it down into four categories.

1. Items that MS Word accepted as is (probably acronyms etc.).

2. What I think, with the addition of a period, are valid abbreviations and are possible candidates for inclusion.

3. What appear to be valid abbreviations, but are probably rarely encountered.

4. Unknown items.
Attached Files
File Type: txt As_is.txt (930 Bytes, 50 views)
File Type: txt With_period.txt (895 Bytes, 57 views)
File Type: txt Uncommon.txt (1.2 KB, 58 views)
File Type: txt Unknown.txt (1.2 KB, 55 views)
Old 01-28-2022, 10:59 AM   #71
KevinH
Sigil Developer
Thanks so much! I will incorporate all of this into the new dictionary.
Old 01-28-2022, 12:07 PM   #72
Ashjuk
Fanatic
You are welcome.

Perhaps someone else could review the files to see if they agree (or not) with my conclusions.
Old 01-28-2022, 01:08 PM   #73
KevinH
Sigil Developer
Anything is better than ignoring the ending period of abbreviations and making these word fragments indistinguishable from normal words. It seems scowl's defaults of stripping periods, stripping accents, and including way too many first names are going to lead to exactly the wrong behaviour: hidden typos and poor suggestions. For example, scowl considers "Th" to be a proper name. Any two-letter proper name is going to hide spelling mistakes for words like "To", and "Th" is considered size 50. And "Th" is not the only one.

scowl has its good points but it also has some horrible points. I am thinking of removing all but the 100 top first names and all names less than 3 characters in length for this reason as well.

User word lists are much better places for those things than a spellchecker dictionary.

Last edited by KevinH; 01-28-2022 at 02:00 PM.
Old 01-28-2022, 03:16 PM   #74
KevinH
Sigil Developer
Okay, I took your abbreviations and the 12-dicts abbreviations and merged them. Then I tweaked Sigil's word parser for spellchecking to pass along ending periods, as hunspell itself is smart enough to check for a valid abbreviation first and then strip the period off and recheck, in case the word is just at the end of a sentence.

This seems to help prevent errors being hidden by bad abbreviations, which also greatly improves suggestions.
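The check-then-strip order described above can be sketched like this. It is a minimal illustration under the assumption that abbreviations are stored in the dictionary with their period; `is_correct` and the tiny word set are hypothetical, and the real logic lives inside hunspell.

```python
def is_correct(token, dictionary):
    """Accept a token as-is, or without a sentence-ending period."""
    # First try the token exactly as passed, so "etc." matches an
    # abbreviation that is stored with its period.
    if token in dictionary:
        return True
    # Otherwise the period may just end a sentence: strip it and recheck.
    if token.endswith("."):
        return token[:-1] in dictionary
    return False


words = {"etc.", "dog"}  # hypothetical dictionary: abbreviation kept with "."
for token in ("etc.", "dog.", "dog", "etc", "dgo."):
    print(token, is_correct(token, words))
```

Because the abbreviation only exists with its period, the bare fragment no longer passes as a word and so can no longer hide a typo.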

All of this will be part of the next release of Sigil which will be a beta release because of the large number of internal changes and new or completely redesigned features.
Old 01-28-2022, 03:19 PM   #75
KevinH
Sigil Developer
Sigil could include a single hunspell French dictionary for Windows and macOS users, since it is LGPL'd.

Is the classic one the one we should include?


Quote:
Originally Posted by roger64 View Post
Hi

Provide a dictionary for French speaking users ?

As far as I can remember, I've been using the Grammalecte Hunspell French dictionary both with Sigil and the Calibre editor.

Grammalecte tools are open source and have been perfected over the years -and still are- by an extended community of enthusiast users. Count me among them.

Its dictionary has been extensively tested. We are currently at version 7.

Its grammar checking tool has been already made available for Sigil users thanks to a Doitsu plugin which is automatically updated at each new version of the tool.

As you can see on the screenshot below, following the recommendation of its author, I use by default the "classic" version of this dictionary but keep loading the other ones, if need be.

Not every French speaking user is aware of it. I think it would be useful if Sigil could also recommend or even better select this Grammalecte dictionary by default. You'll find the precise page to download the latest version here:

https://grammalecte.net/download.php?prj=fr

Click on the green star "Dictionnaires Hunspell 7.0
All times are GMT -4. The time now is 01:35 PM.


MobileRead.com is a privately owned, operated and funded community.