MobileRead Forums - Sigil
Suggestion: Spellcheck Enhancement (Numbers) (https://www.mobileread.com/forums/showthread.php?t=292086)

KevinH 11-14-2017 03:15 PM

It would be interesting to see the QChar values of the word containing the smart right single quote when it reaches the spellcheck code on Windows. This must be either a Qt-specific bug on Windows or an encoding issue somewhere along the way, since it works on both Linux and Mac.

I will eye-ball the code to see if I can find a suspect.

KevinH 11-14-2017 03:33 PM

I am betting the problem is here:
Code:

QString Utility::getSpellingSafeText(const QString &raw_text)
{
    // There is currently a problem with Hunspell if we attempt to pass
    // words with smart apostrophes from the CodeView encoding.
    // There are likely better ways to solve this, but this one does
    // get the job done until someone can implement something better.
    QString text(raw_text);
    return text.replace(QString::fromUtf8("\u2019"), "'");
}

On Windows the source files are probably treated as some encoding other than UTF-8, so the Unicode constant \u2019 is not being properly converted to a UTF-8 string in this function.

U+2019 in UTF-8 is a 3-byte sequence: 0xE2 0x80 0x99. So the fromUtf8 routine should be passed that byte sequence, or we load a QChar with 0x2019 and then use toUtf8 to generate the input, or better yet use the QChar directly.
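To make the byte sequence concrete, here is a minimal standard-library sketch (no Qt) that encodes a code point as UTF-8. The name encode_utf8 is hypothetical and is not part of Sigil or Qt; it only illustrates why U+2019 becomes three bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper, not from Sigil or Qt: encode one Unicode code point
// as UTF-8. Surrogate code points are not rejected in this sketch.
std::string encode_utf8(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF);  // outside the Unicode range otherwise
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (U+2019 lands here)
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encode_utf8(0x2019) gives the bytes 0xE2 0x80 0x99, which is what fromUtf8 would need to receive regardless of how the compiler interprets the source file.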

DiapDealer 11-14-2017 03:43 PM

Let me know if there's anything you need me to try compiling and/or testing on Windows.

KevinH 11-14-2017 03:46 PM

So a better way to write this might be:

return text.replace(QChar(0x2019),QChar(0x27));

DiapDealer, when you get a free moment, would you try that change in Misc/Utility.cpp in getSpellingSafeText and see if it makes any difference?

Thanks
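For anyone who wants to see the idea without a Qt build, here is a rough standard-library analogue. It assumes the text is held as UTF-16 code units (as QString does) and swaps the code unit directly, so no source-file encoding is involved. The function name spelling_safe_text is made up for this sketch and is not the Sigil function.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical stand-in for the proposed change: replace U+2019 (right single
// quotation mark) with U+0027 (straight apostrophe) using numeric code units,
// so the result does not depend on how the compiler decodes string literals.
std::u16string spelling_safe_text(std::u16string text)
{
    std::replace(text.begin(), text.end(),
                 static_cast<char16_t>(0x2019),
                 static_cast<char16_t>(0x0027));
    // Sanity check: no smart apostrophe should remain.
    assert(text.find(static_cast<char16_t>(0x2019)) == std::u16string::npos);
    return text;
}
```

The point of the design is the same as in the one-line Qt change above: the characters are given as numbers rather than as a literal that has to survive an encoding conversion.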

KevinH 11-14-2017 04:11 PM

Do you want me to push that change? It may not help, but certainly should not hurt.

DiapDealer 11-14-2017 04:49 PM

Quote:

Originally Posted by KevinH (Post 3611815)
Do you want me to push that change? It may not help, but certainly should not hurt.

Yes, please do! It certainly seems to do the trick in my testing so far.

It also fixes the similar problem of adding words with smart-apostrophes to a user word-list (only adding a straight apos char would work previously).

KevinH 11-14-2017 06:09 PM

Glad to hear it! I will push it later this evening once I am back at my developer box.

KevinH 11-14-2017 07:32 PM

Just pushed that fix to master.

KevinH 11-15-2017 01:50 PM

Also, I have just pushed support for spellchecking words with numbers as controlled by a Sigil preference setting. That small change actually forced changes in many files and a UI dialog.

Please note, if your particular dictionary does not have any words with digits in its wordlist, this feature will not be of much help.

This feature should appear in the next release unless I messed something up.

DiapDealer 11-15-2017 06:58 PM

Quote:

Originally Posted by KevinH (Post 3612255)
Also, I have just pushed support for spellchecking words with numbers as controlled by a Sigil preference setting. That small change actually forced changes in many files and a UI dialog.

Please note, if your particular dictionary does not have any words with digits in its wordlist, this feature will not be of much help.

This feature should appear in the next release unless I messed something up.

Seems to work as intended so far. :thumbsup:

The only thing in the above mentioned situations that isn't covered (that I've noticed) is:

Quote:

This is a B-17 Bomber.

No hyphenated words show up as misspelled that I can see. Whether they contain numbers or not isn't really relevant.

KevinH 11-15-2017 07:33 PM

Words that have an internal normal dash (hyphen) should be spell checked properly given how the code handles them. If not, something is funny.

DiapDealer 11-15-2017 08:16 PM

Quote:

Originally Posted by KevinH (Post 3612373)
Words that have an internal normal dash (hyphen) should be spell checked properly given how the code handles them. If not, something is funny.

My bad. You're right. Questionable words on either side of the hyphen will mark the hyphenated word as misspelled. I was just tripped up by the fact that B-17 doesn't show up as a misspelling. Neither do A-14, F-70, Z-29, or D-11, regardless of the new number preference setting. Shouldn't things like that be flagged as potential misspellings?

KevinH 11-15-2017 10:11 PM

The individual letters A, B, etc. and the numbers after the hyphen are all valid standalone words, so they are legal when hyphenated. That said, Gbh-17 should show up as wrong since Gbh is not a valid word. This also depends on the wordchar list provided in the en_US.aff file (or whatever dictionary aff file you are using).
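A minimal sketch of that rule, assuming a hyphenated word is accepted only when every piece between hyphens is accepted on its own. This is not Sigil's or Hunspell's actual code. Both function names and the toy dictionary are hypothetical, and the toy lookup simply treats any run of digits as valid.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <functional>
#include <set>
#include <string>

// Toy stand-in for a dictionary lookup (assumption: single letters A and B,
// the word "Bomber", and any all-digit string count as valid words).
bool toy_is_known(const std::string &part)
{
    assert(!part.empty());  // the caller filters out empty pieces
    static const std::set<std::string> words = {"A", "B", "Bomber"};
    const bool all_digits = std::all_of(part.begin(), part.end(),
        [](unsigned char c) { return std::isdigit(c) != 0; });
    return all_digits || words.count(part) > 0;
}

// Accept a hyphenated word only if each hyphen-separated piece is known.
bool hyphenated_word_ok(const std::string &word,
                        const std::function<bool(const std::string &)> &is_known)
{
    std::size_t start = 0;
    while (true) {
        const std::size_t dash = word.find('-', start);
        const std::string part = word.substr(
            start, dash == std::string::npos ? std::string::npos : dash - start);
        if (part.empty() || !is_known(part)) {
            return false;  // empty piece or unknown piece: flag the whole word
        }
        if (dash == std::string::npos) {
            return true;   // last piece was fine
        }
        start = dash + 1;
    }
}
```

Under these assumptions B-17 passes because both B and 17 pass individually, while Gbh-17 fails because Gbh is not in the toy dictionary.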

Tex2002ans 11-16-2017 01:58 AM

Quote:

Originally Posted by DiapDealer (Post 3612363)
Seems to work as intended so far. :thumbsup:

Fantastic. Can't wait for the next version.

Quote:

Originally Posted by DiapDealer (Post 3612363)
No hyphenated words show up as misspelled that I can see. Whether they contain numbers or not isn't really relevant.

Edit: Whoops, read Diap's post wrong. Ignore what I posted below. :rofl:

This wasn't necessarily about showing up as misspelled, it was about showing up in the list at all.

For example:

Code:

The Letter B, B-17 Bomber, and Room B9.

Would show up in the Spellcheck List as 3 "B".

When in reality, there is only 1 "B" + 1 "B-17" + 1 "B9".

This becomes a serious issue when it happens to something common, like "A", or the Index/Footnote Example, where there can be hundreds of "A" + "n" + "ff" + "f" within the EPUB. It becomes impossible to use the Spellcheck List to locate/find and correct these.

Or in the case of "l92l". That shows up as 2 "l". Good luck searching through every lowercase 'l' in the book trying to find it!
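The counting effect described above can be reproduced with a toy tokenizer. This is only a sketch of the two splitting behaviours, not Sigil's word-splitting code. The name count_words and its flag are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Count the words in text. When digits_and_hyphens_are_word_chars is false,
// only letters form words, so "B-17" and "B9" both collapse to "B".
// When it is true, digits and internal hyphens stay inside the word.
std::map<std::string, int> count_words(const std::string &text,
                                       bool digits_and_hyphens_are_word_chars)
{
    std::map<std::string, int> counts;
    std::string current;
    auto flush = [&]() {
        // A hyphen at the end of a word is punctuation, not part of it.
        while (!current.empty() && current.back() == '-') {
            current.pop_back();
        }
        if (!current.empty()) {
            ++counts[current];
        }
        current.clear();
    };
    for (char c : text) {
        const unsigned char uc = static_cast<unsigned char>(c);
        bool is_word_char = std::isalpha(uc) != 0;
        if (digits_and_hyphens_are_word_chars) {
            is_word_char = is_word_char || std::isdigit(uc) != 0 ||
                           (c == '-' && !current.empty());
        }
        if (is_word_char) {
            current += c;
        } else {
            flush();
        }
    }
    flush();
    return counts;
}
```

With the flag off, the example sentence yields "B" three times and "l92l" yields "l" twice. With the flag on, the list contains "B", "B-17", "B9" and "l92l" as separate entries.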

Doitsu 11-16-2017 11:34 AM

Quote:

Originally Posted by KevinH (Post 3612255)
Also, I have just pushed support for spellchecking words with numbers as controlled by a Sigil preference setting. That small change actually forced changes in many files and a UI dialog.

Thanks!

Quote:

Originally Posted by Tex2002ans (Post 3612504)
Or in the case of "l92l". That shows up as 2 "l". Good luck searching through every lowercase 'l' in the book trying to find it!

In the latest pre-release version, "l92l" will be marked as misspelled if the new Check Numbers option is enabled. With that option on, all words that contain both numbers and letters will be flagged as misspelled, which should make it easier to find numbers with letters in them and vice versa.
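As a rough illustration of which words that affects, the following sketch tests whether a word mixes letters and digits. The name mixes_letters_and_digits is hypothetical, and whether such a word is really flagged still depends on the dictionary in use, as KevinH noted earlier in the thread.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical predicate: true when the word contains at least one letter
// and at least one digit, e.g. an OCR error such as "l92l" for "1921".
bool mixes_letters_and_digits(const std::string &word)
{
    bool has_letter = false;
    bool has_digit = false;
    for (unsigned char c : word) {
        if (std::isalpha(c)) {
            has_letter = true;
        } else if (std::isdigit(c)) {
            has_digit = true;
        }
    }
    assert(!(has_letter && has_digit) || word.size() >= 2);  // needs two chars
    return has_letter && has_digit;
}
```

Here "l92l" and "B9" qualify, while "1921" and "lazy" do not.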


All times are GMT -4.
