MobileRead Forums (https://www.mobileread.com/forums/index.php)
-   Sigil (https://www.mobileread.com/forums/forumdisplay.php?f=203)
-   -   Spellcheck dialog shows well-spelled words as misspelled (https://www.mobileread.com/forums/showthread.php?t=337751)

ebray187 03-02-2021 09:22 PM

Spellcheck dialog shows well-spelled words as misspelled
 
In Code View, the underlining works fine: only misspelled words are underlined. However, the Spellcheck dialog also lists correctly spelled words as misspelled.
I do not know whether I am configuring or using something wrong, or whether it is a problem with Sigil or with my current epub. I had understood that adding xml:lang="X" was enough for it to work properly.

http://imgur.com/GliKnM0.png

In the screenshot, "informadas" is correctly spelled. This is a test with only 2 words, but I'm working on a large book with a lot of false positives, so the Spellcheck dialog is almost unusable for me.

I'm happy to provide more information if necessary.

Regards.

-----------------------------------
Sigil 1.4.3
Qt 5.15.2
Gentoo Linux

ebray187 03-02-2021 09:34 PM

Also, I have set <dc:language>es</dc:language> in the content.opf file.

KevinH 03-02-2021 09:38 PM

In Sigil's Preferences, what have you set for the default language, and what for the Primary and Secondary Dictionaries?

Do you have a proper Hunspell Spanish dictionary installed?

ebray187 03-02-2021 09:52 PM

User Interface Language: English
Default Language For Metadata: Spanish

For the primary language dictionary I have a custom Spanish dictionary (.aff and .dic files inside the hunspell_dictionaries folder). I have used this dictionary without problems in 0.6, 0.9.4, 1.2.1 and 1.3.0.

Nothing on the second language.

ebray187 03-02-2021 10:01 PM

Here is a zip with the epub and the dictionary

KevinH 03-02-2021 10:06 PM

Does your custom dictionary follow the hunspell language naming convention:

es_ES.dic
es_ES.aff

Exactly how are the .aff and .dic files you are using named? The Spellcheck dialog does not use the Primary Dictionary per se. Instead, it depends on the normal naming convention to map language codes to dictionary names automatically.

My guess is that your custom dictionary is not being mapped to es because its name differs from the naming convention.

KevinH 03-02-2021 10:12 PM

Yep. How on earth is the Spellcheck dialog, which can now handle any number of languages and dictionaries, supposed to know your .dic and .aff are for Spanish when there is no language code at the start of the file names?

Rename them to match the expected convention or, if you do not want to rename them, create two symlinks whose names match the convention. Then restart Sigil and all should work.
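
A minimal sketch of the symlink route, for anyone who prefers to script it. The file stem `custom` and the helper `link_with_convention` are hypothetical (the thread does not give the real file names), and the demo runs in a temporary folder rather than the real hunspell_dictionaries folder:

```python
import os
import tempfile
from pathlib import Path


def link_with_convention(dict_dir, stem, locale):
    """Create <locale>.dic/.aff symlinks pointing at <stem>.dic/.aff.

    `stem` and `locale` are illustrative parameters, not Sigil settings.
    Returns the list of links that were created.
    """
    dict_dir = Path(dict_dir)
    created = []
    for ext in (".dic", ".aff"):
        source = dict_dir / f"{stem}{ext}"
        link = dict_dir / f"{locale}{ext}"
        if not source.is_file():
            raise FileNotFoundError(source)
        if link.exists() or link.is_symlink():
            continue  # leave any existing file or link untouched
        # Relative target, so the links survive moving the folder.
        os.symlink(source.name, link)
        created.append(link)
    return created


if __name__ == "__main__":
    # Demo in a throwaway directory instead of the real dictionary folder.
    with tempfile.TemporaryDirectory() as tmp:
        for ext in (".dic", ".aff"):
            Path(tmp, f"custom{ext}").write_text("")
        for link in link_with_convention(tmp, "custom", "es_ES"):
            print(link.name, "->", os.readlink(link))
```

Point `dict_dir` at your own dictionary folder and pick the locale you need (es_ES, es_AR, ...), then restart Sigil.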

ebray187 03-02-2021 10:25 PM

Got it. Thanks!!

From a user's point of view, I think it would be useful to be able to specify which dictionary to use for each language, following the xml:lang convention (for "en", use this one; for "es", that one). As it stands, it can be a bit misleading that the primary language dictionary setting is not what determines the spellcheck language.

Anyway thanks a lot for your help and for all the work making this amazing tool.:thanks:

KevinH 03-02-2021 11:27 PM

On Windows and macOS, we supply the Hunspell dictionaries along with Sigil, so they of course follow the naming convention of starting with a language code.

Even MySpell, the predecessor to Hunspell, and ispell, the predecessor to MySpell, follow that naming convention of starting with a language code.

So there is a long history on Unix/Linux of naming dictionaries so that no one has to guess what language they are for, especially as these dictionaries are shared across many apps.

That is why there really is no need to have to associate language codes with dictionaries. It would just make them harder to use and share with repeated mappings needed for multiple apps.

Sorry but Sigil will not be changing how we handle this. :)

Glad to hear you got it working!

Tex2002ans 03-03-2021 05:27 AM

Quote:

Originally Posted by ebray187 (Post 4098907)
Also i have set the <dc:language>es</dc:language> into the content.opf file

:thumbsup:

An EPUB's language goes like a pyramid:
  • book-level = content.opf
  • chapter-level = <html>
  • word-level = <span>

1. Setting the content.opf correctly is the most important! And you did that great. :thumbsup:

This says "this book is in Spanish!"

2. (Optional) Add lang + xml:lang to your <html>:

Chapter01.xhtml (Before):

Code:

<html xmlns="http://www.w3.org/1999/xhtml">

Chapter01.xhtml (After):

Code:

<html xmlns="http://www.w3.org/1999/xhtml" lang="es" xml:lang="es">

This says "this chapter is Spanish!"

3. (Super duper optional) Mark "foreign words" with their language:

Before:

Code:

<p>¿Puedo ir al bathroom, por favor?</p>

After:

Code:

<p>¿Puedo ir al <span lang="en" xml:lang="en">bathroom</span>, por favor?</p>

This says "the entire book/chapter/sentence is in Spanish, but the word 'bathroom' is English!"
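
One way to picture the pyramid is that the innermost language attribute wins, and anything unmarked falls back to the book-level language. The sketch below uses Python's standard xml.etree.ElementTree. The function name `text_with_language` and the `BOOK_LANGUAGE` constant are made up for illustration, and this is not how Sigil itself is implemented:

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def text_with_language(element, inherited):
    """Yield (text, language) pairs, innermost language attribute winning."""
    # Assumption: prefer xml:lang over lang when both are present.
    lang = element.get(XML_LANG) or element.get("lang") or inherited
    if element.text and element.text.strip():
        yield element.text.strip(), lang
    for child in element:
        yield from text_with_language(child, lang)
        # Tail text follows the child but belongs to the parent element.
        if child.tail and child.tail.strip():
            yield child.tail.strip(), lang


BOOK_LANGUAGE = "es"  # stands in for <dc:language>es</dc:language>
CHAPTER = (
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="es" xml:lang="es">'
    "<body><p>¿Puedo ir al "
    '<span lang="en" xml:lang="en">bathroom</span>, por favor?</p></body></html>'
)

root = ET.fromstring(CHAPTER)
for text, lang in text_with_language(root, BOOK_LANGUAGE):
    print(f"{lang}: {text}")
```

Only "bathroom" comes out tagged as en; the text on either side of the span stays es.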

Quote:

Originally Posted by ebray187 (Post 4098902)
I had understood that adding the xml:lang="X" was enough for it to work properly.

Any time you're marking a language, it's good practice to use BOTH lang + xml:lang.

The basic idea: lang is the HTML attribute and xml:lang is the XML attribute. EPUB chapters are XHTML, so setting both covers both.

I explained a little more in this post a few days ago:

"Search and Replace" (Post #11)

And one common error that occurs is someone having the book + chapters be mismatched.

(So a Spanish book "es", but you accidentally set English "en" in an HTML chapter. This is pretty easy to spot in the Language column in Tools > Spellcheck > Spellcheck.)
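
If you would rather script that check than eyeball it, here is a rough sketch with Python's standard xml.etree.ElementTree. The helper names and the sample chapters are invented for the example, and it only looks at the language declared on the root <html> element:

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def primary_subtag(code):
    """Return the language part of a code: 'es-AR' -> 'es'."""
    return code.replace("_", "-").split("-")[0].lower()


def mismatched_chapters(book_language, chapters):
    """Return (name, declared language) for chapters that disagree with the book.

    `chapters` maps a file name to its XHTML source. Chapters with no
    lang/xml:lang on <html> inherit the book language and are skipped.
    """
    mismatches = []
    for name, source in chapters.items():
        root = ET.fromstring(source)
        declared = root.get(XML_LANG) or root.get("lang")
        if declared and primary_subtag(declared) != primary_subtag(book_language):
            mismatches.append((name, declared))
    return mismatches


chapters = {
    "Chapter01.xhtml": '<html xmlns="http://www.w3.org/1999/xhtml" lang="es" xml:lang="es"><body/></html>',
    "Chapter02.xhtml": '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><body/></html>',
    "Chapter03.xhtml": '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>',
}
print(mismatched_chapters("es", chapters))
```

Here only Chapter02.xhtml is reported, because it declares "en" in a book whose dc:language is "es".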

ebray187 03-03-2021 10:46 AM

Thank you both for taking the time to respond.

I fully understand what you are saying and it certainly makes a lot of sense. The only thing I can add is that I could not find anything in the documentation (changelog or manual) saying that, as of version 1.4, dictionaries must follow the Hunspell naming convention to work correctly with Sigil (despite how well it's explained in this thread as a fundamental practice).

I understand that in this same forum you are working on an update of the manual, so I take the opportunity to mention that it would be very helpful to include the information that you and others have kindly shared in this and other posts about it.

The question that remains for me (and that I think was lost in the translation of my last message) is how a user can define which dictionary to use for each language. Perhaps it is not so common in English, but between Argentine Spanish and the Spanish of Spain, for example, there are many spelling differences. Before 1.4 I simply changed the dictionary in Preferences; today I don't know how to do it from Sigil without altering the epub or constantly changing the file names of my dictionaries.

Greetings and again, thank you very much for the time and patience.

ebray187 03-03-2021 10:54 AM

Quote:

Originally Posted by ebray187 (Post 4099039)
Before 1.4 I simply changed the dictionary in preferences, today I don't know how to do it from Sigil without altering the epub or constantly changing the file names of my dictionaries.

I mean taking into account that the Primary Language Dictionary does not affect the spellchecker but only the underlining of the code view.

KevinH 03-03-2021 11:01 AM

Regional language differences just mean you use a language code that includes a region. There are many variations of English, but each has a different region code (as do the dictionaries).

en_US, en_GB, and en_CA are different dictionaries targeted at the US, Great Britain, and Canada respectively. The dictionary names all start that way.

The dc:language code allows the region to be used: en-US instead of just en

The xml:lang and lang attributes also allow a region to be specified: "en-GB" instead of just "en".

The language pulldown supports a large set of regional language codes. Here is a code snippet for just Spanish:

Code:

        "es"    << tr("Spanish") <<
        "es-AR" << tr("Spanish") + QString(" - ") + tr("Argentina") <<
        "es-BO" << tr("Spanish") + QString(" - ") + tr("Bolivia") <<
        "es-CL" << tr("Spanish") + QString(" - ") + tr("Chile") <<
        "es-CO" << tr("Spanish") + QString(" - ") + tr("Colombia") <<
        "es-CR" << tr("Spanish") + QString(" - ") + tr("Costa Rica") <<
        "es-DO" << tr("Spanish") + QString(" - ") + tr("Dominican Republic") <<
        "es-EC" << tr("Spanish") + QString(" - ") + tr("Ecuador") <<
        "es-SV" << tr("Spanish") + QString(" - ") + tr("El Salvador") <<
        "es-GT" << tr("Spanish") + QString(" - ") + tr("Guatemala") <<
        "es-HN" << tr("Spanish") + QString(" - ") + tr("Honduras") <<
        "es-MX" << tr("Spanish") + QString(" - ") + tr("Mexico") <<
        "es-NI" << tr("Spanish") + QString(" - ") + tr("Nicaragua") <<
        "es-PA" << tr("Spanish") + QString(" - ") + tr("Panama") <<
        "es-PY" << tr("Spanish") + QString(" - ") + tr("Paraguay") <<
        "es-PE" << tr("Spanish") + QString(" - ") + tr("Peru") <<
        "es-PR" << tr("Spanish") + QString(" - ") + tr("Puerto Rico") <<
        "es-ES" << tr("Spanish") + QString(" - ") + tr("Spain") <<
        "es-UY" << tr("Spanish") + QString(" - ") + tr("Uruguay") <<
        "es-VE" << tr("Spanish") + QString(" - ") + tr("Venezuela") <<

As for adding this to the user manual, the number of users who do not use the normal Hunspell dictionaries is very small. If people do run into difficulties, they can of course come here to our user forum on MobileRead to get help.

That said, if you would like to help edit the user-guide with additional information, we would be happy to consider it for inclusion.

Hope this helps!

KevinH 03-03-2021 11:11 AM

The choice of Primary Language dictionary does impact the Spellcheck dialog and real-time spell checking (the red squiggly underline) in general.

If you choose en_GB as your Primary dictionary, then everywhere you use just "en" as a language code or xml:lang attribute, it will map to that dictionary rather than the en_US one. But Sigil has to be able to determine the language of a dictionary from the long-standing dictionary naming convention.
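
As a rough illustration of that mapping rule (this is NOT Sigil's source code, just the behaviour described in this post written as a small Python function; `resolve_dictionary`, the fallback order, and the dictionary list are all assumptions):

```python
def resolve_dictionary(lang_code, installed, primary=None):
    """Pick a dictionary name for an xml:lang / dc:language code.

    Illustrative only: it mirrors the rule described in the post,
    not Sigil's actual implementation.
    """
    wanted = lang_code.replace("-", "_")
    if wanted in installed:
        return wanted  # exact match, e.g. "en-GB" -> "en_GB"
    language = wanted.split("_")[0]

    def is_for_language(name):
        return name == language or name.startswith(language + "_")

    # A bare code such as "en" goes to the Primary dictionary when it fits.
    if primary in installed and is_for_language(primary):
        return primary
    # Assumed fallback: first installed dictionary for that language.
    for name in sorted(installed):
        if is_for_language(name):
            return name
    return None  # a file named "custom" can never be matched by its name


installed = ["en_GB", "en_US", "es_ES", "custom"]
print(resolve_dictionary("en", installed, primary="en_GB"))
print(resolve_dictionary("en-US", installed, primary="en_GB"))
print(resolve_dictionary("es", installed, primary="custom"))
print(resolve_dictionary("fr", installed, primary="en_GB"))
```

The third call shows the situation from the start of this thread: a Primary dictionary called "custom" carries no language code, so it cannot be picked for "es" by name alone.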

In Code View, the red squiggly underline is determined entirely by the chosen Primary and Secondary dictionaries. It ignores any lang or xml:lang attributes, because using lang and xml:lang is not an EPUB 2 requirement if dc:language is set.

This is useful when only one language is used, as specified in dc:language and nowhere else, or when only a few words are taken from a second language.

True multi-language spell checking requires the use of xml:lang or lang attributes, which is also what is recommended for EPUB 3 / HTML5.

ebray187 03-03-2021 11:20 AM

You explained it as an open book. Thanks! :thumbsup:


All times are GMT -4.