08-13-2020, 05:32 PM | #46 | |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Why mess with near-perfection? The Language column is simple, understandable, human-readable, and most importantly, works the exact same way as the Spellcheck List works now. It's a straight enhancement. Trying to overload the Words column with "lang-goop: word" makes the actual purpose of the column (displaying the word) more obtuse, and puts the lang front-and-center.
Calibre duplicated Sigil's Spellcheck Lists + added enhancements (like Multi-Language spellchecking)... So can Sigil copy back from Calibre + make enhancements (like being able to search within a language).
* * *
Anyway, since I'm one of the bigger users/proponents of multi-language spellchecking, I'm making myself available for a video chat if you want to discuss this stuff. Probably easier to iron out a lot of use-cases + potential usability issues and pitfalls over video/audio. And I'd love to help make Sigil the best it can be. I'll send you a PM later today. |
|
08-13-2020, 06:23 PM | #47 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
I do not consider a 2 or 4 character prefix offset from the word as "lang-goop". And how does it take any focus away from the word itself, any more than an entire separate column would?
Nor would I be overloading the word column, as language determines the meaning and use of the word. You also did not address my comment that if a user feels that a language code is too technical, they would be unable to set or edit the lang attributes needed to properly support things. Understanding that de is German, es is Spanish, and en is English is not beyond most users of Sigil. So I'll wait to hear from other interested parties first before making any decisions. If there really is only a small handful of users for this feature, then perhaps it would be better as a plugin (edit plugin with a gui) than part of Sigil itself. The nice thing about this approach is it can all just go away/be hidden when only a single language is used. Last edited by KevinH; 08-13-2020 at 08:05 PM. |
|
08-13-2020, 07:05 PM | #48 | |
null operator (he/him)
Posts: 20,570
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
I do the language markup in the original manuscript, mainly that's Word.
When I am using the spell checker I want to focus on the content not on the markup. I go back to my original suggestion Quote:
BTW Sigil's spell checker already has a unique feature that I value. Calibre checks spelling in DC elements, such as description and subjects, in the .opf file, and for some reason it also marks override_css as an error in the following: Code:
<style type="text/css" title="override_css">
@page {padding: 0pt; margin:0pt}
body { text-align: center; padding:0pt; margin: 0pt; }
</style>
BR
Last edited by BetterRed; 08-13-2020 at 07:11 PM. Reason: ALSO highlighted |
|
08-13-2020, 08:14 PM | #49 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
We are only discussing the SpellcheckEditor Dialog here, not the real-time (red squiggly) syntax highlighting that you gave a preference for earlier. Preview will not be highlighting misspelled words. Preview is just that ... Preview. Spellchecking is the domain of CodeView where the editing can be done. Either that or use PageEdit for that purpose.
|
08-14-2020, 03:35 AM | #50 | |
Groupie
Posts: 171
Karma: 40000
Join Date: Oct 2013
Device: kindle
|
Quote:
All the best |
|
08-14-2020, 04:48 AM | #51 | ||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
But here's another overview:
Red Squigglies in Code View
KevinH already implemented 2 dictionary support, so now instead of only handling a single language, you can specify 2: English + Spanish
That would take care of most misspellings (red squigglies). (Vast majority of books only have 1/2 languages, 3+ is much more rare, especially with proper lang markup [which is extremely rare already].)
Spellcheck List Enhancement
Making (Tools > Spellcheck > Spellcheck) better in order to be able to handle language. So let's say there's: Code:
<p lang="en">I like tacos.</p>
<p lang="es">Me gustan los tacos.</p>
Code:
Word  | Count
______|______
tacos | 2
What this would do is then more accurately report words, and use the dictionaries in order to check if words are spelled correctly. (A correct word in Spanish may be completely wrong in English.) Quote:
Getting amnesia? It was only a few weeks ago! |
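For anyone who wants to see the per-language count spelled out, here is a minimal sketch in Python. It is not how Sigil or Calibre does it; the class name `LangWordCounter` and the default language are made up for illustration, and it assumes well-formed XHTML where every start tag has a matching end tag.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class LangWordCounter(HTMLParser):
    """Count words per (language, word) pair (hypothetical helper)."""

    def __init__(self, default_lang="en"):
        super().__init__()
        # Stack of languages in effect; the top entry applies to current text.
        self.lang_stack = [default_lang]
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # An element without its own lang inherits the enclosing language.
        lang = attr_map.get("xml:lang") or attr_map.get("lang") or self.lang_stack[-1]
        self.lang_stack.append(lang)

    def handle_endtag(self, tag):
        # Assumption: tags are balanced, so each end tag pops one entry.
        if len(self.lang_stack) > 1:
            self.lang_stack.pop()

    def handle_data(self, data):
        # Runs of letters only; digits and punctuation are not words here.
        for word in re.findall(r"[^\W\d_]+", data):
            self.counts[(self.lang_stack[-1], word)] += 1


counter = LangWordCounter()
counter.feed('<p lang="en">I like tacos.</p><p lang="es">Me gustan los tacos.</p>')
for (lang, word), count in sorted(counter.counts.items()):
    print(lang, word, count)
```

With the two paragraphs above, "tacos" is counted once under en and once under es instead of twice in a single row.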
||
08-14-2020, 04:57 PM | #52 | |
null operator (he/him)
Posts: 20,570
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
BR |
|
08-15-2020, 11:53 AM | #53 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Here are some other questions related to spellchecking:
1. Should UserDictionaries be specific to a language or not? If so, should the UserDictionary name then have to start with a language/region code just like hunspell dictionaries do? How else would the language of the UserDictionary be determined?
2. Should using Ignore on a word be treated universally so it is ignored in all used languages? I assume this is what users would think when hitting ignore on a word but ... wanted to check. |
08-15-2020, 12:19 PM | #54 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
BTW, I took a look at how Kovid/Calibre handles real-time red squigglies for multiple languages and he needed to create his own SyntaxHighlighter (not use QSyntaxHighlighter) that basically constantly runs in its own thread and does the equivalent of a full QuickParser run. It then uses QTextDocument BlockUserData to store parsed state and his own User Defined QTextCharFormat property to store the locale of every character in the file (and therefore lang). These formats are used for local spell check syntax highlighting and since they are associated with each character of the file, as editing is done, the locale info follows the editing just like other character formatting like bold or italic.
Needless to say, I really do not think this is the way for Sigil to go at all. I would rather allow the user to specify a Primary Language Dictionary (as now) and add a Secondary Language Dictionary controllable in Preferences. If no secondary language dictionary is set by the user, it is just like real-time red squiggly spellchecking now. If a secondary language dictionary is set by the user, the word is checked in both to determine the state of the red squigglies. Using the SpellCheckEditor will always properly handle spellchecking by lang or xml:lang attribute in as many languages as are used that have associated dictionaries installed. This way in Sigil, real-time red squiggly spellcheck highlighting will serve a slightly different function (as determined by the user preferences) for up to 2 languages for use cases with no or improper use of lang or xml:lang attributes, while the full SpellCheckEditor will handle the use cases of proper use of language tags. |
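The primary/secondary rule described above can be sketched roughly like this. It is only an illustration in Python: plain sets stand in for real Hunspell lookups, the function name `is_misspelled` is hypothetical, and the lowercase comparison is a simplification of my own.

```python
def is_misspelled(word, primary, secondary=None):
    """Flag a word only if no configured dictionary accepts it.

    `primary` and `secondary` are sets of known words standing in for
    Hunspell dictionaries (an assumption made for this sketch).
    """
    key = word.lower()
    if key in primary:
        return False
    # With no secondary dictionary set, behaviour matches the single
    # dictionary case.
    if secondary is not None and key in secondary:
        return False
    return True


english = {"i", "like", "tacos"}
spanish = {"me", "gustan", "los", "tacos"}

for word in ["like", "gustan", "tacoz"]:
    print(word, is_misspelled(word, english), is_misspelled(word, english, spanish))
```

A word gets a red squiggly only when both dictionaries reject it, which is why this approach needs no lang attributes at all.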
08-15-2020, 12:35 PM | #55 | ||
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
IMHO, yes.
Quote:
This is also the default naming convention of the Chromium dictionary website. For example: Code:
en_US.aff
en_US.dic
en_US.dic_delta
Quote:
Code:
Add ignored words to:
The primary user dictionary
ALL user dictionaries |
||
08-15-2020, 07:56 PM | #56 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Are these .dic_delta files pure text lists of words one per line like our UserDictionaries or do they have a specific format?
What if the existing UserDictionary file name does not include the language and region? For example, how would your naming convention work with the "default" user dictionary Sigil sets normally? I guess we could assume the default dictionary to always be the same language and locale as the Primary Dictionary setting, whatever the user sets for that.
The same question holds for all people's existing UserDictionaries. I guess we could again assume all existing UserDictionaries are in the Primary Dictionary language and region and copy and rename them in the next release to the form (replacing the en_US with whatever the primary dictionary is): en_US_previousname.dic_delta
Then all future releases could use that approach. Or maybe better yet, we could provide an official Sigil plugin to convert all existing UserDictionaries to the new naming format?
I wonder how many users of Sigil make extensive use of its UserDictionaries feature? I do not.
If we decide to use language specific ignore lists, we can follow along in the same fashion. All feedback welcome. Last edited by KevinH; 08-15-2020 at 08:00 PM. |
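A rough sketch of the copy-and-rename step floated above, in Python since that is what a plugin would use. Nothing here is existing Sigil code: the function name `migrate_user_dictionaries` and the folder argument are hypothetical, and it assumes every legacy file in the folder belongs to the primary dictionary's language.

```python
from pathlib import Path
import shutil


def migrate_user_dictionaries(folder, primary_code):
    """Copy each legacy user dictionary to '<code>_<oldname>.dic_delta'.

    Assumption: all legacy files in `folder` are in the primary language.
    Originals are left in place; existing targets are never overwritten.
    """
    migrated = []
    for path in sorted(Path(folder).iterdir()):
        # Skip directories and files that already use the new naming form.
        if not path.is_file() or path.suffix == ".dic_delta":
            continue
        target = path.with_name(f"{primary_code}_{path.stem}.dic_delta")
        if target.exists():
            continue
        shutil.copy2(path, target)
        migrated.append(target)
    return migrated
```

Calling `migrate_user_dictionaries(some_folder, "en_US")` on a folder holding only `default` would add `en_US_default.dic_delta` next to it, and a second run would copy nothing.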
08-15-2020, 11:16 PM | #57 | |||||
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Quote:
IMHO, this would also have the advantage that Sigil would automatically select the correct user dictionary for each language. Quote:
Quote:
Quote:
Hopefully, we'll hear from them. Otherwise they'll have to live with the fact that they can only create one user dictionary per language. Besides, if they really need more user dictionaries, they could simply create copies of the existing dictionaries. For example, users who created three en_us user dictionaries would need to create two copies of the en_us dictionary: Code:
en_US1.aff
en_US1.dic
en_US1.dic_delta
en_US2.aff
en_US2.dic
en_US2.dic_delta |
|||||
08-16-2020, 09:54 AM | #58 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Actually, for Sigil, using multiple UserDictionaries is possible since each word in the UserDictionary is simply added to the open Hunspell dictionary using hunspell add. These are just temporary so many lists could be added.
Unfortunately, we do the same with Ignore. Ignored words should always be kept in a separate hash or set and should never be added to the dictionary itself, even just temporarily. I will look into changing that. |
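To make the distinction concrete, here is a minimal sketch in Python, not the actual Sigil C++ code. The class `SpellChecker` and its methods are hypothetical, and a plain set stands in for the open Hunspell dictionary.

```python
class SpellChecker:
    """Toy checker that keeps ignored words apart from dictionary words."""

    def __init__(self, dictionary_words):
        # Stand-in for an open Hunspell dictionary (assumption).
        self.words = set(dictionary_words)
        # Ignored words live in their own set and never touch self.words.
        self.ignored = set()

    def add_user_words(self, user_words):
        # User dictionary entries are merged into the open dictionary.
        self.words.update(user_words)

    def ignore(self, word):
        self.ignored.add(word)

    def clear_ignored(self):
        self.ignored.clear()

    def is_misspelled(self, word):
        # The ignore set is consulted first; the dictionary stays untouched.
        if word in self.ignored:
            return False
        return word not in self.words


checker = SpellChecker({"sigil"})
checker.add_user_words(["epub"])
checker.ignore("squiggley")
print(checker.is_misspelled("squiggley"))
checker.clear_ignored()
print(checker.is_misspelled("squiggley"))
```

Because the ignored word never enters the dictionary, clearing the ignore set restores the original behaviour without reloading anything.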
08-16-2020, 07:10 PM | #59 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Interestingly, Ignored words in Sigil on macOS are a bit broken (although no one has ever complained). Ignored words should be associated with a single book/MainWindow. But Sigil on macOS can have multiple MainWindows open within the same instance of Sigil.
So the list of ignored words should be associated with either a MainWindow or a Book object and not the single instance of the SpellCheck object as is done currently, at least on macOS. That will take some work to keep and pass in an ignored word hash to be used by spellchecking from the book invoking the spellcheck. This is the second reason that ignored words should never be added to a dictionary itself even if temporarily. More work to do here ... |
08-16-2020, 07:18 PM | #60 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Given the above, I think the Ignore list must be language independent. Using Ignore on a set of characters (potential word that is not recognized by a dictionary) will ignore that same set of characters in any language for spellcheck purposes. Each book should not have to keep and pass around separate temporary ignore lists just to spell a single word in more than one language.
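Roughly what that could look like, sketched in Python rather than Sigil's C++. The names `Book` and `is_flagged` are hypothetical, sets stand in for Hunspell dictionaries, and treating a language with no installed dictionary as "flag everything" is an assumption of this sketch only.

```python
class Book:
    """Each open book carries its own language-independent ignore set."""

    def __init__(self):
        self.ignored_words = set()


def is_flagged(word, lang, dictionaries, ignored_words):
    """Return True if `word` should be reported as misspelled.

    `dictionaries` maps a language code to a set of known words.
    `ignored_words` is passed in by the book invoking the spellcheck.
    """
    # One ignore entry covers the word in every language.
    if word in ignored_words:
        return False
    # Assumption: a language without a dictionary accepts no words.
    return word not in dictionaries.get(lang, set())


dictionaries = {"en": {"i", "like"}, "es": {"me", "gustan", "los"}}
book_a, book_b = Book(), Book()
book_a.ignored_words.add("tacos")

print(is_flagged("tacos", "en", dictionaries, book_a.ignored_words))
print(is_flagged("tacos", "es", dictionaries, book_a.ignored_words))
print(is_flagged("tacos", "en", dictionaries, book_b.ignored_words))
```

Ignoring "tacos" in one book silences it for both en and es in that book, while a second book open in the same instance is unaffected.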
|
|