#16 |
Sigil Developer
Posts: 8,735
Karma: 5703586
Join Date: Nov 2009
Device: many
|
It should be rare on the same node of the DOM tree, since a new span will create a new node. So as long as spans (and other tags) use the proper lang attributes it should really not happen. Without the proper lang attributes added, Sigil will NOT be guessing language. That is a kettle of fish that Sigil will not be opening.
#17 |
null operator (he/him)
Posts: 21,707
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Not so bad if the markuperer italicises the foreign phrases, at least then you can search and eyeball - provided they do it consistently, which of course they often don't.
Last edited by BetterRed; 06-30-2020 at 08:01 AM. |
#18 |
Grand Sorcerer
Posts: 28,548
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Absolutely agree. The expectation will be that the proper language attributes have already been correctly added. If this is not the case, users will need to add/correct them themselves (or take it up with an ebook's creator) if they want multi-language spellcheck to be the most useful in Sigil. GIGO still very much applies.
#19 | ||
Connoisseur
Posts: 73
Karma: 7130
Join Date: Apr 2015
Device: PRS-T3
|
Quote:
Quote:
My PRS-T3 seems to work at least at the HTML-file level, i.e. it can change the language when a new HTML file is processed. So far, I haven't figured out which language declarations it processes and which it ignores (e.g., xml:lang="..." vs. lang="..." or en-US vs. en_US). For instance, the PRS-T3 seems to ignore en_US/en_GB/de_DE/fr_..., while en-US/en-GB/de-DE/fr-... seem to work.
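The underscore forms (en_US, de_DE) are locale-style names, as used for Hunspell dictionary files, while lang and xml:lang expect BCP 47 tags with hyphens. A minimal sketch of converting one to the other before writing the attribute; `normalize_lang_tag` is a hypothetical helper, not part of Sigil or of any reader firmware, and the casing it applies is only the usual convention, since BCP 47 tags are case-insensitive:

```python
def normalize_lang_tag(tag: str) -> str:
    """Turn a locale-style code (en_US) into a BCP 47 style tag (en-US).

    Simplified: only separators and conventional casing are handled;
    the subtags are not validated against the IANA registry.
    """
    parts = tag.strip().replace("_", "-").split("-")
    if not parts[0]:
        return ""
    out = [parts[0].lower()]              # primary language subtag, e.g. "en"
    for sub in parts[1:]:
        if len(sub) == 2 and sub.isalpha():
            out.append(sub.upper())       # region subtag, e.g. "US"
        elif len(sub) == 4 and sub.isalpha():
            out.append(sub.title())       # script subtag, e.g. "Hant"
        else:
            out.append(sub.lower())       # anything else is left lowercase
    return "-".join(out)


for code in ("en_US", "de_de", "fr"):
    print(code, "->", normalize_lang_tag(code))
```

Whether a given reader then honours the tag is a separate question that only testing on the device can answer.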
#20 | ||||||
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Even many browsers don't handle hyphenation properly yet, which is why I was interested in whether you had found a reader that could do it at that level.
Quote:
Quote:
See "Tags for Identifying Languages" (BCP 47) and W3C's page on "Language tags in HTML and XML". Also, in XHTML xml:lang takes priority:
Quote:
Quote:
See W3C's "Choosing a Language Tag":
Quote:
Also, if you desperately need to handle multiple dictionaries in a single document, and you use Microsoft Word... you could import your properly-lang-marked EPUB -> DOCX using Toxaris's EPUB Tools: https://www.mobileread.com/forums/sh....php?p=2516490 I was pleasantly surprised to see it transferred over all lang information into DOCX, which made dealing with the red squigglies so much easier! (I recently used it to mark all Spanish/French/German text, and even American/British, making the spellchecking passes so much faster.) |
#21 | ||
actually it is /var/log
Posts: 341
Karma: 2994236
Join Date: Sep 2012
Location: usually Europa
Device: prs t1
|
Quote:
On every spellcheck pass, Sigil created a language map of the text: every letter position in the text had a language assigned to it. On new text input, when spellchecking was triggered, Sigil checked where the input was relative to the current language map and "guessed" the input language from it. Of course, if the input was "<span lang=..." ... But in most cases it should work. The time penalty for this check was negligible in comparison to other multi-language necessities.
Quote:
Why has this version not found its way into Sigil? Probably it did not meet Kevin's expectations, but he was nice enough never to elaborate on it.
Last edited by varlog; 07-23-2020 at 04:18 PM.
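For readers trying to picture the language map, here is a rough sketch of the idea. It is not varlog's patch: instead of one entry per letter position it stores sorted runs of (start offset, language) and looks positions up with a binary search, and the `LanguageMap` class and its method names are made up for illustration.

```python
from bisect import bisect_right


class LanguageMap:
    """Maps character offsets in a text to language codes via sorted runs."""

    def __init__(self, default_lang):
        self._starts = [0]              # start offset of each run, ascending
        self._langs = [default_lang]    # language of the run at the same index

    def add_run(self, start, lang):
        """Record that the text from `start` onward is in `lang`."""
        if start < self._starts[-1]:
            raise ValueError("runs must be added in increasing offset order")
        if start == self._starts[-1]:
            self._langs[-1] = lang      # replace the run that starts here
        else:
            self._starts.append(start)
            self._langs.append(lang)

    def lang_at(self, pos):
        """Return the language in effect at character offset `pos`."""
        if pos < 0:
            raise ValueError("offset must be non-negative")
        run = bisect_right(self._starts, pos) - 1
        return self._langs[run]


lang_map = LanguageMap("en")
lang_map.add_run(10, "fr")   # e.g. a <span lang="fr"> opens at offset 10
lang_map.add_run(25, "en")   # the span closes, back to the inherited language
print(lang_map.lang_at(12))
```

Any insertion or deletion shifts every offset after it, so a map like this has to be updated or rebuilt as the text changes.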
#22 |
Sigil Developer
Posts: 8,735
Karma: 5703586
Join Date: Nov 2009
Device: many
|
Actually, I liked pieces of it but felt it was a bit too invasive, and that a simpler solution might be possible. The main problem is that adding a language attribute to any tag in Code View (CV) would force the recreation of the entire text-position language map and make it hard for the syntax highlighter to do its job on a line-by-line basis.
None of this is a problem for spellchecking static code via the spellcheck dialog when proper lang attributes have already been added. I like BetterRed's idea of doing on-the-fly spellchecking only in the primary language and reserving multi-language spellchecking for the spellcheck dialog. That would be much less invasive, and the language of the text of any tag can be determined from the parsed DOM by checking its parents, up to the html tag, for the nearest lang attribute.
Last edited by KevinH; 07-23-2020 at 04:50 PM.
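The parent walk can be sketched with Python's standard library. This is an illustration only, not Sigil's implementation: it assumes the file is well-formed XML, `effective_lang` is a hypothetical helper, and the sample markup omits the XHTML namespace to keep the lookups short. ElementTree elements have no parent pointer, so the sketch builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(root, target, default=""):
    """Return the language that applies to `target` by walking up to the root."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        # xml:lang wins over lang when both are present on the same element.
        # Simplification: an empty value is treated the same as a missing one.
        lang = node.get(XML_LANG) or node.get("lang")
        if lang:
            return lang
        node = parent_of.get(node)
    return default


doc = ET.fromstring(
    '<html lang="en"><body>'
    '<div xml:lang="de" lang="en"><p>Guten <em>Tag</em></p></div>'
    '<p>Hello <span lang="fr">bonjour</span></p>'
    '</body></html>'
)
em = next(doc.iter("em"))
span = next(doc.iter("span"))
print(effective_lang(doc, em), effective_lang(doc, span))  # de fr
```

This only works on a page that parses, which is why it suits a static pass such as the spellcheck dialog better than checking while typing.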
#23 | |
actually it is /var/log
Posts: 341
Karma: 2994236
Join Date: Sep 2012
Location: usually Europa
Device: prs t1
|
Quote:
And there are probably some dragons hidden there... For one: it could be very annoying when a word changes its correctness as you work.
Last edited by varlog; 07-23-2020 at 05:14 PM.
#24 |
Sigil Developer
Posts: 8,735
Karma: 5703586
Join Date: Nov 2009
Device: many
|
It would do that also when any lang attribute is added. Words previously deemed correct would now be incorrect. What is the difference?
Until the actual language of the code is finally determined by either lang attributes on tags or inheritance, the word cannot be properly spellchecked when multiple languages exist. Cutting and pasting text would also make "correctness" change, which is why I do not like on-the-fly spellchecking in multiple languages.
Last edited by KevinH; 07-23-2020 at 05:20 PM.
#25 | |
actually it is /var/log
Posts: 341
Karma: 2994236
Join Date: Sep 2012
Location: usually Europa
Device: prs t1
|
Quote:
Actually, it is a question of user preference. The technical issues mean nothing to users. My personal opinion is that it would be confusing and annoying for the average user. But I'm not the average user.
#26 |
Sigil Developer
Posts: 8,735
Karma: 5703586
Join Date: Nov 2009
Device: many
|
I think multi-language spell checking for more than a few words in a second language is a low-probability use case for most users. So supporting either checking in all languages or checking only the main language when checking on the fly is just fine: writing a book in Sigil is rare, while fixing up and specifying needed language attributes is more common. So having secondary spellchecking happen only in the spellchecking dialog seems like the right approach for an epub editor to me.
That is what I tried to explain and ask for back when you started this, if you check the messages.
Take care,
Kevin
#27 |
Connoisseur
Posts: 73
Karma: 7130
Join Date: Apr 2015
Device: PRS-T3
|
It would be good if on-the-fly checking worked at the file (html/xhtml) level, taking the local language definition in the file, or the global language attribute if no local one is present.
#28 |
Sigil Developer
Posts: 8,735
Karma: 5703586
Join Date: Nov 2009
Device: many
|
Yes, it could quickly grep the html tag and body tag for a lang attribute to determine the primary language for on-the-fly checking in a specific xhtml file.
But as I explained, it cannot easily do full on-the-fly red-squiggly checking in multiple languages at the same time while the file is being edited. During editing, the CV code may not even be parseable at some point. And language attributes need not be on the same line, nor even close to the words being checked. They may be inherited from a div with a lang attribute many, many nodes above the current location (its great-great-grandparent). And a simple cut and paste may effectively change the language of a word. So the red squiggly that checks spelling on the fly would have to work only with fully parseable xhtml and would have to create and parse the DOM tree after every change. That is a big overhead I am not willing to inflict on Sigil for so little real use or payback.
I hope people know they can add known foreign words to a custom dictionary, which works just fine when the number of second- or third-language words is small. The only other approaches are checking in all languages used in that file, or in just the primary language, when checking on the fly. BetterRed likes the latter but I am a fan of the former.
That said, spell checking via the spellcheck dialog uses static pages, so it can parse the complete page and properly extract all of the language attributes. So unless someone can figure out a way to avoid reparsing a file after every single change and somehow properly determine the language of a piece of text with just local information, I do not yet see a workable solution for on-the-fly checking that I would be willing to integrate into Sigil.
Last edited by KevinH; 07-24-2020 at 11:34 AM.
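The "quick grep" for a file's primary language needs no DOM at all, so it still works while the rest of the file is momentarily unparseable. A minimal sketch under stated assumptions: `primary_language` is a hypothetical function, the regular expressions are deliberately naive (an attribute value containing ">" would defeat them), and `fallback` stands in for whatever book-wide language is configured.

```python
import re

# Opening <html ...> or <body ...> tag and its raw attribute text.
_TAG_RE = re.compile(r"<(html|body)\b([^>]*)>", re.IGNORECASE)
# lang="..." or xml:lang="..."; the lookbehind skips names such as hreflang.
_LANG_RE = re.compile(r"""(?<![\w:-])(xml:lang|lang)\s*=\s*(["'])(.*?)\2""",
                      re.IGNORECASE)


def primary_language(source, fallback="en"):
    """Guess a file's primary language from its html and body tags only."""
    found = {}
    for tag in _TAG_RE.finditer(source):
        name = tag.group(1).lower()
        attrs = {m.group(1).lower(): m.group(3)
                 for m in _LANG_RE.finditer(tag.group(2))}
        lang = attrs.get("xml:lang") or attrs.get("lang")  # xml:lang wins
        if lang and name not in found:
            found[name] = lang
    # body is closer to the text than html, so it overrides it.
    return found.get("body") or found.get("html") or fallback


page = ('<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">\n'
        '<head><title>x</title></head>\n<body xml:lang="fr">')
print(primary_language(page))  # fr
```

It says nothing about spans or divs further down the tree, which is exactly the part that needs a full parse.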
#29 |
Sigil Developer
Posts: 8,735
Karma: 5703586
Join Date: Nov 2009
Device: many
|
Based on some offline discussions with Doitsu and others, the simplest path forward is to allow the user to set a single secondary language dictionary in user preferences.
The number of use cases needing more than 2 different languages for spell checking is exceedingly small, so this looks like the right approach to me. The secondary language dictionary can be set to None if so desired. Then, only in the SpellCheck dialog, the user would have the option to filter the spellchecked words by detected language code. Words not marked for either of the two languages would be ignored unless told otherwise. The on-the-fly spell checking (red squiggly) would also be controllable by preferences, to check either in the primary language only or in both the primary and secondary languages. This approach should be much simpler, have much lower overhead, and be much less intrusive. Any feedback on this type of approach is welcome, as this is the next new feature for Sigil I will be working on.
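One possible reading of this proposal, sketched in Python. Everything here is an assumption for illustration: plain sets of lowercase words stand in for real Hunspell dictionaries, the function names are invented, and "check both languages" is interpreted as "accept a word found in either dictionary".

```python
def needs_squiggle(word, primary_words, secondary_words=None,
                   check_secondary=True):
    """On-the-fly check: True if the word should get a red squiggly."""
    w = word.lower()
    if w in primary_words:
        return False
    if check_secondary and secondary_words is not None and w in secondary_words:
        return False
    return True


def dialog_misspellings(tagged_words, dictionaries, primary, secondary=None):
    """Dialog check over (word, lang_code) pairs taken from a parsed page.

    Words tagged with a language other than the two configured ones are
    skipped, matching "ignored unless told otherwise".
    """
    active = {primary} if secondary is None else {primary, secondary}
    misspelled = []
    for word, lang in tagged_words:
        if lang not in active:
            continue
        if word.lower() not in dictionaries[lang]:
            misspelled.append((word, lang))
    return misspelled


english = {"the", "cat"}
french = {"le", "chat"}
words = [("cat", "en"), ("chatt", "fr"), ("Katze", "de"), ("teh", "en")]
print(dialog_misspellings(words, {"en": english, "fr": french}, "en", "fr"))
```

The on-the-fly function never looks at markup, so it has no reparse cost; the price is that a word wrong in its own language but valid in the other one goes unflagged until the dialog pass.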
#30 | ||
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
So, to summarize:
1. Real-time spellchecking would be upgraded to handle 2 languages.
2. Spellcheck Lists will be upgraded to handle all languages? (I imagine similar to Calibre's, with a new Language column added?)
Quote:
I assume it'll be like Calibre's "Show only misspelled words" checkbox?
If so, again, I think it's a fantastic step forward.