Old 06-30-2020, 07:28 AM   #16
KevinH
Sigil Developer
Posts: 8,735
Karma: 5703586
Join Date: Nov 2009
Device: many
It should be rare on the same node of the DOM tree, since a new span will create a new node. So as long as spans (and other tags) use the proper lang attributes it should really not happen. Without the proper lang attributes added, Sigil will NOT be guessing language. That is a kettle of fish that Sigil will not be opening.
Old 06-30-2020, 07:58 AM   #17
BetterRed
null operator (he/him)
Posts: 21,707
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by mcdummy View Post

I've read many ebooks that are written in a first language, but contain terms or citations from other languages. If the creator of the ebook does not mark all instances of a word in the proper language, you end up with this situation.
Not so bad if the markuperer italicises the foreign phrases, at least then you can search and eyeball - provided they do it consistently, which of course they often don't.

Last edited by BetterRed; 06-30-2020 at 08:01 AM.
Old 06-30-2020, 08:23 AM   #18
DiapDealer
Grand Sorcerer
Posts: 28,548
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by KevinH View Post
Without the proper lang attributes added, Sigil will NOT be guessing language. That is a kettle of fish that Sigil will not be opening.
Absolutely agree. The expectation will be that the proper language attributes have already been correctly added. If this is not the case, users will need to add/correct them themselves (or take it up with an ebook's creator) if they want multi-language spellcheck to be the most useful in Sigil. GIGO still very much applies.
Old 07-03-2020, 03:15 AM   #19
mcdummy
Connoisseur
Posts: 73
Karma: 7130
Join Date: Apr 2015
Device: PRS-T3
Quote:
Originally Posted by Tex2002ans View Post
And which ereader are you using that applies proper hyphenation?
I'm using a PRS-T3, which does not apply hyphenation to all languages.

Quote:
Originally Posted by Tex2002ans View Post
Does it work at the per-word level too? Or only works on a per-book's-language level?
I'm trying to figure this out.

My PRS-T3 seems to work at least on an html-file level, i.e. it can change the language when a new html file is processed.

So far, I haven't figured out which language instructions it processes and which it ignores (e.g., xml:lang="..." vs. lang="..." or en-US vs. en_US).

For instance, the PRS-T3 seems to ignore en_US/en_GB/de_DE/fr_..., while en-US/en-GB/de-DE/fr-... seems to work.
Old 07-03-2020, 04:55 AM   #20
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by mcdummy View Post
I'm using a PRS-T3, which does not apply hyphenation to all languages.
Thanks for the info. I'm very interested in multi-language hyphenation.

Even many browsers don't handle hyphenation properly yet, which is why I was interested if you found a reader that could do it at that level.

Quote:
Originally Posted by mcdummy View Post
My PRS-T3 seems to work at least on an html-file level, i.e. it can change the language when a new html file is processed.
Probably a good assumption.

Quote:
Originally Posted by mcdummy View Post
So far, I haven't figured out which language instructions it processes and which it ignores (e.g., xml:lang="..." vs. lang="..." or en-US vs. en_US).
Using _ is invalid. Only - is allowed.

See "Tags for Identifying Languages" (BCP47) and w3c's page on "Language tags in HTML and XML".
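For illustration only, a rough Python check for the hyphen-versus-underscore issue. The regex is a simplified shape test assumed for this example, not the full BCP 47 grammar (no registry lookup), and the function names are hypothetical:

```python
import re

# Simplified shape test (an assumption, not full BCP 47): a 2-3 letter
# primary language subtag, then optional hyphen-separated subtags of
# 1-8 letters or digits.
_TAG_SHAPE = re.compile(r"[A-Za-z]{2,3}(?:-[A-Za-z0-9]{1,8})*")


def looks_like_lang_tag(tag: str) -> bool:
    """True if the whole tag has the basic hyphenated shape of a language tag."""
    return _TAG_SHAPE.fullmatch(tag) is not None


def fix_underscore_tag(tag: str) -> str:
    """Rewrite a locale-style tag such as 'en_US' as 'en-US'."""
    return tag.replace("_", "-")
```

With this, looks_like_lang_tag("en_US") is False, while fix_underscore_tag("en_US") gives "en-US", which passes.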

Also, in XHTML, xml:lang takes priority:

Quote:
The xml:lang attribute is not actually useful for handling the file as HTML, but takes over from the lang attribute any time you process or serve the document as XML. The lang attribute is allowed by the syntax of XHTML, and may also be recognized by browsers. When using other XML parsers, however (such as the lang() function in XSLT) you can't rely on the lang attribute being recognized.
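A minimal Python sketch of honouring that priority when a file is processed as XML, using the standard library's ElementTree, which reports xml:lang under the XML namespace URI. The helper name declared_lang is hypothetical:

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def declared_lang(elem: ET.Element):
    """Language declared directly on elem: xml:lang wins over lang."""
    value = elem.get(XML_LANG)
    if value is None:
        value = elem.get("lang")
    return value  # None if neither attribute is present
```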
Quote:
Originally Posted by mcdummy View Post
For instance, the PRS-T3 seems to ignore en_US/en_GB/de_DE/fr_..., while en-US/en-GB/de-DE/fr-... seems to work.
Also, it's best to stick with the minimal tag possible. Better to specify more broadly (en) than to over-specify wrongly (en-US on an en-GB document) or redundantly.

See w3c's "Choosing a Language Tag":

Quote:
Always bear in mind that the golden rule is to keep your language tag as short as possible. Only add further subtags to your language tag if they are needed to distinguish the language from something else in the context where your content is used.
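As an illustration of the golden rule only, here is a Python sketch that assumes the relevant "context" is simply the set of tags used in one book: a region subtag is kept only when two variants of the same language appear together (e.g. en-US and en-GB). The function names are hypothetical:

```python
from collections import defaultdict


def _primary(tag):
    """Primary language subtag, lowercased: 'en-GB' -> 'en'."""
    return tag.split("-")[0].lower()


def shorten_tags(tags):
    """Map each tag to its primary subtag, unless the same book also uses a
    different variant of that language; then keep the full tag."""
    variants = defaultdict(set)
    for tag in tags:
        variants[_primary(tag)].add(tag.lower())
    shortened = {}
    for tag in tags:
        keep_full = len(variants[_primary(tag)]) > 1
        shortened[tag] = tag if keep_full else _primary(tag)
    return shortened
```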
* * *

Also, if you desperately need to handle multiple dictionaries in a single document, and you use Microsoft Word... you could import your properly-lang-marked EPUB -> DOCX using Toxaris's EPUB Tools:

https://www.mobileread.com/forums/sh....php?p=2516490

I was pleasantly surprised to see it transferred over all lang information into DOCX, which made dealing with the red squigglies so much easier!

(I recently used it to mark all Spanish/French/German text, and even American/British, making the spellchecking passes so much faster.)
Old 07-23-2020, 04:07 PM   #21
varlog
actually it is /var/log
Posts: 341
Karma: 2994236
Join Date: Sep 2012
Location: usually Europa
Device: prs t1
Quote:
Originally Posted by KevinH View Post
...
The issue is showing on the fly spelling mistakes (with red squiggly underlines) as you are editing the code itself in CodeView. The existence of potentially incomplete or broken code and the need to walk the tree back up the parent path to determine the language on the fly makes things hard to determine what language to check the just completed word in (and do it quickly).
...
Actually, my solution to this problem did more or less work, AFAIR. Not perfect, but the only feasible one under the circumstances, I think.
On every spellcheck, Sigil created a language map of the text: every character position in the text had a language assigned to it. On new text input, when spellchecking was triggered, Sigil checked where the input fell relative to the current language map and "guessed" the input language from it.
Of course, if the input was "<span lang=..." ...
But in most cases it should work. The time penalty for this check was negligible compared to other multi-language necessities.
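varlog's actual code isn't shown in this thread. As a rough Python sketch of the idea only (class and method names are hypothetical), the per-character language map can be stored compactly as runs of (start offset, language) and queried with a binary search:

```python
from bisect import bisect_right


class LanguageMap:
    """Hypothetical run-based language map: each (start, lang) run covers
    character offsets from start up to the next run's start."""

    def __init__(self, runs):
        # runs must be sorted by offset; the first run should start at 0
        self.starts = [start for start, _ in runs]
        self.langs = [lang for _, lang in runs]

    def lang_at(self, pos):
        """Guess the language of text typed at character offset pos."""
        i = bisect_right(self.starts, pos) - 1
        return self.langs[max(i, 0)]
```

The lookup itself is cheap. The expensive part is keeping the runs up to date: every edit shifts later offsets, and a new lang attribute changes the runs themselves.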

Quote:
Originally Posted by Doitsu View Post
If I understand his development thread correctly, varlog didn't achieve his goal:
That needs clarification: I did create a somewhat working multi-language spellchecking version of Sigil, but that was not my actual goal.

Why hasn't this version found its way into Sigil?
Probably it did not meet Kevin's expectations, but he was nice enough never to elaborate on it.

Last edited by varlog; 07-23-2020 at 04:18 PM.
Old 07-23-2020, 04:48 PM   #22
KevinH
Sigil Developer
Posts: 8,735
Karma: 5703586
Join Date: Nov 2009
Device: many
Actually, I liked pieces of it but felt it was a bit too invasive. I felt a simpler solution might be possible. The main problem is that adding a language attribute to any tag in CV would force the recreation of the entire text-position language map and make it hard for the syntax highlighter to do its job on a line-by-line basis.

None of this is a problem for spellchecking static code via the spellcheck dialog when proper lang attributes have already been added.

I like BetterRed's idea of doing on-the-fly spellchecking only in the primary language and reserving multi-language spellchecking for the spellcheck dialog. That would be much less invasive, and the language of the text of any tag can be determined from the parsed DOM by checking its parents, up to the html tag, for the most recent lang attribute.
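A minimal Python sketch of that parent walk over a fully parsed page, using ElementTree. ElementTree elements have no parent pointers, hence the child-to-parent map. effective_lang is a hypothetical name, and xml:lang is checked before lang on each element:

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(root, target):
    """Walk from target up towards root; return the nearest lang value."""
    # Iterating an element yields its children, so this maps child -> parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is None:
            lang = node.get("lang")
        if lang is not None:
            return lang
        node = parent_of.get(node)  # None once we pass the root
    return None
```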

Last edited by KevinH; 07-23-2020 at 04:50 PM.
Old 07-23-2020, 05:07 PM   #23
varlog
actually it is /var/log
Posts: 341
Karma: 2994236
Join Date: Sep 2012
Location: usually Europa
Device: prs t1
Quote:
Originally Posted by KevinH View Post
...
I like BetterRed's idea of only doing on the fly spell checking in the primary language and reserve multi-language spellchecking to the spellcheck dialog as it would be much less invasive and the language of the text of any tag can be determined by the parsed dom by checking parents up to the html tag for the most recent lang attribute.
Just my personal usage opinion: I expect to be corrected on the fly.
And there are probably some dragons hidden there...
For one: it could be very annoying when the word changes its correctness as you work?

Last edited by varlog; 07-23-2020 at 05:14 PM.
Old 07-23-2020, 05:17 PM   #24
KevinH
Sigil Developer
Posts: 8,735
Karma: 5703586
Join Date: Nov 2009
Device: many
It would also do that when any lang attribute is added. Words previously deemed correct would now be incorrect. What is the difference?

Until the actual language of the code is finally determined, either by lang attributes on tags or by inheritance, the word cannot be properly spellchecked when multiple languages exist.

Cutting and pasting text would also make "correctness" change, which is why I do not like on-the-fly spellchecking in multiple languages.

Last edited by KevinH; 07-23-2020 at 05:20 PM.
Old 07-23-2020, 05:45 PM   #25
varlog
actually it is /var/log
Posts: 341
Karma: 2994236
Join Date: Sep 2012
Location: usually Europa
Device: prs t1
Quote:
What is the difference?
Not sure... As soon as the tag is closed, the world is perfect again? In contrast to: the world is perfect only when you explicitly use the spellchecker?

Actually, it is a question of user preference. The technical issues mean nothing to users.

My personal opinion is that it would be confusing and annoying for the average user. But I'm not the average user.
Old 07-23-2020, 08:58 PM   #26
KevinH
Sigil Developer
Posts: 8,735
Karma: 5703586
Join Date: Nov 2009
Device: many
I think multi-language spellchecking for more than a few words in a second language is pretty much a low-probability use case for most users. So when checking on the fly, supporting either checking in all languages or checking only the main language is just fine, since writing a book in Sigil is rare, while fixing up and specifying needed language attributes is more common. So having secondary-language spellchecking happen only in the spellcheck dialog seems like the right approach for an epub editor to me.

That is what I tried to explain and ask for back when you started this, if you check the messages.

Take care,

Kevin
Old 07-24-2020, 10:11 AM   #27
mcdummy
Connoisseur
Posts: 73
Karma: 7130
Join Date: Apr 2015
Device: PRS-T3
Quote:
Originally Posted by KevinH View Post
So supporting either checking in all languages or check only the main language when checking on the fly is just fine
It would be good if the on-the-fly checking worked at the file (html/xhtml) level, taking the local language definition in the file, or the global language attribute if the local one is not present.
Old 07-24-2020, 11:30 AM   #28
KevinH
Sigil Developer
Posts: 8,735
Karma: 5703586
Join Date: Nov 2009
Device: many
Yes, it could quickly grep the html and body tags for a lang attribute to determine the primary language to use for on-the-fly checking of a specific xhtml file.
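A rough Python sketch of such a quick scan. It is regex-based and hypothetical: it only looks at the opening html and body tags, assumes quoted attribute values, and falls back to a book-wide default as mcdummy suggests:

```python
import re

# The whole opening <html ...> or <body ...> tag, attributes included.
_OPEN_TAG = re.compile(r"<(html|body)\b[^>]*>", re.IGNORECASE)
# A whitespace-preceded lang or xml:lang attribute with a quoted value.
_LANG_ATTR = re.compile(r"""\s(?:xml:)?lang\s*=\s*["']([^"']+)["']""")


def primary_lang(xhtml: str, book_default: str) -> str:
    """Return the lang on <body>, else on <html>, else the book default."""
    found = {}
    for tag in _OPEN_TAG.finditer(xhtml):
        attr = _LANG_ATTR.search(tag.group(0))
        if attr:
            found.setdefault(tag.group(1).lower(), attr.group(1))
    return found.get("body") or found.get("html") or book_default
```

No DOM is built, which keeps it fast enough to rerun while typing, but it says nothing about lang attributes deeper in the file.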

But as I explained, it cannot easily do full on-the-fly red-squiggly checking in multiple languages at the same time while the file is being edited.

When editing a file, the CV code may not even be parseable at some point. And language attributes need not be on the same line, or even close to the words being checked. They may be inherited from a div with a lang attribute that may be many, many nodes above the current location (its great-great-grandparent). And a simple cut and paste may effectively change the language of a word.

So the red squiggly that is checking spelling on the fly would have to work only with fully parseable xhtml and would have to create and parse the DOM tree after every change.
That is a big overhead I am not willing to inflict on Sigil for so little real use or payback.
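To make the parseability point concrete with Python's standard XML parser (an illustration only, not Sigil's own parser): half-typed markup simply fails to parse, so there is no DOM tree to walk for inherited lang attributes. try_parse is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def try_parse(xhtml: str):
    """Return the parsed root element, or None if the markup is not
    (yet) well-formed, e.g. while a tag is still being typed."""
    try:
        return ET.fromstring(xhtml)
    except ET.ParseError:
        return None
```

When this returns None, the checker has to fall back on something that needs no tree, such as checking in the primary language only.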
I hope people do know they can add known foreign words to a custom dictionary, which works just fine when the number of second- or third-language words is small.

The only other approaches are checking in all used languages in that file, or just the primary used language when checking on the fly.

BetterRed likes the latter but I am a fan of the former.


That said, spellchecking via the spellcheck dialog works on static pages, so it can parse the complete page and properly extract all of the language attributes.
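For a static page, a Python sketch using ElementTree groups every word by its inherited language. Names are hypothetical, and \w+ is a crude stand-in for real word splitting. Note that a child's tail text belongs to the parent's language:

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
WORD = re.compile(r"\w+")


def words_by_lang(root, default_lang):
    """Group every word in a fully parsed page by its inherited language."""
    groups = defaultdict(list)

    def visit(elem, inherited):
        lang = elem.get(XML_LANG)
        if lang is None:
            lang = elem.get("lang", inherited)
        # elem.text is in elem's language ...
        groups[lang].extend(WORD.findall(elem.text or ""))
        for child in elem:
            visit(child, lang)
            # ... and so is the tail text that follows each child.
            groups[lang].extend(WORD.findall(child.tail or ""))

    visit(root, default_lang)
    return {lang: words for lang, words in groups.items() if words}
```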

So unless someone can figure out a way to avoid reparsing a file after every single change and somehow properly determine the language of a piece of text with just local information, I do not yet see a workable solution for on-the-fly checking that I would be willing to integrate into Sigil.

Last edited by KevinH; 07-24-2020 at 11:34 AM.
Old 07-31-2020, 10:47 AM   #29
KevinH
Sigil Developer
Posts: 8,735
Karma: 5703586
Join Date: Nov 2009
Device: many
Based on some offline discussions with Doitsu and others, the simplest path forward is to allow the user to set a single secondary language dictionary in user preferences.

The number of use cases needing more than 2 different languages for spellchecking is exceedingly small, so this looks like the right approach to me. The secondary language dictionary can be set to None if so desired.

Then, only in the SpellCheck Dialog, the user would have the option to filter the spellchecked words by the language code detected.

Words not marked for either of the two languages would be ignored unless told otherwise.

The on-the-fly spell checking (red squiggly) would also be controllable by preferences to either check in the primary language only, or in both the primary and secondary languages.
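Here is a rough Python model of the proposed behaviour, with plain sets standing in for the real dictionaries. Every name is hypothetical, and it assumes the lang codes in the markup match the dictionary codes one-to-one (e.g. "en", not "en_US"):

```python
from enum import Enum


class FlyMode(Enum):
    PRIMARY_ONLY = 1
    PRIMARY_AND_SECONDARY = 2


def dialog_misspelled(tagged_words, primary, secondary, dicts):
    """Spellcheck-dialog pass. tagged_words holds (word, lang) pairs whose
    lang came from the markup; words in any other language are ignored."""
    active = {primary} if secondary is None else {primary, secondary}
    return [(w, lang) for w, lang in tagged_words
            if lang in active and w.lower() not in dicts[lang]]


def on_the_fly_ok(word, mode, primary, secondary, dicts):
    """While typing no markup language is known: accept the word if the
    primary dictionary (or, in the wider mode, either one) knows it."""
    langs = [primary]
    if mode is FlyMode.PRIMARY_AND_SECONDARY and secondary is not None:
        langs.append(secondary)
    return any(word.lower() in dicts[lang] for lang in langs)
```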

This approach should be much simpler, have much lower overhead, and be much less intrusive.

Any feedback on this type of approach is welcome, as this is the next new feature for Sigil I will be working on.
Old 08-02-2020, 06:56 AM   #30
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by KevinH View Post
[...] the simplest path forward is to allow the user to set a single secondary language dictionary in user preferences.

[...]

The on-the-fly spell checking (red squiggly) would also be controllable by preferences to either check in the primary language only, or in both the primary and secondary languages.
This sounds like a fantastic step forward.

So, to summarize:

1. Real-time spellchecking would be upgraded to handle 2 languages.

2. Spellcheck Lists will be upgraded to handle all languages? (I imagine similar to Calibre's, with a new Language column added?)

Quote:
Originally Posted by KevinH View Post
Then only in the SpellCheck Dialog, the user would have the options to filter the spellchecked words by language code detected.

Words not marked for either of the two languages would be ignored unless told otherwise.
Can you clarify "ignored unless told otherwise"?

I assume it'll be like Calibre's "Show only misspelled words" checkbox?
  • Off
    • Displays Word/Count/Language/Misspelled columns.
    • Shows all words within book (similar to current Sigil).
  • On
    • Removes "Misspelled" column.
    • Only wrongly spelled + foreign words are shown.
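Tex2002ans's reading of the checkbox can be modelled in Python purely as an illustration. This is neither Calibre's nor Sigil's code; word_list and the is_correct callback are hypothetical. Words in a language with no dictionary count as misspelled, which is how "foreign" words end up in the filtered list:

```python
from collections import Counter


def word_list(tagged_words, is_correct, only_misspelled=False):
    """Rows of (word, count, language, misspelled), sorted by word.
    With only_misspelled, keep just the flagged rows and drop that column."""
    counts = Counter(tagged_words)  # key: (word, lang) pair
    if only_misspelled:
        return sorted((w, n, lang) for (w, lang), n in counts.items()
                      if not is_correct(w, lang))
    return sorted((w, n, lang, not is_correct(w, lang))
                  for (w, lang), n in counts.items())
```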

If so, again, I think fantastic step forward.
All times are GMT -4.