#1
Enthusiast
Posts: 36
Karma: 10
Join Date: Jun 2011
Location: Lima, Peru
Device: Kindle 10Gen / Kobo Aura HD / Nook STR
Non-English issues
Hello.
I use Sigil on Windows 11, both in Spanish. Some time ago I noticed two behaviors that I hope can be fixed:

1. Text boundaries are inconsistent. For example, when double-clicking or using Ctrl+cursor in cases like:
"abc" (abc) -abc-
it correctly selects from a to c (not the symbols). But it does not recognize symbols from other languages as boundaries:
- With opening exclamation and question marks (necessary in Spanish), it also selects the first of the two marks: ¡abc! ¿abc?
- With European/Latin quotation mark variants and other symbols, it selects the word plus both symbols: «abc» “abc” ‘abc’ …abc… —abc— •abc•
I am leaving a barebones sample epub (examples taken from the “Quotation marks” page on Wikipedia), in case it is useful. (I have deliberately removed language tags from the opf/xhtml.)

2. In the Preview window, clicking Inspect Page always displays this message:
Spoiler:
But none of the three choices persists. The next session displays the 'Zod, Ursa & Non' buttons again.
Could Sigil save the user's choice? Or, at least, force the message to be disabled?

#2
Sigil Developer
Posts: 8,964
Karma: 6361444
Join Date: Nov 2009
Device: many
The Ctrl-click in CodeView is not code controlled by Sigil. That code is built into Qt, specifically the QPlainTextEdit widget. What it includes or excludes should be controlled by your locale and by what Unicode determines to be punctuation. It is not under our direct control.
I will look to see if there is any workaround we could try.
The Chrome inspector code is not ours to control either; it is built into Qt's QtWebEngine. We already allow QtWebEngine to save to local storage, as specified in our QWebEngineProfile. As long as your Sigil Preferences folder is located where you have full write permission, all of that should work, so I have no idea why it is not saving things there. Perhaps downloading and installing the latest Chrome browser, loading a page, and firing up its developer-mode inspector may help.
So both of these are really Qt bugs or changes. Perhaps you should file an official bug report with Qt so that these issues are addressed upstream?

#3
Grand Sorcerer
Posts: 28,774
Karma: 206758686
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I can confirm that the Inspector language setting will not persist between Sigil sessions on Windows. The light/dark interface setting does seem to persist, though. But I don't know how much of that would be because of locales.
Last edited by DiapDealer; 09-09-2025 at 12:26 PM.

#4
Sigil Developer
Posts: 8,964
Karma: 6361444
Join Date: Nov 2009
Device: many
Okay, I finally traced the path taken when a double-click happens inside a word in CodeView. That routine calls atWordSeparator in qtextengine.cpp, which is simply the following:
Code:
    bool QTextEngine::atWordSeparator(int position) const
    {
        const QChar c = layoutData->string.at(position);
        switch (c.unicode()) {
        case '.':
        case ',':
        case '?':
        case '!':
        case '@':
        case '#':
        case '$':
        case ':':
        case ';':
        case '-':
        case '<':
        case '>':
        case '[':
        case ']':
        case '(':
        case ')':
        case '{':
        case '}':
        case '=':
        case '/':
        case '+':
        case '%':
        case '&':
        case '^':
        case '*':
        case '\'':
        case '"':
        case '`':
        case '~':
        case '|':
        case '\\':
            return true;
        default:
            break;
        }
        return false;
    }
So you really should file a bug with Qt and let them know that they need to fix their QTextEngine class definition of atWordSeparator to use Unicode character classes. At least earlier in that routine they use QCharAttributes to determine whether a character is whitespace. They have the Unicode tools to do that: see qunicodetools.cpp
Code:
    struct QCharAttributes {
        uchar graphemeBoundary : 1;
        uchar wordBreak : 1;
        uchar sentenceBoundary : 1;
        uchar lineBreak : 1;
        uchar whiteSpace : 1;
        uchar wordStart : 1;
        uchar wordEnd : 1;
        uchar mandatoryBreak : 1;
    };
Code:
    case QTextCursor::EndOfWord: {
        QTextEngine *engine = layout->engine();
        const QCharAttributes *attributes = engine->attributes();
        const int len = blockIt.length() - 1;
        if (relativePos >= len)
            return false;
        if (engine->atWordSeparator(relativePos)) {
            ++relativePos;
            while (relativePos < len && engine->atWordSeparator(relativePos))
                ++relativePos;
        } else {
            while (relativePos < len && !attributes[relativePos].whiteSpace && !engine->atWordSeparator(relativePos))
                ++relativePos;
        }
        newPosition = blockIt.position() + relativePos;
        break;
    }
    ...
    case QTextCursor::StartOfWord: {
        if (relativePos == 0)
            break;

        // skip if already at word start
        QTextEngine *engine = layout->engine();
        const QCharAttributes *attributes = engine->attributes();
        if ((relativePos == blockIt.length() - 1)
            && (attributes[relativePos - 1].whiteSpace || engine->atWordSeparator(relativePos - 1)))
            return false;

        if (relativePos < blockIt.length()-1)
            ++position;

        Q_FALLTHROUGH();
    }
Last edited by KevinH; 09-09-2025 at 01:11 PM.
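To make the effect of that hard-coded list concrete, here is a minimal standalone sketch in plain C++ (standard library only, not the actual Qt code path). It grows a selection outward from a character position until it hits whitespace or one of the ASCII separators from the list quoted above. The helper names `isSimpleSpace`, `atWordSeparatorAscii` and `selectWordAt` are invented for this illustration, and the whitespace test is simplified to space, tab and newline instead of Qt's QCharAttributes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified whitespace test (assumption: Qt consults QCharAttributes instead).
bool isSimpleSpace(char32_t c) {
    return c == U' ' || c == U'\t' || c == U'\n';
}

// The same ASCII-only separator set as the switch statement quoted above.
bool atWordSeparatorAscii(char32_t c) {
    static const std::u32string separators =
        U".,?!@#$:;-<>[](){}=/+%&^*'\"`~|\\";
    return separators.find(c) != std::u32string::npos;
}

// Hypothetical helper: expand left and right from pos until whitespace or an
// ASCII separator is found, roughly mimicking a double-click word selection.
std::u32string selectWordAt(const std::u32string &text, std::size_t pos) {
    assert(pos < text.size());
    auto stops = [](char32_t c) {
        return isSimpleSpace(c) || atWordSeparatorAscii(c);
    };
    if (stops(text[pos]))
        return {};
    std::size_t start = pos;
    while (start > 0 && !stops(text[start - 1]))
        --start;
    std::size_t end = pos + 1;
    while (end < text.size() && !stops(text[end]))
        ++end;
    return text.substr(start, end - start);
}
```

With the text `(abc) ¿abc? «abc»`, selecting from the middle letter of each word gives `abc`, then `¿abc`, then `«abc»`: the ASCII parentheses and `?` stop the expansion, while `¿`, `«` and `»` do not, which matches the behavior reported in post #1.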

#5
Sigil Developer
Posts: 8,964
Karma: 6361444
Join Date: Nov 2009
Device: many
Given the above, there is not a lot we can do to work around this issue. Perhaps we could strip a list of additional characters, built from QCharAttributes, off each end, but that would really need a full Unicode implementation of some sort.
Perhaps an env var that a user can set to indicate which characters they do not want included when auto-selecting a word, leaving it up to the user to set it properly. This is something to consider for a future release.
Last edited by KevinH; 09-13-2025 at 01:13 PM.
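The "strip a list of extra characters from each end" idea could look roughly like the sketch below. This is only an illustration in plain C++, not Sigil's implementation: the function name `stripEnds` and the default list `kExtraTrimChars` are hypothetical, the list is simply assembled from the examples in this thread, and a user-supplied list (for instance from an env var) would be passed as the second argument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical default list of extra characters to strip from both ends.
// Built from the examples in this thread; a real list would be user-defined.
const std::u32string kExtraTrimChars =
    U"\u00A1\u00BF"          // inverted exclamation and question marks
    U"\u00AB\u00BB"          // double angle quotes (guillemets)
    U"\u2039\u203A"          // single angle quotes
    U"\u2018\u2019\u201A"    // single curly quotes and low-9 quote
    U"\u201C\u201D\u201E"    // double curly quotes and low-9 quote
    U"\u2026\u2014\u2022";   // ellipsis, em dash, bullet

// Remove any leading and trailing characters found in trimChars.
// Characters inside the word (e.g. a curly apostrophe) are left alone.
std::u32string stripEnds(const std::u32string &word,
                         const std::u32string &trimChars = kExtraTrimChars) {
    const std::size_t begin = word.find_first_not_of(trimChars);
    if (begin == std::u32string::npos)
        return {};  // nothing but trim characters
    const std::size_t end = word.find_last_not_of(trimChars) + 1;
    assert(begin < end);
    return word.substr(begin, end - begin);
}
```

Applied to the selections that Qt returns today, `«abc»`, `…abc…` and `¿abc` would all be reduced to `abc`. The obvious weakness, as noted above, is that any fixed list is incomplete; only real Unicode character properties cover every script.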

#6
Sigil Developer
Posts: 8,964
Karma: 6361444
Join Date: Nov 2009
Device: many
Quote:
That is where we told the Inspector's QWebEngineView to put its local storage.

#7
Sigil Developer
Posts: 8,964
Karma: 6361444
Join Date: Nov 2009
Device: many
Okay, I tried rewriting the WebProfileMgr to explicitly create a QWebEngineProfile for our Inspector, but the local-devtools folder never appears to get used at all. If I change settings (the gear) in the Inspector, nothing is ever written to local-devtools or any place else I could find.
So again, this is a bug in QtWebEngine. As far as I can tell, we cannot affect this without moving to Qt 6.9.2 and using their QWebEngineProfileBuilder class to properly set the cache path and return to the disk storage mode. But that is something for a future release, as all of that requires that we move to Qt 6.9.2 first and then heavily conditionalize the code to work back to Qt 6.4.

#8
Addict
Posts: 234
Karma: 1000244
Join Date: Oct 2021
Location: Germany
Device: Tolino Vision 5, Tolino Tab 8", Pocketbook Era (16GB)
It’s a shame that so many libraries and tools still hard-code just a few (too few!) values instead of relying on the well-defined Unicode properties. Let’s hope this gets better over time…

#9
Sigil Developer
Posts: 8,964
Karma: 6361444
Join Date: Nov 2009
Device: many
Okay, I have pushed a fix to master that returns to using disk caches and persistent storage locations for the Inspector and Preview, using the new QWebEngineProfileBuilder.
Now at least my Inspector settings are saved between launches, which should fix part of the issues you were seeing. This fix will appear in the next release of Sigil.
For a future release, I will try to implement a strip-character routine for double-clicking in CodeView that is a bit better at removing all punctuation and quotation marks from the ends.
Thank you for your bug report.
Last edited by KevinH; 09-14-2025 at 04:47 PM.

#10
Sigil Developer
Posts: 8,964
Karma: 6361444
Join Date: Nov 2009
Device: many
Quote:
See https://doc.qt.io/qt-6/qtextboundaryfinder.html
The world is eating up CPU cycles to support languages that are just too complex for their own good! (As an aside, can someone please explain why in German the word for a woman's skirt is masculine? Babble is driving me crazy with "der Rock".)
The Unicode spec, in my opinion, is a classic example of what happens when you get a worldwide committee to design a spec! I wish the computer world would standardize on one Unicode support library (ICU?), make it available in all computer languages and string manipulation systems, and build it into every OS. Until then we are stuck with partial implementations all over the place.
Last edited by KevinH; 09-14-2025 at 05:32 PM.

#11
null operator (he/him)
Posts: 21,895
Karma: 30277270
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
There are some guidelines here: https://germanwithlaura.com/noun-gender/
German, Austrian and Swiss rivers are feminine whereas the rest are masculine. Not exactly arbitrary, just plain bloody daft.
Last edited by BetterRed; 09-14-2025 at 09:26 PM.

#12
Addict
Posts: 284
Karma: 516
Join Date: Nov 2015
Location: Europe EEC
Device: Kindle Fire HD6 & HD8
Quote:
English is getting its share of gender problems too. There is a growing resistance to using singular possessive pronouns. This results in many sentences which clearly start 'singular' and then finish plural because the writer is unwilling to say 'his' or 'her' for fear of betraying the gender of the person referred to earlier. 'Their' has now become both a singular and plural possessive form because the only other alternative in English is to write 'his or her' (of course, the writer could always declassify the person by using 'its'). The same trend applies to modern usage of all pronouns in English.
Possessives, in a language where nouns are gendered like French, betray the gender of the noun and not of the owner of the object. Languages (and people) are daft in many ways.

#13
Sigil Developer
Posts: 8,964
Karma: 6361444
Join Date: Nov 2009
Device: many
Okay, I have been looking at how I can work around the fact that double-clicking to select a word in CodeView does not remove any quotes except the most basic single and double quotes.
The double-click selection in CodeView also includes smart single and smart double quotes as part of the word, as well as: ‹ › « » ‚ ‘ „ “. So this is not just an internationalization issue; it is an issue with all forms of smart quotes.
I am hoping that I can use \w+ in regular expressions with Unicode support, and that it will let me do a better job of isolating a single word upon double-clicking. If so, I may be able to work around this one as well.
Last edited by KevinH; 09-17-2025 at 01:47 PM.

#14
Sigil Developer
Posts: 8,964
Karma: 6361444
Join Date: Nov 2009
Device: many
@Jugaor
Update ... My workaround seems to have worked nicely. Basically, we let CodeView select a word by double-clicking, and then we use a QRegularExpression, (\\w+), with UseUnicodeProperties set, to extract the true Unicode word out of the selected text. I tested it using the following in CV.
Code:
    <p> "word" 'word' ‘word’ “word” ‹word› «word» ‚word‘ „word“ </p>
I have pushed this change to master. With this change, and the change to make the Inspector remember its settings, both of the issues you reported should now hopefully be fixed in our current master and will be part of the next release.
Thank you for your bug reports!

#15
Enthusiast
Posts: 36
Karma: 10
Join Date: Jun 2011
Location: Lima, Peru
Device: Kindle 10Gen / Kobo Aura HD / Nook STR
KevinH, thanks so much for the follow-up!
1. Excellent! Your new approach is much more robust than the native or list-based one: it should yield consistent results. A non-binding question: will this also apply to Ctrl+cursor movement/selection? (Not a vital issue.)
2. Excellent, too! BTW, I forgot to mention something that may or may not be useful when debugging Qt 6.9. Looking for a solution, I tried adding the QTWEBENGINE_CHROMIUM_FLAGS env var with the value --lang=es (and regional variants, such as --lang=es-PE):
- Set from the system itself (in my case, Windows 11), it has no effect.
- Set in the new env-vars.txt, it does seem to cause a change: the Inspector shows a blank window (?)
I'm letting you know in case it helps. Thanks again!