Old 09-09-2025, 04:16 AM   #1
jugaor
Enthusiast
Posts: 36
Karma: 10
Join Date: Jun 2011
Location: Lima, Peru
Device: Kindle 10Gen / Kobo Aura HD / Nook STR
Non-English issues

Hello.
I use Sigil on Windows 11, both in Spanish.
Some time ago I noticed two behaviors that I hope can be fixed:

1. Text boundaries are inconsistent: for example, when double-clicking or using Ctrl+cursor, in cases like:
"abc" (abc) -abc-
it correctly selects from a to c (not the symbols).

But it does not recognize symbols from other languages as boundaries:
—With opening exclamation and question marks (necessary in Spanish) it also selects the first two.
¡abc! ¿abc?
—With European/Latin quotation mark variants
«abc» “abc” ‘abc’
and other symbols, it selects the word + both of them.
…abc… —abc— •abc•

I am leaving a barebones sample epub (examples taken from the “Quotation marks” page on Wikipedia), in case it is useful.
(I have deliberately removed language tags from opf/xhtml.)


2. In the Preview window, clicking Inspect Page always displays this message:
Spoiler:

But none of the three choices is remembered. The next session displays the 'Zod, Ursa & Non' buttons again.
Could Sigil save the user's choice? Or, at least, provide a way to disable the message?

Attached Thumbnails: DevTools.jpg
Attached Files: quotation marks and text boundaries [comillas y delimitadores de texto].epub
Old 09-09-2025, 08:25 AM   #2
KevinH
Sigil Developer
Posts: 8,964
Karma: 6361444
Join Date: Nov 2009
Device: many
The ctrl-click in CodeView is not code controlled by Sigil. That code is built into Qt, specifically the QPlainTextEdit widget. What it includes or excludes should be controlled by your locale and by what Unicode determines to be punctuation. It is not under our direct control.

I will look to see if there is any workaround we could try.

And the Chrome inspector code is not ours to control either; it is built into Qt's QtWebEngine. We do already allow QtWebEngine to save to local storage as specified in our QWebEngineProfile. As long as your Sigil Preferences folder is located where you have full write permission, all of that should work. So I have no idea why it is not saving things there. Perhaps downloading and installing the latest Chrome browser, loading a page, and firing up its developer mode inspector may help.

So both of these are really Qt bugs or changes. Perhaps you should file an official bug report with Qt so that these issues are addressed upstream?
Old 09-09-2025, 12:21 PM   #3
DiapDealer
Grand Sorcerer
Posts: 28,774
Karma: 206758686
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I can confirm that the Inspector language setting will not persist between Sigil sessions on Windows. The light/dark interface seems to though. But I don't know how much of that would be because of locales.

Last edited by DiapDealer; 09-09-2025 at 12:26 PM.
Old 09-09-2025, 12:52 PM   #4
KevinH
Sigil Developer
Okay, I finally traced the code path taken when a double-click happens inside a word in CodeView. That routine calls atWordSeparator in qtextengine.cpp, which is simply the following:

Code:
bool QTextEngine::atWordSeparator(int position) const
{
    const QChar c = layoutData->string.at(position);
    switch (c.unicode()) {
    case '.':
    case ',':
    case '?':
    case '!':
    case '@':
    case '#':
    case '$':
    case ':':
    case ';':
    case '-':
    case '<':
    case '>':
    case '[':
    case ']':
    case '(':
    case ')':
    case '{':
    case '}':
    case '=':
    case '/':
    case '+':
    case '%':
    case '&':
    case '^':
    case '*':
    case '\'':
    case '"':
    case '`':
    case '~':
    case '|':
    case '\\':
        return true;
    default:
        break;
    }
    return false;
}
So no locale info is used, no unicode character classes are used, nothing for international support. This is simply horrible by Qt. They should be ashamed of that piece of code.

So you really should file a bug with Qt and let them know that they need to fix their QTextEngine class definition of atWordSeparator to use Unicode character classes.
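
As a rough illustration of the gap (this is not Qt's code, and not a real Unicode character-class lookup), the ASCII list above could be extended with the characters reported in this thread. The function name `isWordSeparatorExtended` and the hand-picked list are assumptions made for this sketch only; a proper fix would query Unicode general categories instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of Qt: returns true if the code point
// should be treated as ending a word.
bool isWordSeparatorExtended(char32_t cp)
{
    // The ASCII characters from the atWordSeparator switch quoted above.
    static const std::u32string ascii =
        U".,?!@#$:;-<>[](){}=/+%&^*'\"`~|\\";
    // Hand-picked additions taken from this thread (an assumption, not a
    // complete list): inverted exclamation and question marks, guillemets,
    // curly and low quotes, single guillemets, em dash, ellipsis, bullet.
    static const std::u32string extra =
        U"\u00A1\u00BF\u00AB\u00BB"
        U"\u2018\u2019\u201A\u201C\u201D\u201E"
        U"\u2039\u203A\u2014\u2026\u2022";
    return ascii.find(cp) != std::u32string::npos
        || extra.find(cp) != std::u32string::npos;
}
```

Even this small list shows why the problem is harder than it looks: treating ’ (U+2019) as a separator would also split contractions typed with a curly apostrophe.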

At least earlier in that routine they use QCharAttributes to determine if whitespace or not.

They have the unicode tools to do that: see qunicodetools.cpp

Code:
struct QCharAttributes
{
    uchar graphemeBoundary : 1;
    uchar wordBreak        : 1;
    uchar sentenceBoundary : 1;
    uchar lineBreak        : 1;
    uchar whiteSpace       : 1;
    uchar wordStart        : 1;
    uchar wordEnd          : 1;
    uchar mandatoryBreak   : 1;
};
And the code that triggers all of this is in the QTextCursor select() function, which uses these two routine snippets:

Code:
    case QTextCursor::EndOfWord: {
        QTextEngine *engine = layout->engine();
        const QCharAttributes *attributes = engine->attributes();
        const int len = blockIt.length() - 1;
        if (relativePos >= len)
            return false;
        if (engine->atWordSeparator(relativePos)) {
            ++relativePos;
            while (relativePos < len && engine->atWordSeparator(relativePos))
                ++relativePos;
        } else {
            while (relativePos < len && !attributes[relativePos].whiteSpace && !engine->atWordSeparator(relativePos))
                ++relativePos;
        }
        newPosition = blockIt.position() + relativePos;
        break;
    }

...

    case QTextCursor::StartOfWord: {
        if (relativePos == 0)
            break;

        // skip if already at word start
        QTextEngine *engine = layout->engine();
        const QCharAttributes *attributes = engine->attributes();
        if ((relativePos == blockIt.length() - 1)
            && (attributes[relativePos - 1].whiteSpace || engine->atWordSeparator(relativePos - 1)))
            return false;

        if (relativePos < blockIt.length()-1)
            ++position;

        Q_FALLTHROUGH();
    }

Last edited by KevinH; 09-09-2025 at 01:11 PM.
Old 09-09-2025, 01:14 PM   #5
KevinH
Sigil Developer
Given the above, there are not a lot of things we can do to work around this issue. Perhaps we could strip a list of additional characters, built from QCharAttributes, from each end of the selection, but that would really need a full Unicode implementation of some sort.

Perhaps we could add an env var that a user can set to list the characters they do not want included when auto-selecting a word, and leave it up to the user to set it properly.
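
A minimal sketch of that second idea, assuming the user-supplied character list has already been read (for example from an env var) and decoded to UTF-32. The names `SelectionRange` and `trimSelection` are hypothetical and this is not Sigil's implementation; it only shows trimming unwanted characters from both ends of an auto-selected word.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: the part of the selection to keep.
struct SelectionRange {
    std::size_t start;   // offset of the first kept character
    std::size_t length;  // number of kept characters (0 if nothing is left)
};

// Trim every leading and trailing character that appears in `unwanted`.
// Characters in the middle of the selection are left alone.
SelectionRange trimSelection(const std::u32string &selected,
                             const std::u32string &unwanted)
{
    const std::size_t first = selected.find_first_not_of(unwanted);
    if (first == std::u32string::npos)
        return {0, 0};  // empty, or made up only of unwanted characters
    const std::size_t last = selected.find_last_not_of(unwanted);
    return {first, last - first + 1};
}
```

With `unwanted` set to `¡¿«»…`, a selection of `«abc»` gives start 1 and length 3, so the selection could be narrowed to `abc`. The cost of this design is that the user has to know which characters to list.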

This is something to consider for a future release.

Last edited by KevinH; 09-13-2025 at 01:13 PM.
Old 09-09-2025, 01:16 PM   #6
KevinH
Sigil Developer
Quote:
Originally Posted by DiapDealer
I can confirm that the Inspector language setting will not persist between Sigil sessions on Windows. The light/dark interface seems to though. But I don't know how much of that would be because of locales.
Did it create anything in your Sigil Preferences folder in a "local-devtools" folder?

That is where we told the Inspector's QWebEngineView to put its local storage.
Old 09-09-2025, 02:33 PM   #7
KevinH
Sigil Developer
Okay, I tried rewriting the WebProfileMgr to explicitly create a QWebEngineProfile for our Inspector, but the local-devtools folder never appears to get used at all. If I change settings (the gear) in Inspector nothing is ever written to local-devtools or any place else I could find.

So again this is a bug in QtWebEngine. But as far as I can tell, we cannot affect this without moving to Qt 6.9.2 and using their QWebEngineProfileBuilder class to properly set the cache path and return to the disk storage mode.

But that is something for a future release, as all of that requires that we move to Qt 6.9.2 first and then heavily conditionalize the code to work back to Qt 6.4.
Old 09-09-2025, 06:04 PM   #8
Moonbase59
Addict
Posts: 234
Karma: 1000244
Join Date: Oct 2021
Location: Germany
Device: Tolino Vision 5, Tolino Tab 8", Pocketbook Era (16GB)
It’s a shame that so many libraries and tools still hard-code just a few (too few!) values instead of relying on the well-defined Unicode properties. Let’s hope this gets better over time…
Old 09-14-2025, 03:07 PM   #9
KevinH
Sigil Developer
Okay, I have pushed a fix to master to return to using disk caches and persistent storage locations for the Inspector and Preview that use the new QWebEngineProfileBuilder.

Now at least my settings for Inspector are saved between launches which should fix part of the issues you were seeing.

This fix will appear in the next release of Sigil.

I will try to implement a character-stripping routine for double-clicking in CodeView that is a bit better at removing punctuation and quotation marks from the ends of the selection, for a future release.

Thank you for your bug report.

Last edited by KevinH; 09-14-2025 at 04:47 PM.
Old 09-14-2025, 05:22 PM   #10
KevinH
Sigil Developer
Quote:
Originally Posted by Moonbase59
It’s a shame that so many libraries and tools still hard-code just a few (too few!) values instead of relying on the well-defined Unicode properties. Let’s hope this gets better over time…
In general I agree. But unfortunately the Unicode spec for determining a word boundary is a huge two-dimensional table that still has lots of special cases. Thus a whole Unicode library is needed for something that is quite straightforward and quite fast for most languages (is it a space, a quote, or punctuation?).

See https://doc.qt.io/qt-6/qtextboundaryfinder.html

The world is eating up cpu cycles to support languages that are just too complex for their own good!

(As an aside, can someone please explain why in German the word for a woman's skirt is masculine? Babble is driving me crazy with "der Rock".)

The unicode spec in my opinion is a classic example of what happens when you get a worldwide committee to design a spec!

I wish the computer world would standardize on one unicode support library (icu?) and make it available in all computer languages and string manipulation systems and build it into every OS. Until then we are stuck with half/partial implementations all over the place.

Last edited by KevinH; 09-14-2025 at 05:32 PM.
Old 09-14-2025, 08:59 PM   #11
BetterRed
null operator (he/him)
Posts: 21,895
Karma: 30277270
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by KevinH
. . .

(As an aside, can someone please explain why in German the word for a woman's skirt is masculine? Babble is driving me crazy with "der Rock".)
Noun genders in German are more or less arbitrary. In the case of items of clothing it has nothing to do with the sex of the persons who would wear them. Büstenhalter is masculine. Different nouns for the same thing can have different genders.

There are some guidelines here ==>> https://germanwithlaura.com/noun-gender/

German, Austrian and Swiss rivers are feminine whereas the rest are masculine. Not exactly arbitrary, just plain bloody daft

Last edited by BetterRed; 09-14-2025 at 09:26 PM.
Old 09-15-2025, 03:27 AM   #12
philja
Addict
Posts: 284
Karma: 516
Join Date: Nov 2015
Location: Europe EEC
Device: Kindle Fire HD6 & HD8
Quote:
Originally Posted by BetterRed
Noun genders in German are more or less arbitrary. In the case of items of clothing it has nothing to do with the sex of the persons who would wear them. Büstenhalter is masculine. Different nouns for the same thing can have different genders.

There are some guidelines here ==>> https://germanwithlaura.com/noun-gender/

German, Austrian and Swiss rivers are feminine whereas the rest are masculine. Not exactly arbitrary, just plain bloody daft
And would you believe that all French vaginas are masculine?

English is getting its share of gender problems too. There is a growing resistance to using singular possessive pronouns. This results in many sentences which clearly start 'singular' and then finish plural because the writer is unwilling to say 'his' or 'her' for fear of betraying the gender of the person referred to earlier.

'Their' has now become both a singular and plural possessive form because the only other alternative in English is to write 'his or her' (of course, the writer could always declassify the person by using 'its'). The same trend applies to modern usage of all pronouns in English.

Possessives, in a language where nouns are gendered like French, betray the gender of the noun and not of the owner of the object.

Languages (and people) are daft in many ways.
Old 09-17-2025, 01:42 PM   #13
KevinH
Sigil Developer
Okay, I have been looking at how I can work around the double-click to select a word in CodeView not removing any quotes except for the most basic single and double quotes.

The double-click to select a word in CodeView also includes smart single and smart double quotes as part of the word, as well as: ‹ › « » ‚ ‘ „ “.

So this is not just an internationalization issue, it is an issue with all forms of smart quotes.

I am hoping that I can use \w+ in regular expressions with Unicode support, and that it will let me make a better attempt at isolating a single word upon double-clicking.

If so, I may be able to work around this one as well.

Last edited by KevinH; 09-17-2025 at 01:47 PM.
Old 09-17-2025, 02:41 PM   #14
KevinH
Sigil Developer
@Jugaor

Update ...

My workaround seems to have worked nicely. Basically we let CodeView select a word by double-clicking, and then we use a QRegularExpression with the pattern "(\\w+)" and the UseUnicodePropertiesOption pattern option set to extract the true Unicode word out of the selected text.

I tested it using the following in CV.
Code:
  <p>  "word" 'word' ‘word’ “word” ‹word› «word» ‚word‘ „word“ </p>
And by double-clicking in the word on each one, only the word itself was selected and not the various forms of quotation marks that are not part of a word.

I have pushed this change to master. This fix will appear in the next release.

With this change, and the change to make the Inspector remember its settings, both of the issues you reported should now hopefully be fixed in our current master and will be part of the next release.

Thank you for your bug reports!
Old Yesterday, 12:24 AM   #15
jugaor
Enthusiast
KevinH, thanks so much for the follow-up!

1. Excellent!
Your new approach is much more robust than the native or list-based one: it should yield consistent results.
A non-binding question: will this also apply to Ctrl+cursor movement/selection? (not a vital issue.)

2. Excellent, too!
BTW, I forgot to mention something that may or may not be useful when debugging Qt 6.9.
While looking for a solution, I tried adding the QTWEBENGINE_CHROMIUM_FLAGS env var with the value --lang=es (and regional variants, such as --lang=es-PE):
— from the system itself (in my case, Windows 11) it has no effect.
— now, in the new env-vars.txt, it does seem to cause a change: the Inspector shows a blank window (?)
I'm letting you know in case it helps.

Thanks again!