Old 12-12-2025, 09:01 AM   #4
KevinH
Sigil Developer
Is the Unicode “Middle Dot” (U+00B7) matched by the word regular expression "\\w+" when the UseUnicodeProperties option is set? That is how CodeView finds its word boundaries, since the internal Qt functions fail to exclude all forms of quotes and do not follow Unicode standards.

If it is not considered a Unicode "word" character, it will be excluded, because we now use QRegularExpression with the pattern "\\w+" and UseUnicodeProperties set to extract the true Unicode word from the selected string of characters.
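To illustrate the effect, here is a rough sketch in Python rather than Sigil's actual Qt code. It assumes Python's Unicode-aware \w behaves closely enough to QRegularExpression with UseUnicodeProperties for this one character; the two engines are not identical, and extract_words is just a name made up for the example.

```python
import re

# Assumption: Python's default (Unicode-aware) \w for str patterns stands in
# for QRegularExpression's \w with UseUnicodeProperties. Different engine,
# similar idea: letters, digits and underscore count as word characters.
WORD_RE = re.compile(r"\w+")


def extract_words(text):
    """Return every run of word characters found in text (hypothetical helper)."""
    return WORD_RE.findall(text)


# Catalan "col·lecció", written with escapes so the middle dot is unambiguous.
sample = "col\u00b7lecci\u00f3"
print(extract_words(sample))
```

With this sample the middle dot is not matched by \w, so the selection comes back as two separate words, "col" and "lecció", instead of one.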

So, to test it: in CodeView, type a word with that middle dot in it. Then open Sigil's Find and Replace, set it to regex search (make sure the Unicode property flag is set), enter that search expression, and use Find to see whether that Unicode character is treated as a word character or not.

Update:

According to this site: https://codepoints.net/U+00B7?lang=en
It is considered "inter-word" punctuation and its group is "Other Punctuation". By this Unicode definition it is not considered a character *inside* a word (i.e., inter, not intra).
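The same classification can be checked locally. This is only a sketch using Python's standard unicodedata module, not anything Sigil itself does, and describe is a hypothetical helper name. "Po" is the two-letter code for the "Other Punctuation" general category.

```python
import unicodedata


def describe(ch):
    """Return (character name, general category) for one character (hypothetical helper)."""
    return unicodedata.name(ch), unicodedata.category(ch)


# U+00B7 is reported as MIDDLE DOT with general category Po.
print(describe("\u00b7"))
```

A letter such as "a" reports category "Ll" (lowercase letter) instead, which is the kind of category a Unicode-aware \w accepts.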

You may be using it in some other way, but according to the official Unicode properties it is not considered part of a word.

Last edited by KevinH; 12-12-2025 at 10:53 AM.