Thread: Regex examples
Old 07-11-2019, 02:34 PM   #584
DiapDealer
Grand Sorcerer
Nine times out of ten, @PenguinCEO's answer will work as well as any, provided only standard English ASCII letters are being used. But if there are any accented uppercase characters that would fit the bill (or special Unicode space characters after the period), it won't work.

Some flavors of regex offer "character class subtraction," but the PCRE regex that Sigil uses does not. The regex module included with its bundled Python for plugins, however, does. I won't get into the syntax needed to accomplish character class subtraction for the various regex engines, but I will mention that negative lookaheads can be used to achieve the same thing in any flavor of regex that supports lookaheads.

So in this specific case, if one wanted to match any Unicode uppercase letter (\p{Lu}) except the letter D ((?!D)) following a period (\.) and any single Unicode space character (\p{Zs}), one could use something like the following:

Code:
\.\p{Zs}(?!D)\p{Lu}
It would work for all of the following cases (even if the spaces in question were special non-breaking space or narrow non-breaking space characters):

Code:
. Ö

. G

. F

. Ê
Too much, I know, but I couldn't help myself.
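
For anyone who wants to see what \p{Zs} and \p{Lu} are actually testing, here is a rough Python sketch that checks the same three-character condition by hand using only the standard library. It is not the regex itself and not anything Sigil runs; the helper name sentence_starts is made up for illustration. Python's built-in re module doesn't understand \p{...}, which is why this leans on unicodedata instead.

```python
import unicodedata


def sentence_starts(text):
    """Hypothetical helper: emulate \\.\\p{Zs}(?!D)\\p{Lu} one position at a time.

    Returns every three-character slice made of a period, one Unicode
    space separator (category Zs), and one uppercase letter (category Lu)
    that is not a plain "D".
    """
    found = []
    for i in range(len(text) - 2):
        dot, space, letter = text[i], text[i + 1], text[i + 2]
        if (
            dot == "."
            and unicodedata.category(space) == "Zs"   # \p{Zs}
            and letter != "D"                         # (?!D)
            and unicodedata.category(letter) == "Lu"  # \p{Lu}
        ):
            found.append(text[i:i + 3])
    return found


# A regular space, a no-break space (U+00A0) and a narrow no-break
# space (U+202F) are all in category Zs, so all three are accepted.
print(sentence_starts(". Ö .\u00a0G .\u202f\u00ca . D"))
```

The tab character is category Cc rather than Zs, so a period followed by a tab would not be picked up by this check.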

Applied to @PenguinCEO's much simpler solution, the negative lookahead workaround for character class subtraction would look something like:
Code:
\. (?!D)[A-Z]
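
As a quick sanity check on the ASCII version, the sketch below uses Python's standard re module to compare the lookahead form against a class with D left out by hand ([A-CE-Z]). The variable names are just for illustration, and this only demonstrates the idea in Python's engine, not in Sigil's PCRE.

```python
import re
import string

# The lookahead form from the post, and an equivalent hand-written class.
lookahead = re.compile(r"\. (?!D)[A-Z]")
explicit = re.compile(r"\. [A-CE-Z]")

# One sample per ASCII uppercase letter: ". A", ". B", ... ". Z"
samples = [f". {c}" for c in string.ascii_uppercase]

# Keep the samples the lookahead form accepts.
accepted = [s for s in samples if lookahead.fullmatch(s)]

# Both patterns should agree on every sample.
same = all(
    bool(lookahead.fullmatch(s)) == bool(explicit.fullmatch(s))
    for s in samples
)
print(len(accepted), ". D" in accepted, same)
```

The lookahead form scales better than rewriting the class by hand once more than one letter needs excluding, e.g. (?![DQ])[A-Z].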

Last edited by DiapDealer; 07-11-2019 at 05:49 PM.