#1 |
Grand Sorcerer
Posts: 28,373
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Regex search details
I see that Unicode characters can be found using the \uFFFF format (it was \x{FFFF} coming from Sigil's PCRE), and at least some of the Unicode categories work (I can find any letter in any language using \p{L} or \p{Letter}), but many of those Unicode categories don't seem to have the granularity I'm accustomed to.
The bulk of the punctuation searches seem to work:
\p{P} all punctuation
\p{Pd} dashes
\p{Pi} opening quotes
\p{Pf} closing quotes
etc...
But \p{Ll} and \p{Lu} (or \p{Lowercase_Letter}, \p{Uppercase_Letter}) both seem to find all letters regardless of case, just like \p{L}. Is this expected/known behavior?
EDIT: Oops! I never really expected those classes to be subject to the case-sensitive check-box. My bad. Nothing to see here!
Last edited by DiapDealer; 02-22-2014 at 09:05 AM.
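For anyone who wants to poke at the categories outside the editor, here is a minimal sketch using only Python's standard `unicodedata` module. It is a model of the behavior described above, not how calibre or its regex engine is implemented: the helper name `find_by_category`, the sample string, and the idea that a case-insensitive search treats Lu, Ll and Lt as interchangeable are all assumptions made for illustration.

```python
import unicodedata

# Letter categories that differ only by case. Treating them as
# interchangeable under a case-insensitive search is an assumption
# that mirrors what the post above observed.
CASED = ("Lu", "Ll", "Lt")

def find_by_category(text, category, ignore_case=False):
    # Hypothetical helper: return the characters whose Unicode general
    # category starts with `category`. Passing "L" matches every letter
    # category (Lu, Ll, Lt, Lm, Lo), "P" every punctuation category.
    wanted = (category,)
    if ignore_case and category in CASED:
        wanted = CASED
    return [ch for ch in text if unicodedata.category(ch).startswith(wanted)]

# "A", n-with-tilde, space, Greek capital omega, hyphen, "d"
sample = "A\u00f1 \u03a9-d"

print(find_by_category(sample, "Lu"))                    # uppercase letters only
print(find_by_category(sample, "Ll"))                    # lowercase letters only
print(find_by_category(sample, "Lu", ignore_case=True))  # every cased letter
print(find_by_category(sample, "Pd"))                    # dash punctuation
```

With `ignore_case=False` the Lu and Ll results stay separate; with `ignore_case=True` they collapse into the same set of letters, which is the effect the case-sensitive check-box seems to have on \p{Lu} and \p{Ll}.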
#2 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
That may explain some seemingly inconsistent behavior I have noted from time to time and just assumed it was because calibre's regex differed from PCRE in that case!
|
#3 |
Grand Sorcerer
|
What is the regex engine being used here, out of curiosity? Is it Matthew Barnett's Python regex module (hope, hope, hope)?
|
#4 |
creator of calibre
Posts: 45,201
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yes, it is, and it was your mention of it in the original Sigil thread that got me looking into using it. It had a couple of bugs that I helped find and fix, but otherwise it's been great.
|
#5 |
Grand Sorcerer
|
Cool. Glad it's working out. I'm a big fan of it.
Variable-width lookbehinds (in addition to the robust Unicode support) just tickle me pink!
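For readers who haven't hit this limitation: Python's built-in `re` module only accepts lookbehinds whose width is fixed, so a pattern like `(?<=chapter\s+)\d+` is rejected at compile time. A rough sketch of the difference follows. The pattern, the sample text and the helper name `supports_variable_lookbehind` are made up for illustration, and the third-party `regex` module is treated as optional since it may not be installed.

```python
import re

# Lookbehind containing \s+, so its width is not fixed.
PATTERN = r"(?<=chapter\s+)\d+"

def supports_variable_lookbehind(engine):
    # Hypothetical helper: try to compile PATTERN with the given module
    # (both `re` and `regex` expose compile() and an `error` exception).
    try:
        engine.compile(PATTERN)
    except engine.error:
        return False
    return True

print("re:", supports_variable_lookbehind(re))

try:
    import regex  # third-party module by Matthew Barnett; optional here
except ImportError:
    regex = None

if regex is not None:
    print("regex:", supports_variable_lookbehind(regex))
    # Finds the numbers even though the run of spaces varies in length.
    print(regex.findall(PATTERN, "chapter 1, chapter   22"))
```

With `re` alone the usual workaround is a capturing group, for example `chapter\s+(\d+)`, which is fine for searching but more awkward in search-and-replace because the prefix becomes part of the match.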