#1 |
Bookish
Posts: 1,017
Karma: 2003162
Join Date: Jun 2011
Device: PC, t1, t2, t3, Clara BW, Clara HD, Libra 2, Libra Color, Nxtpaper 11
|
RegEx: anchor problem in editor
I have an anchoring problem when using regex in the calibre editor.
Say I want to place an anchor between two successive M's, in which between those M's:
For that I'm using the regex given in attachment 1 (its explanation is given in attachment 2). As can be seen in the test string, 3 anchor locations should be found, and in a simulator they are, just as I expected. However, when I use the regex in the calibre editor, the very first one (where no separating spaces are present) is never found. Apparently the calibre regex engine works differently. So how can I make this work with the calibre regex engine? Any ideas?
#2 |
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Works for me.
|
#3 |
Bookish
|
Hi Kovid, you are absolutely right: (?<=M)\s*M works.
But that is not the expression I used. My expression also has a positive lookahead part, (?<=M)\s*(?=M), which fails when \s* matches nothing. Further, when M<space>M constructs follow the very first MM construct, they are not found either, because the search stops immediately and the attached error message is displayed.
I use a text like: adasMMasd adssdM Masdda M Msad
Although my regex knowledge is limited, I cannot recall any limitation that would explain this behavior, since an (assumed) greedy search should certainly be able to find the other M pairs. Is there an explanation you know of?
Addition: I use the lookahead construct specifically in order not to "consume" the found text.
Last edited by DrChiper; 04-09-2018 at 02:59 AM.
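For comparison, here is a minimal sketch that runs the same expression on the same test string with Python's standard `re` module. This is not the calibre editor's search code, so it only shows what one engine reports; the helper name `anchor_spans` is made up for illustration.

```python
import re

# Test string and pattern quoted from the post above.
TEXT = "adasMMasd adssdM Masdda M Msad"
PATTERN = re.compile(r"(?<=M)\s*(?=M)")


def anchor_spans(text):
    """Return (start, end) for every match; start == end means zero-length."""
    return [m.span() for m in PATTERN.finditer(text)]


spans = anchor_spans(TEXT)
for start, end in spans:
    kind = "zero-length" if start == end else "non-empty"
    print(start, end, kind)
```

With `re`, three anchor locations are reported, and the first one (between the adjacent M's) is a zero-length match, which is exactly the case the editor does not find.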
#4 |
creator of calibre
|
(?<=M)\s*(?=M) will fail when there is no space because there is nothing to match. Remember that lookbehind/lookahead assertions don't actually match anything; they only serve as anchors.
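One possible way around the empty match, sketched here with Python's standard `re` module rather than the calibre editor, is to let the pattern consume the second M and put it back in the replacement, so that every match contains at least one character. The pattern `(?<=M)\s*M` and the marker `#` are assumptions chosen for illustration, not something the thread prescribes.

```python
import re

TEXT = "adasMMasd adssdM Masdda M Msad"
MARK = "#"  # hypothetical marker standing in for the substring to insert

# Variant 1: pure anchor; the MM case produces an empty match.
anchored = re.sub(r"(?<=M)\s*(?=M)", MARK, TEXT)

# Variant 2: consume the second M so no match is empty, then put it back.
consumed = re.sub(r"(?<=M)\s*M", MARK + "M", TEXT)

print(anchored)
print(consumed)
```

In `re` both variants produce the same text. Only the second avoids relying on how an engine treats empty matches. Note that in both cases any whitespace between the two M's is replaced along with the anchor.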
|
#5 |
Bookish
|
Correct. It was my intention to place an anchor for adding some substring.
But it appears that this is a so-called "zero-length match" situation, which seems to be a gray area among regex engines. There is no consensus about the behavior, other than that infinite (lookup) loops must be avoided.
To force a zero-length match, for instance, use the expression [\.]{0}, which will match after every character in a string. calibre's regex engine halts and reports no matches. The Notepad++ regex engine, for instance, correctly flags every character as a "zero-length match" and happily uses the anchor to substitute any substring, starting at that anchor location. Two regex engines, different behavior.
So in the end I have to "program" around this for calibre, which is doable. Thanks for your time.
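As a third data point, here is a small sketch of how Python's standard `re` module treats the forced zero-length expression. The sample string `abc` is hypothetical, and this says nothing about what the calibre editor or Notepad++ do internally.

```python
import re

SAMPLE = "abc"  # hypothetical sample string
ZERO = re.compile(r"[\.]{0}")  # expression quoted from the post

# Start offsets of every (empty) match that re reports.
positions = [m.start() for m in ZERO.finditer(SAMPLE)]
print(positions)

# Substituting at each empty match inserts the marker at every position.
print(ZERO.sub("-", SAMPLE))
```

Here `re` reports an empty match at every position, including the very start of the string, and advances by one character after each, which is how it avoids the infinite loop mentioned above.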