04-08-2018, 01:44 PM   #1
DrChiper
Bookish
Posts: 1,017
Karma: 2003162
Join Date: Jun 2011
Device: PC, t1, t2, t3, Clara BW, Clara HD, Libra 2, Libra Color, Nxtpaper 11
RegEx: anchor problem in editor

I have an anchoring problem when using regex in the calibre editor.
Say I want to place an anchor between two successive M's, where between those M's:
  • there is no space
  • there are 1 or more spaces

For that I'm using the regex given in attachment 1 (and its explanation is given in attachment 2).

As can be seen in the test string, 3 anchor locations should be (and are) found in a regex simulator, just as I expected.

However, when I use the regex in the calibre editor, the very first one is never found (i.e. the case with no separating spaces). Apparently, the calibre regex engine works differently. So how can I make it work with the calibre regex engine? Any ideas?
Attached thumbnails: calibre1.jpg (the regex), calibre2.jpg (its explanation)
04-08-2018, 09:59 PM   #2
kovidgoyal
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Works for me.
Attached thumbnail: Screenshot_20180409_072904.png
04-09-2018, 02:53 AM   #3
DrChiper
Bookish
Hi Kovid, you are absolutely right: (?<=M)\s*M works.
But that is not the expression I used.

My expression also used a positive lookahead:

(?<=M)\s*(?=M)

which fails when \s* is empty. Furthermore, any M<space>M constructs that follow the very first MM are not found either, because the search stops immediately and the attached error message is displayed.

My test text is something like: adasMMasd adssdM Masdda M Msad

Although my regex knowledge is limited, I don't recall any rule that would prevent the expected behavior here, since a (presumably greedy) search should certainly be able to find the other M's.

Is there an explanation you know of?

Addendum: I use the lookahead construct specifically so that the found text is not "consumed".
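The difference the addendum describes can be sketched with Python's standard-library `re` module (not calibre's engine; the `|` is only a stand-in for whatever substring should be inserted): the plain `(?<=M)\s*M` swallows the second M, so replacing the match deletes it, while the lookahead version leaves it in place.

```python
import re

TEXT = "adasMMasd adssdM Masdda M Msad"

# Without a lookahead, the second M is part of every match ...
consuming = re.findall(r"(?<=M)\s*M", TEXT)
# ... with a lookahead, only the (possibly empty) whitespace is matched.
non_consuming = re.findall(r"(?<=M)\s*(?=M)", TEXT)
print(consuming)        # ['M', ' M', ' M']
print(non_consuming)    # ['', ' ', ' ']

# Replacing each match with "|" shows the practical effect.
replaced_consuming = re.sub(r"(?<=M)\s*M", "|", TEXT)
replaced_lookahead = re.sub(r"(?<=M)\s*(?=M)", "|", TEXT)
print(replaced_consuming)   # adasM|asd adssdM|asdda M|sad
print(replaced_lookahead)   # adasM|Masd adssdM|Masdda M|Msad
```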
Attached thumbnail: calibre3.jpg (the error message)

Last edited by DrChiper; 04-09-2018 at 02:59 AM.
04-09-2018, 06:30 AM   #4
kovidgoyal
creator of calibre
(?<=M)\s*(?=M) will fail when there is no space because there is nothing to match. Remember that lookbehind/lookahead assertions don't actually match anything; they only serve as anchors.
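This point can be illustrated with a small sketch using Python's standard-library `re` module (3.7+ assumed; this is not calibre's engine). A pattern built from lookarounds and an optional `\s*` still produces a match object between adjacent M's, but the matched text is empty. Whether a particular editor treats such an empty match as a hit or as "nothing found" is a UI decision that this sketch cannot show.

```python
import re

# Lookarounds only test the text around a position; they consume nothing.
both_sides = re.search(r"(?<=M)(?=M)", "MM")
print(both_sides.span(), repr(both_sides.group()))   # (1, 1) ''

# "No match" is a different outcome: search() returns None.
no_match = re.search(r"(?<=M)(?=M)", "M M")
print(no_match)                                       # None

# Adding \s* changes nothing between adjacent M's: \s* matches zero characters,
# so the match object exists but its text is the empty (falsy) string.
with_ws = re.search(r"(?<=M)\s*(?=M)", "MM")
print(with_ws is not None, with_ws.group() == "")     # True True
```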
04-09-2018, 09:15 AM   #5
DrChiper
Bookish
Correct. My intention was to place an anchor at which to add some substring.

But it appears that this is a so-called "zero-length match" situation, which seems to be something of a gray area among regex engines. There is no consensus about the behavior, other than that infinite (lookup) loops must be avoided.
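How unsettled zero-length matching is shows up even within a single engine. The sketch below uses Python's standard-library `re` module (not calibre's engine); the Python documentation for `re.sub` records that, starting with 3.7, an empty match adjacent to a previous non-empty match is replaced as well, which changed the output of the example below.

```python
import re
import sys

# "x*" can match the empty string, so it matches at every position,
# and it also matches the single "x" non-empty.
result = re.sub("x*", "-", "abxd")
print(sys.version_info[:2], result)
# Python 3.7+: '-a-b--d-'   (older versions produced '-a-b-d-')
```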

To force a zero-length match, for instance, use the expression [\.]{0}
This matches at every position in a string, i.e. before and after every character.
calibre's regex engine halts and reports no matches.
The Notepad++ regex engine, by contrast, correctly flags every position as a "zero-length match" and happily uses that anchor to insert any substring starting at that location.
Two regex engines, different behavior.
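For a third data point, here is a sketch with Python's standard-library `re` module (3.7+ assumed, not calibre's engine): it reports a zero-length match for `[\.]{0}` at every position and substitutes at each one.

```python
import re

TEXT = "abc"
# [\.]{0} means "zero repetitions of a literal dot", so it always matches the empty string.
EMPTY = re.compile(r"[\.]{0}")

positions = [m.start() for m in EMPTY.finditer(TEXT)]
print(positions)          # [0, 1, 2, 3]: before each character and at the end

# Using those zero-length matches as insertion points:
inserted = EMPTY.sub("|", TEXT)
print(inserted)           # |a|b|c|
```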

So in the end, I have to "program" around this for calibre, which is doable.
Thanks for your time.
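One possible way to "program around" the empty match, sketched with Python's standard-library `re` (3.7+ assumed; this is not necessarily what DrChiper ended up doing in calibre): include the first M in the match so the match is never empty, keep the second M in a lookahead, and write the first M back in the replacement. `MARK` is a hypothetical stand-in for the substring to insert.

```python
import re

TEXT = "adasMMasd adssdM Masdda M Msad"
MARK = "|"   # hypothetical placeholder for whatever substring should be inserted

# Consume the first M (and any spaces) so no match is ever empty,
# keep the second M outside the match with a lookahead, then
# write the first M back via the capture group.
NON_EMPTY = re.compile(r"(M)\s*(?=M)")
result = NON_EMPTY.sub(lambda m: m.group(1) + MARK, TEXT)

print(result)   # adasM|Masd adssdM|Masdda M|Msad
```

Because every match is at least one character long, the result should not depend on how an engine treats zero-length matches. Runs such as MMM are still handled, since the lookahead leaves the next M available as the start of the following match.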
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
regex in calibre editor | mrmikel | Editor | 2 | 02-01-2014 10:39 AM
Regex problem | John2011 | Sigil | 8 | 01-21-2014 02:12 AM
Guide to Regex in Calibre Editor vs Notepad++ | Agama | Editor | 6 | 12-23-2013 05:10 AM
HTML Page to EPUB Named Anchor Problem | gknitz | Conversion | 11 | 10-02-2013 11:00 PM
Regex Problem | huebi | Sigil | 3 | 05-10-2011 04:32 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.