MobileRead Forums > E-Book Software > Calibre > Editor

Old 12-22-2014, 10:03 PM   #1
nqk
Fanatic
 
Posts: 564
Karma: 32228
Join Date: Feb 2012
Device: Onyx Boox Leaf
[Regex Search] Minimal match not possible?

Hi all,

When I run a regex search in the Editor for something like “(.*)”, it selects everything between the first “ and the last ” in a paragraph whenever that paragraph contains more than one quoted passage. That selection is wrong, because it also includes the text lying between the separate quoted passages.

If the "Dot all" box is checked, the same thing happens across the whole file rather than within a single paragraph.

Sigil has a "Minimal Match" option, which selects each quoted passage on its own.



Old 12-22-2014, 10:08 PM   #2
kovidgoyal
creator of calibre
 
Posts: 45,265
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
(.*?)
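For anyone who wants to see the difference outside the editor, here is a minimal sketch using Python's standard "re" module as a stand-in for the editor's search (an assumption; the sample sentence is made up for illustration). The only change between the two patterns is the ? after the *.

```python
import re

# Hypothetical sample paragraph with two separately quoted passages.
text = 'He said “yes” and she said “no” in reply.'

# Greedy: .* grabs as much as it can, so the match runs from the
# first opening quote to the last closing quote.
greedy = re.findall(r'“(.*)”', text)

# Non-greedy (minimal): .*? stops at the first closing quote it can,
# so each quoted passage is matched separately.
lazy = re.findall(r'“(.*?)”', text)

print(greedy)
print(lazy)
```

The greedy version returns a single match that swallows the text between the two quotations, while the non-greedy version returns the two passages one by one.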
Old 12-23-2014, 08:28 AM   #3
dmonasse
Member
 
Posts: 23
Karma: 10
Join Date: Apr 2014
Location: Paris
Device: ipad 2, Ubuntu
Have a look at the HOWTO for the Python "re" module at https://docs.python.org/2/howto/regex.html#regex-howto
where you will read:
the solution is to use the non-greedy qualifiers *?, +?, ??, or {m,n}?, which match as little text as possible.
Old 12-23-2014, 12:14 PM   #4
eschwartz
Ex-Helpdesk Junkie
 
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
It's probably a good idea, on general principle, to get used to doing this in the regex itself rather than relying on separate option settings that have to travel along with the regex.
Old 12-23-2014, 04:34 PM   #5
DigitEditLab
Member
 
Posts: 21
Karma: 10
Join Date: Feb 2014
Device: Kobo Aura HD, Samsung Note II, Kindle and a few more
Additional request, @kovid or any expert at hand:

(.*?) is a bit hungry sometimes. I'd like to match anything but the tags: only letters, digits, punctuation marks and spaces, but no < or >.

What's the trick? Thanks in advance.
Old 12-23-2014, 04:40 PM   #6
eschwartz
Ex-Helpdesk Junkie
 
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
The trick is not to use the dot (which matches any character) at all. Use a regex character class instead, like
Code:
[a-zA-Z0-9.?',"]
Or match everything except tag brackets:
Code:
[^<>]
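A small sketch of why the character class helps, again using Python's "re" module as a stand-in for the editor's search (an assumption; the sample markup is made up). Even a non-greedy dot will run straight through tags when the rest of the pattern forces it to, whereas [^<>] can never step over a bracket.

```python
import re

# Hypothetical sample markup.
html = '<p>Some <i>italic</i> text, with punctuation!</p>'

# The lazy dot still crosses the inner <i> tags, because that is the
# only way for the pattern to reach the closing </p>.
lazy = re.search(r'<p>(.*?)</p>', html).group(1)

# With [^<>] the same pattern cannot match at all here: the paragraph
# contains tags, and the class refuses to match < or >.
strict = re.search(r'<p>([^<>]*)</p>', html)

# Picking up each run of text between a > and the next <.
runs = re.findall(r'>([^<>]+)<', html)

print(lazy)
print(strict)
print(runs)
```

So the class version either gives you pure text or no match, which is usually what you want when you are about to do a replace.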
I found this regex tutorial to be very helpful in learning the various fine points of regex: http://www.regular-expressions.info/

There are some very interesting yet obscure applications in the corners.

Like this use of a negative lookahead to find a matching pair of span tags and delete the pair. It also copes with nested spans, because it always matches the innermost pair first:

Code:
<span[^<>]*>((?:(?!<(?:/?span)).)*)</span>
The bit on the inside matches only text that does not contain the string "<span" or "</span", so the opening and closing tags it finds always belong together.
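To see how the span regex behaves on nested spans, here is a rough sketch in Python's "re" module (a stand-in for the editor, which is an assumption; unwrap_spans and the sample markup are made-up names for illustration). Replacing with \1 keeps the text and drops the tags. One pass only removes the innermost spans, so the replace has to be repeated until nothing more matches.

```python
import re

# The pattern from the post above, unchanged.
SPAN = re.compile(r'<span[^<>]*>((?:(?!<(?:/?span)).)*)</span>')

def unwrap_spans(html):
    """Strip span tags, innermost first, keeping the enclosed text."""
    while True:
        html, count = SPAN.subn(r'\1', html)
        if count == 0:
            return html

# Hypothetical sample with one span nested inside another.
sample = '<p><span class="a">outer <span class="b">inner</span> tail</span></p>'

first_pass = SPAN.sub(r'\1', sample)   # only the inner span is removed
cleaned = unwrap_spans(sample)         # repeated until no span is left

print(first_pass)
print(cleaned)
```

Note that the dot here does not cross line breaks unless dot-all is switched on, so a span whose content runs over several lines would be skipped by this sketch.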

Last edited by eschwartz; 12-23-2014 at 04:51 PM.
Old 12-24-2014, 01:56 AM   #7
mikapanja
Perfectionist
 
Posts: 72
Karma: 12802
Join Date: Apr 2014
Device: none
Quote:
Originally Posted by DigitEditLab
Additional request, @kovid or any expert at hand:

(.*?) is a bit hungry sometimes. I'd like to match anything but the tags: only characters, digits, punctuation marks and spaces but no <>.
Just in case you need to match everything between the tags, excluding the tags, this should do the trick:

(?<=<.*?>)(.*)(?=<.*?>)
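A caveat for anyone trying that pattern outside the editor: the lookbehind (?<=<.*?>) has no fixed width, and Python's standard "re" module rejects such lookbehinds (engines that allow variable-width lookbehind will accept it). The sketch below shows that, plus a fixed-width alternative built on the [^<>] class from earlier in the thread. The helper compiles() and the sample markup are made up for illustration, and the alternative is my own suggestion, not the pattern from the post.

```python
import re

def compiles(pattern):
    """Return True if the standard re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Pattern from the post: variable-width lookbehind.
variable_width = r'(?<=<.*?>)(.*)(?=<.*?>)'

# Alternative sketch: one-character lookarounds around a run of
# non-bracket characters, so the match itself contains no tags.
fixed_width = r'(?<=>)[^<>]+(?=<)'

# Hypothetical sample markup.
html = '<p>Hello <b>world</b>!</p>'
texts = re.findall(fixed_width, html)

print(compiles(variable_width))
print(texts)
```

The fixed-width version also avoids the greedy (.*) in the middle, which would otherwise be free to run across inner tags.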
Old 12-24-2014, 03:19 AM   #8
nqk
Fanatic
 
Posts: 564
Karma: 32228
Join Date: Feb 2012
Device: Onyx Boox Leaf
Thank you, guys.

Each day I learn something.