MobileRead Forums > E-Book Software > Sigil

Old 07-17-2012, 12:34 PM   #1
ElMiko
Addict
Posts: 299
Karma: 56788
Join Date: Jun 2011
Device: Kindle
How to exclude strings before and after

I'm trying to find instances of a character ("•") in a document where it might pop up as a scanning artifact rather than an intended character. Unfortunately, "•" is also used as a section break. What I'd like to do is search for instances of "•" when it is not enclosed by <p> tags.

My first impulse was to do a search for:
Code:
(?<!<p class="calibre1">)•(?!</p>)
but before I'd finished typing it, I realized that it would exclude instances where "•" was preceded by "<p class="calibre1">" and, separately, instances where "•" was followed by "</p>". What I really need is for it to find instances where "•" is simultaneously preceded by "<p class="calibre1">" AND followed by "</p>". Is there any way to express this?
Attached Files
File Type: epub Test Case.epub (230.4 KB, 34 views)

Last edited by ElMiko; 03-08-2013 at 08:54 PM.
Old 07-17-2012, 12:59 PM   #2
DiapDealer
Grand Sorcerer
Posts: 9,531
Karma: 43837842
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I'm not sure I understand the question. You haven't made your look-behind or look-ahead assertions optional, so both should already be required. Your expression should already be logically correct. But I'm not certain whether you're looking for instances where the bullet IS or ISN'T enclosed with p tags.

IS
Code:
(?<=<p class="calibre1">)•(?=</p>)
ISN'T
Code:
(?<!<p class="calibre1">)•(?!</p>)
Old 07-17-2012, 01:44 PM   #3
ElMiko
Addict
Posts: 299
Karma: 56788
Join Date: Jun 2011
Device: Kindle
Quote:
Originally Posted by DiapDealer View Post
I'm not sure I understand the question. You haven't made your look-behind or look-ahead assertions optional, so both should already be required. Your expression should already be logically correct. But I'm not certain whether you're looking for instances where the bullet IS or ISN'T enclosed with p tags.

ISN'T
Code:
(?<!<p class="calibre1">)•(?!</p>)
I'm looking for ISN'T, but, as I suspected, the above expression doesn't search for instances where the bullet is NOT enclosed by p tags. It matches only bullets that are neither preceded by the opening p tag nor followed by the closing p tag. Which is to say, it will exclude the following instances that I don't want it to exclude:

Code:
<p class="calibre1">•A</p>

<p class="calibre1">A•</p>
EDIT: As I understand the way the above search works (and this seems to be confirmed by my subsequent testing), it first looks for all instances of a bullet preceded by the opening p tag and excludes them. Then it looks for all instances of the bullet followed by the closing p tag, and excludes them. Then it returns any results that weren't excluded by the preceding TWO searches. It is not a single search to exclude an enclosed bullet.
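
To check the behaviour described above, here is a minimal Python sketch. It is only an illustration: Sigil's search uses PCRE, and I am assuming that Python's built-in `re` module treats these fixed-width lookarounds the same way. The name `ISNT` is made up; the sample strings are the ones discussed in this thread.

```python
import re

# The "ISN'T" expression: both negative assertions must hold at the same time.
ISNT = re.compile(r'(?<!<p class="calibre1">)•(?!</p>)')

samples = [
    '<p class="calibre1">•</p>',        # scene break: should be excluded
    '<p class="calibre1">•A</p>',       # wanted, but preceded by the opening tag
    '<p class="calibre1">A•</p>',       # wanted, but followed by the closing tag
    '<p class="calibre1">A •BC D</p>',  # wanted, and neither assertion is violated
]

# Keep only the samples in which the expression finds a bullet.
matched = [s for s in samples if ISNT.search(s)]
print(matched)
```

Only the last sample is found, so the expression drops the scene break but also the two stray bullets that touch a tag.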

Last edited by ElMiko; 07-17-2012 at 02:02 PM.
Old 07-17-2012, 02:10 PM   #4
mmat1
Det
Posts: 995
Karma: 1529558
Join Date: Jan 2012
Location: Zischebattem
Device: Acer Lumiread
Quote:
Originally Posted by ElMiko View Post
Unfortunately, "•" is also used as a section break. What I'd like to do is search for instances of "•" when it is not enclosed by <p> tags.
If you can clearly identify the section breaks, then change the bullets there to something else that can be clearly identified later for a change-back.

After securing the passages with the bullets you want to keep this way, it should be easy to exchange your scan-errors with a simple s/r.
Old 07-17-2012, 02:15 PM   #5
ElMiko
Addict
Posts: 299
Karma: 56788
Join Date: Jun 2011
Device: Kindle
Quote:
Originally Posted by mmat1 View Post
If you can clearly identify the section breaks, then change the bullets there to something else that can be clearly identified later for a change-back.

After securing the passages with the bullets you want to keep this way, it should be easy to exchange your scan-errors with a simple s/r.
Yes, I'd considered that. Actually I've more than considered that. I've done it in the past. But I'm really trying to find a way to do... well, what I said I was trying to do: exclude instances of a bullet where it is enclosed by p tags, and match all other instances of bullets.

Last edited by ElMiko; 07-17-2012 at 02:19 PM.
Old 07-17-2012, 02:57 PM   #6
DiapDealer
Grand Sorcerer
Posts: 9,531
Karma: 43837842
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
EDIT: As I understand the way the above search works (and this seems to be confirmed by my subsequent testing), it first looks for all instances of a bullet preceded by the opening p tag and excludes them. Then it looks for all instances of the bullet followed by the closing p tag, and excludes them. Then it returns any results that weren't excluded by the preceding TWO searches. It is not a single search to exclude an enclosed bullet.
That's not the way I understand it... but then, I didn't really understand your goal. I still don't quite.

Do you want it to exclude any bullet that occurs anywhere in any string that is enclosed by those "p class='calibre1'" tags? That could prove pretty tough, if so. I know I can't get my head around the expression to accomplish that (not that THAT renders it impossible by any means).

I'd say your best bet is to isolate/alter the scene-breaks first and then catch any possible OCR glitches in a subsequent search.

Last edited by DiapDealer; 07-17-2012 at 03:11 PM.
Old 07-17-2012, 03:08 PM   #7
WS64
Posts: 590
Karma: 506380
Join Date: Aug 2010
Location: Germany
Device: Kobo Aura / Bookeen Frontlight / Kobo Mini / Kindle 3 / Nook Color
I would do two searches, first for [^>]• and then for •[^<].
Neither of them should find <p class="calibre1">•</p>
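
A rough Python sketch of the two-pass idea, in case it helps to see it run. The helper name `found_by_either` is hypothetical, and Python's `re` stands in for Sigil's search here.

```python
import re

BEFORE = re.compile(r'[^>]•')  # bullet preceded by anything except ">"
AFTER = re.compile(r'•[^<]')   # bullet followed by anything except "<"

def found_by_either(line):
    """Return True if at least one of the two searches hits the line."""
    return bool(BEFORE.search(line) or AFTER.search(line))

print(found_by_either('<p class="calibre1">•</p>'))   # scene break
print(found_by_either('<p class="calibre1">A•</p>'))  # stray bullet
```

Note that a bullet wrapped directly in any other tag pair, such as `<i>•</i>`, is skipped by both searches as well, because it also sits between ">" and "<".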
Old 07-17-2012, 08:43 PM   #8
ElMiko
Addict
Posts: 299
Karma: 56788
Join Date: Jun 2011
Device: Kindle
Quote:
Originally Posted by DiapDealer View Post
That's not the way I understand it... but then, I didn't really understand your goal. I still don't quite.

Do you want it to exclude any bullet that occurs anywhere in any string that is enclosed by those "p class='calibre1'" tags? That could prove pretty tough, if so. I know I can't get my head around the expression to accomplish that (not that THAT renders it impossible by any means).

I'd say your best bet is to isolate/alter the scene-breaks first and then catch any possible OCR glitches in a subsequent search.
I want a search that will match:

Code:
<p class="calibre1">A•</p>

<p class="calibre1">•A</p>

<p class="calibre1">A •BC D</p>
and exclude:

Code:
<p class="calibre1">•</p>
@WS64, thanks are once again due for a good-faith attempt to help. However, again, I've already found workarounds for my problem before. What I'm looking for in this particular query is specifically a single expression that will do what I want. Also, your workaround would still exclude cases where the bullet was set between non-p tags, i.e.:

Code:
<p>Here's some text <span>•</span>, ain't life grand?</p>

or

<p>here's some more <i>•</i> text</p>
But as I said, I'm looking for a single expression, anyway.

Last edited by ElMiko; 07-17-2012 at 08:57 PM.
Old 07-17-2012, 09:01 PM   #9
DiapDealer
Grand Sorcerer
Posts: 9,531
Karma: 43837842
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
That's tough. I can't think of a one-time expression that will accomplish that.

Seriously... I'd change all occurrences of
Code:
<p class="calibre1">•</p>
to something else: like
Code:
<p class="calibre1">_@</p>
Then you can clean up all the rest of the • characters and change the scene breaks back after you're done.
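
For anyone who wants to script this outside Sigil, here is a minimal Python sketch of the swap, clean, swap-back sequence. The function name is hypothetical, and it assumes that the "_@" placeholder does not occur anywhere else in the book and that every stray bullet should simply be deleted.

```python
SCENE_BREAK = '<p class="calibre1">•</p>'
PLACEHOLDER = '<p class="calibre1">_@</p>'  # assumed not to appear in the book

def remove_stray_bullets(html):
    """Protect scene breaks, delete every remaining bullet, then restore them."""
    protected = html.replace(SCENE_BREAK, PLACEHOLDER)
    cleaned = protected.replace('•', '')
    return cleaned.replace(PLACEHOLDER, SCENE_BREAK)

page = '<p class="calibre1">•</p><p class="calibre1">A•</p>'
print(remove_stray_bullets(page))
```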

WS64's two pass suggestion could work too (and wouldn't require altering the existing scene breaks).

EDIT: Never mind the alternate suggestions if you already worked around the problem. I understand looking for that "one regexp to bind them all", but I'm just not sure it's worth an extended quest for this one.

Last edited by DiapDealer; 07-17-2012 at 09:05 PM.
Old 07-17-2012, 09:30 PM   #10
Tex2002ans
Fanatic
Posts: 539
Karma: 562971
Join Date: Jul 2012
Device: Nook
Quote:
Originally Posted by DiapDealer View Post
Do you want it to exclude any bullet that occurs anywhere in any string that is enclosed by those "p class='calibre1'" tags? That could prove pretty tough, if so. I know I can't get my head around the expression to accomplish that (not that THAT renders it impossible by any means).

I'd say your best bet is to isolate/alter the scene-breaks first and then catch any possible OCR glitches in a subsequent search.
I agree. I was sitting here for the past 40 minutes trying to wrap my head around a regular expression to handle all the situations at once.... and was getting stuck at being able to handle Scene Breaks.

I would follow the advice of mmat1 and temporarily replace all Scene Break '•'s, then go off fixing all the • with a normal search/replace. If you want to be lazy, I narrowed it down to two regexes:

You can Search/Replace with \1\2:

Code:
(<p class="calibre1">[^•]*)•([^<]+</p>)
*** Not very efficient, but gets the job done ***

Red (the first group): "In a p with class calibre1", grab 0 or more characters that are NOT '•'.
  • This handles cases where the • is the first character in the paragraph, or somewhere in between.
  • This is the highly inefficient part. It will grab EVERYTHING until it hits a •.

Middle part: Finds the •.

Blue (the second group): Grabs the rest until it hits a </p>.

Code:
([^>])•(</p>)
The second regex will grab all the ones in which • is the last character of the paragraph, while failing on your scene breaks.
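
Here is a small Python sketch of running the two replacements together. It assumes the first regex has the literal • between its two groups, as the "Middle part" note describes, and it repeats both passes until nothing changes, because each pass removes at most one bullet per match. The function name is made up.

```python
import re

# First regex: bullet at the start or in the middle of a calibre1 paragraph.
FIRST = re.compile(r'(<p class="calibre1">[^•]*)•([^<]+</p>)')
# Second regex: bullet as the last character of a paragraph.
LAST = re.compile(r'([^>])•(</p>)')

def strip_stray_bullets(html):
    """Apply both Search/Replace passes (replace with \\1\\2) until stable."""
    previous = None
    while previous != html:
        previous = html
        html = FIRST.sub(r'\1\2', html)
        html = LAST.sub(r'\1\2', html)
    return html

sample = (
    '<p class="calibre1">•</p>\n'
    '<p class="calibre1">•A</p>\n'
    '<p class="calibre1">A•</p>\n'
    '<p class="calibre1">A •BC D</p>'
)
print(strip_stray_bullets(sample))
```

The scene break on the first line survives because the second group needs at least one non-"<" character after the bullet.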

By the way, here are the example sentences I came up with and was working on:

Code:
<p class="calibre1">•</p>

<p class="calibre1">•A</p>

<p class="calibre1">A•</p>

<p class="calibre1">This has none.</p>

<p class="calibre1">This has none.</p>

<p class="calibre1">• This is a test sentence.</p>

<p class="calibre1">•</p>

<p class="calibre1">This is a test sentence. •</p>

<p class="calibre1">This is two test sentences. • This is two test sentences.</p>
Also, whenever you would like help on regular expressions, it would help to get lots of test cases. It took me a while to wrap my head around exactly what you were asking.

Last edited by Tex2002ans; 07-17-2012 at 09:33 PM.
Old 07-17-2012, 09:32 PM   #11
Serpentine
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
Something like this perhaps?
Code:
(?!<p[^<>]*>•</p>)<([^\s]+)[^<>]*>[^<>]*•[^<>]*</\1>
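
A quick way to try this expression outside Sigil is a few lines of Python. This is only a sketch: the helper name `first_match` is hypothetical, and I am assuming Python's `re` handles this lookahead and the `\1` backreference the way PCRE does. Note that the match is the whole enclosing element, not just the bullet.

```python
import re

ENCLOSED = re.compile(r'(?!<p[^<>]*>•</p>)<([^\s]+)[^<>]*>[^<>]*•[^<>]*</\1>')

def first_match(html):
    """Return the first element that directly contains a stray bullet, or None."""
    m = ENCLOSED.search(html)
    return m.group(0) if m else None

print(first_match('<p class="calibre1">•</p>'))   # scene break
print(first_match('<p class="calibre1">A•</p>'))  # stray bullet
```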
Old 07-17-2012, 09:39 PM   #12
Tex2002ans
Fanatic
Posts: 539
Karma: 562971
Join Date: Jul 2012
Device: Nook
Quote:
Originally Posted by Serpentine View Post
Something like this perhaps?
Code:
(?!<p[^<>]*>•</p>)<([^\s]+)[^<>]*>[^<>]*•[^<>]*</\1>
Hmm... or this magic.
Old 07-18-2012, 04:57 AM   #13
ElMiko
Addict
Posts: 299
Karma: 56788
Join Date: Jun 2011
Device: Kindle
Just for the record, DD's suggestion is closest to what I've done in the past... the only difference being in the symbol I used as a replacement (the fleur-de-lis instead of the "_@"). The reason I didn't include any other examples is because I thought that saying I want to exclude all instances of a bullet when it appears like this:
Code:
<p class="calibre1">•</p>
while matching all other instances of a bullet was the least redundant, clearest way of saying it. "Match everything but this" is as simple an explanation as I can come up with. There is an infinitude of possible matches that I'd accept, so coming up with positive examples seemed counterproductive.

In any case, I thought I'd float it here and see if anyone bit. I've run into the problem more than once in the past, and knew I'd basically totally played out my own knowledge of reg-ex. I really do appreciate you all rolling it around and giving it the old college try.

PS - @Tex: as always, I especially appreciate your breaking down your thought process and compartmentalizing the behavioral characteristics of your regex. Frankly, it's the other reason I posted my question: I've come to realize that whether or not I get the specific answer I'm looking for, I'll always come out of the thread knowing more about reg-ex than I did going into it.

Last edited by ElMiko; 07-18-2012 at 05:03 AM.
Old 07-18-2012, 05:25 AM   #14
Tex2002ans
Fanatic
Posts: 539
Karma: 562971
Join Date: Jul 2012
Device: Nook
Quote:
Originally Posted by ElMiko View Post
"Match everything but this" is as simple an explanation as I can come up with.
It always helps to have test cases. There are always strange things that happen that were not expected. Either use some real examples that you ran into during your epubbing, or just some simple test cases such as the ones I came up with!

It would also save a little time for people who want to help, and lead to more accurate regexes.

Quote:
Originally Posted by ElMiko View Post
PS - @Tex: as always, I especially appreciate your breaking down your thought process and compartmentalizing the behavioral characteristics of your regex. Frankly, it's the other reason I posted my question: I've come to realize that whether or not I get the specific answer I'm looking for, I'll always come out of the thread knowing more about reg-ex than I did going into it.
Yes, always helps getting the breakdowns. Sometimes regexes can be so obscure, and learning which piece does what always helps.

I thought that \1 could only be used in the Replace, but in Serpentine's example I see it can be used in a Search as well.
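
A tiny Python sketch of a backreference used inside the search pattern itself (the pattern and sample string are made up for illustration): `\1` must repeat whatever the first group captured, so only a properly paired tag matches.

```python
import re

# \1 in the search refers back to the tag name captured by (\w+).
SAME_TAG = re.compile(r'<(\w+)>[^<>]*</\1>')

m = SAME_TAG.search('<i>x</b> <b>y</b>')
print(m.group(0))  # the mismatched <i>...</b> pair is skipped
```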
Old 07-21-2012, 07:34 PM   #15
Timur
Connoisseur
Posts: 54
Karma: 37363
Join Date: Aug 2011
Location: Istanbul
Device: EBW1150, Nook STR
@ElMiko: This pattern should find all instances of the bullet character, excluding only the ones that you do not want.

Code:
(<p[^>]*?>)?\K•(?(1)(*SKIP)(?!</p>))
Let me break this pattern down:


(<p[^>]*?>)? : Optionally match (0 or 1 times) the opening tag of a p element, and capture it if it exists.

\K : Resets the start of the match (anything matched before this will not be included in the match result).

• : The match we are interested in; in our case it is the bullet character • alone.

(?(1) : If the first capture group has successfully matched an opening p tag, process the following; otherwise ignore it altogether:
(*SKIP) : if the pattern fails after this point, advance the starting point of the next search iteration to here,
(?!</p>) : fail the match if a closing p tag is found at this point.
)


Some more (wordy) explanations of the parts:

(?(1)yes-pattern) : This is a form of conditional subpattern, it searches for the yes-pattern if the condition is true, otherwise it has no effect. The condition in this form is the check for match for the first capture group in our complete pattern. In our case it is the conditional match at the beginning of the pattern. Read as "If the first capture group has matched anything then look for yes-pattern here, if the first capture group is empty(remember that we have used an optional match, so no-match was acceptable to our pattern) then ignore this whole subpattern."

yes-pattern in our example is (*SKIP)(?!</p>)

(*SKIP) : Normally when a match fails at any point, the starting point in the source that will be next tried for the pattern is advanced one character. (*SKIP) backtracking control verb causes the point to be advanced to where (*SKIP) verb is encountered during the matching if the pattern in whole fails matching. This implies two things are needed for the (*SKIP) verb to have an effect; 1) everything before it has already matched successfully, 2) something in the part of the pattern that comes after it caused a fail in match.

(?!</p>) : This is a classic negative lookahead pattern. If the subpattern is found at this point in the search, this causes a fail in the pattern. In our case it looks for </p>, and if this pattern is found here, (?!</p>) causes a fail in match, which in turn causes the (*SKIP) to have its effect and the starting point for the next search is advanced to the (*SKIP) point, which is the character just after the bullet, •.
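
For readers who want to experiment outside Sigil: Python's built-in `re` module does not support `\K` or `(*SKIP)`, so the pattern above cannot be pasted into it as-is. The sketch below swaps in a different technique that gives a similar result: an alternation that consumes the unwanted scene break first and captures any other bullet in group 1. The function name is hypothetical, and deleting the stray bullets is just an assumed replacement.

```python
import re

# Left alternative swallows a whole scene break; only other bullets reach group 1.
SCENE_BREAK_OR_BULLET = re.compile(r'<p[^>]*>•</p>|(•)')

def remove_stray_bullets(html):
    """Delete every bullet except those in a <p ...>•</p> scene break."""
    def replace(match):
        # group(1) is None when the scene-break alternative matched: keep it as-is.
        return match.group(0) if match.group(1) is None else ''
    return SCENE_BREAK_OR_BULLET.sub(replace, html)

print(remove_stray_bullets('<p class="calibre1">•</p><p class="calibre1">•A</p>'))
```

Because the scene break is consumed as a whole, the search never restarts on its bullet, which is roughly the effect the skip has in the PCRE version.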


Here is a templated form(whitespaces in here are for ease of reading):

(negative_lookbehind_pattern_simulator)? \Kwanted_pattern (?(1) (*SKIP) (?!negative_lookahead_pattern))

Note that unlike regular lookbehind patterns, this form allows for indefinite length matches, because technically it is not a lookbehind assertion but a basic match pattern that starts searching from the current character position; it behaves like a negative lookbehind by the help of the later conditional part of the template. This is the reason why I have labelled this pattern as a "simulator".

Corollary:

Here is a modified form of this template one can use if only a negative lookbehind alternative is needed, which allows for indefinite length matches.

(negative_lookbehind_pattern_simulator)? \Kwanted_pattern (?(1) (*SKIP) (?!))

Here we use (?!) negative lookahead with an empty string as the subpattern. Empty string pattern always matches, hence its negative assertion always fails to match, hence (*SKIP) effect is achieved regardless of what comes after it as long as the first capture group is not empty.
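
The same idea can be sketched in Python, whose built-in `re` module only accepts fixed-width lookbehind and has no `\K` or `(*SKIP)`. This is a swapped-in approach, not the pattern above: match the variable-length prefix as an optional capture group and discard, in code, every match where that group took part. The function name is made up.

```python
import re

# Optional prefix group plays the role of the "lookbehind simulator".
PREFIX_THEN_BULLET = re.compile(r'(<p[^>]*>)?•')

def bullets_not_after_p_tag(html):
    """Return the index of every bullet not directly preceded by an opening p tag."""
    return [
        m.end() - 1                 # the bullet is the last character of the match
        for m in PREFIX_THEN_BULLET.finditer(html)
        if m.group(1) is None       # prefix absent: the "negative lookbehind" holds
    ]

print(bullets_not_after_p_tag('<p>•</p>x•'))
```

A match that includes the prefix consumes its bullet, so the iteration does not come back and report that bullet on its own.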

The reason for the existence of the (*SKIP) in the pattern is left as an exercise to the reader.

The reason for why I am leaving it as an exercise is a sudden lazy spell.