MobileRead Forums


ElMiko 04-10-2012 05:52 PM

Trying to limit a search to a single line...
 
I'm trying to catch strings that look like this:

Code:

<p class="calibre1">Don’t be late for school,” she called. [...]
(Note the missing opening quotation mark)

The regex search that I'm using
Code:

>([^“]*)”
returns a string that includes multiple lines. How do I get it to limit the search to a single line?

Perkin 04-10-2012 06:24 PM

Quote:

Originally Posted by ElMiko (Post 2037398)
the regex search that I'm using
Code:

>([^“]*)”

Without actually testing, add a question-mark after the asterisk, to make it non-greedy.

Code:

>([^“]*?)”
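
For anyone who wants to check the non-greedy idea outside Sigil, here is a minimal sketch in Python. Using Python's re module is an assumption made for illustration (Sigil has its own regex engine), and the sample text is the pair of lines ElMiko posts later in the thread. On that sample the lazy quantifier does not change the result, because both searches begin at the first ">" that can lead to a match, which is the one closing </p>:

```python
import re

# Sample lines from the thread; the second paragraph lacks its opening “.
sample = (
    '<p class="calibre1">“I have to go pick up Cindy.”</p>\n'
    '\n'
    '<p class="calibre1">Well, don’t be late for school,” she called. [...]\n'
)

greedy = re.search(r'>([^“]*)”', sample)
lazy = re.search(r'>([^“]*?)”', sample)

# Both matches start at the ">" of </p> and run across the blank line
# into the next paragraph, so laziness alone does not confine the match.
print(repr(greedy.group(0)))
print(repr(lazy.group(0)))
print(greedy.group(0) == lazy.group(0))  # True
```

The quantifier only decides where a match ends; it has no say in where the match starts.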

DiapDealer 04-10-2012 07:04 PM

It depends on how you define a "line." If no line-break character is ever encountered, the regex Find doesn't really care if the code and/or text wraps around the screen several times... it still considers it all one "line."
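
A small sketch of why line breaks matter here, written in Python as an assumption (Sigil's own engine and its options may differ): a negated character class excludes only the characters listed inside it, so it accepts a line-break character unless one is listed explicitly. The dot, by contrast, skips "\n" by default in Python.

```python
import re

# [^“] means "any character except “", and "\n" is such a character.
newline_in_negated_class = re.fullmatch(r'[^“]', '\n') is not None

# The dot does not match "\n" unless re.DOTALL is given.
newline_in_dot = re.fullmatch(r'.', '\n') is not None

print(newline_in_negated_class)  # True
print(newline_in_dot)            # False
```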

ElMiko 04-10-2012 07:12 PM

I mean a line of code, not a line of text.

What's happening is that my search is returning the highlighted part of the following string:

Code:

<p class="calibre1">“I have to go pick up Cindy.”</p>

<p class="calibre1">Well, don’t be late for school,”
she called. [...]

when all I want it to match is:
Code:

<p class="calibre1">“I have to go pick up Cindy.”</p>

<p class="calibre1">Well, don’t be late for school,” she called. [...]


Perkin 04-10-2012 08:09 PM

Try the following, works for that small sample.

Code:

(?<=[^p]>)([^“]*?)”
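
A quick check of this pattern against the two sample lines, sketched in Python (an assumption for illustration; the thread itself is about Sigil's Find). The look-behind refuses the ">" of </p>, so the search can only start right after the opening tag of the second paragraph:

```python
import re

# Sample lines from the thread; the second paragraph lacks its opening “.
sample = (
    '<p class="calibre1">“I have to go pick up Cindy.”</p>\n'
    '\n'
    '<p class="calibre1">Well, don’t be late for school,” she called. [...]\n'
)

m = re.search(r'(?<=[^p]>)([^“]*?)”', sample)

# The ">" itself is not part of the match, only checked for.
print(m.group(0))  # Well, don’t be late for school,”
```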

ElMiko 04-10-2012 08:26 PM

Quote:

Originally Posted by Perkin (Post 2037539)
Try the following, works for that small sample.

Code:

(?<=[^p]>)([^“]*?)”

Indeed it does! Thanks, Perkin. If I could impose on your expertise just a little longer, though, what does the first parenthetical expression (?<=[^p]>) mean?

I tried referring to the reg-ex cheatsheet that (I believe) theducks recommended several months ago, but I couldn't really make heads or tails of it based on the descriptions.


EDIT:
Aww heck, spoke too soon. If the preceding line of code doesn't contain dialogue, the expression captures multiple lines...

I still would love an explanation for why the expression you provided DID work in that case. I might just be able to figure out how to extrapolate a comprehensive solution from that without bothering you guys any further!
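
The failure described in the edit can be reproduced with a short Python sketch. Both the use of Python and the first paragraph's text ("She grabbed her keys.") are assumptions invented for illustration; only the second line comes from the thread. With no “ in the preceding paragraph, nothing stops [^“]*? from running through the line break until it reaches the first ”:

```python
import re

# Hypothetical preceding paragraph with no dialogue, followed by the
# problem line from the thread.
sample = (
    '<p class="calibre1">She grabbed her keys.</p>\n'
    '\n'
    '<p class="calibre1">Well, don’t be late for school,” she called. [...]\n'
)

m = re.search(r'(?<=[^p]>)([^“]*?)”', sample)

# The match now begins in the first paragraph and spans the blank line.
print(repr(m.group(0)))
print('\n' in m.group(0))  # True
```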

Perkin 04-10-2012 08:49 PM

(?<=[^p]>)

The (?<= opens a look-behind: it requires the next bit to sit immediately before the match, but doesn't include it in the match. The [^p]> is looking for a '>' that isn't preceded by a 'p', so it isn't matching on the end paragraph tags, and the close ) is closing that group. Then your actual search takes place.

Hope that helps you work it out.

Edit: If you give a larger sample and what it's messing up on and what you want matched, I'll have a look again.
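
To see the look-behind on its own, here is a small Python sketch (Python is an assumption for illustration, and lookbehind_hits is a hypothetical helper name). It lists the positions where (?<=[^p]>) succeeds. Each hit is a zero-width position just after a ">", which is why the ">" never shows up in the matched text:

```python
import re

def lookbehind_hits(text):
    """Return every position in text where (?<=[^p]>) succeeds."""
    return [m.start() for m in re.finditer(r'(?<=[^p]>)', text)]

# The ">" of a closing paragraph tag is preceded by "p": no hit.
print(lookbehind_hits('</p>'))                  # []

# The ">" of the opening tag is preceded by a quote mark: one hit,
# at the position right after the tag.
print(lookbehind_hits('<p class="calibre1">'))  # [20]

# Caveat: a bare <p> tag also ends in "p>", so it is skipped as well.
print(lookbehind_hits('<p>'))                   # []
```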

Timur 04-11-2012 06:56 PM

Adding newlines in the character class will prohibit multiline matches in this case:

Code:

>([^“\r\n]*)”
If you prefer it that way, you can augment the pattern with a look-behind assertion to get rid of the ">" character from the matched string:

Code:

(?<=>)([^“\r\n]*)”
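
A minimal Python sketch of this approach, run against both situations discussed in the thread. Python's re module and the text of the dialogue-free paragraph are assumptions added for illustration; Sigil's engine may behave differently in edge cases. Because the class now excludes \r and \n, a match can no longer cross a line break, whatever the preceding paragraph contains:

```python
import re

# Preceding paragraph with properly quoted dialogue (from the thread).
with_dialogue = (
    '<p class="calibre1">“I have to go pick up Cindy.”</p>\n'
    '\n'
    '<p class="calibre1">Well, don’t be late for school,” she called. [...]\n'
)

# Preceding paragraph without dialogue (hypothetical text).
without_dialogue = (
    '<p class="calibre1">She grabbed her keys.</p>\n'
    '\n'
    '<p class="calibre1">Well, don’t be late for school,” she called. [...]\n'
)

pattern = re.compile(r'(?<=>)([^“\r\n]*)”')

hits_with = [m.group(0) for m in pattern.finditer(with_dialogue)]
hits_without = [m.group(0) for m in pattern.finditer(without_dialogue)]

print(hits_with)     # ['Well, don’t be late for school,”']
print(hits_without)  # ['Well, don’t be late for school,”']
```

In both cases only the line with the missing opening quotation mark is reported, and the correctly quoted line is left alone.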

ElMiko 04-11-2012 07:18 PM

Quote:

Originally Posted by Timur (Post 2038892)
Adding newlines in the character class will prohibit multiline matches in this case:

Code:

>([^“\r\n]*)”
If you prefer it that way, you can augment the pattern with a look-behind assertion to get rid of the ">" character from the matched string:

Code:

(?<=>)([^“\r\n]*)”

Oof... feeling really silly. I didn't know you could exclude multiple characters, much less "\X" expressions...

Thanks to both of you for gently continuing my reg-ex education!

Perkin 04-11-2012 07:34 PM

A pretty good site for learning regex stuff is here

ElMiko 04-12-2012 04:39 PM

Quote:

Originally Posted by Perkin (Post 2038941)
A pretty good site for learning regex stuff is here

Thanks, Perkin. As it happens, I already have that one bookmarked and have referenced it a few times in the past. The problem I have with all the reg-ex tutorials is that I never understand what they're talking about until after I've figured out how to do it. Either I interpret the language they use differently than what it's supposed to mean or I simply think it's gibberish. For what it's worth, the site you linked to has unquestionably produced the best results for me... but it's still hit-or-miss.


All times are GMT -4.