MobileRead Forums


cybmole 02-23-2012 03:37 PM

regex puzzle: finding paragraph before...
 
due to a badly formatted book I was trying to construct a regex which would find any <p......./p> section which occurred immediately before a <div, in order to then tweak that found chunk.

but I could not do it.
a find expression like <p class="whatever">(.*)</p>?\s*<div is too greedy - it grabbed a whole load of paragraphs

i.e. from
<p para 1...
<p para 2..
...
<p para n..
< div....

the above regex grabs all n paragraphs. is there a way to grab only the nth one, and replace its CSS class?

PS I am still using the Sigil 0.4.2 regex

or could I use a .p+div class in CSS ?

WS64 02-23-2012 03:56 PM

<p class="whatever">([^<]*?)</p>\s*<div

cybmole 02-23-2012 04:03 PM

Quote:

Originally Posted by WS64 (Post 1977643)
<p class="whatever">([^<]*?)</p>\s*<div

thanks - if I read that correctly it's blocking any extra instances of < - will it cope with embedded style things like <em or < i inside of the main p tagged paragraphs ?
e.g. some of the paragraphs have extra embedded styles like:
<p class="calibre2">Without missing a beat, <em class="calibre4">High Wire</em> replies; “Without a job, I think I would head for the stars, to see what’s out there.”</p>
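
A quick way to check the concern above is to run WS64's pattern against a plain paragraph and against one with an embedded tag. The sketch below uses Python's `re` module as a stand-in, which is an assumption: Sigil's own regex engine may differ in details. The sample strings are made up for illustration.

```python
import re

# WS64's pattern, unchanged.
pattern = re.compile(r'<p class="whatever">([^<]*?)</p>\s*<div')

# Made-up sample markup: plain paragraphs, then one with an embedded <em>.
plain = '<p class="whatever">first</p>\n<p class="whatever">last</p>\n<div>'
nested = '<p class="whatever">some <em>styled</em> text</p>\n<div>'

plain_match = pattern.search(plain)
nested_match = pattern.search(nested)

# Only the paragraph right before the <div is found in the plain case.
print(plain_match.group(1))
# [^<] cannot step over the "<" of <em>, so the nested case finds nothing.
print(nested_match)
```

So the pattern does pick out only the last paragraph, but it silently skips any paragraph that contains inline markup.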

mmat1 02-23-2012 04:36 PM

Quote:

Originally Posted by cybmole (Post 1977657)
if I read that correctly it's blocking any extra instances of < - will it cope with embedded style things like <em or < i inside of the main p tagged paragraphs ?
e.g. some of the paragraphs have extra embedded styles like:

You're right, any <span>, <i> etc. will not work so well. ...

Actually
Code:

(<p.*?</p>)(\s*?<div>)
should do it, but test it carefully.

I'm not sure whether regex dotall works in 0.4.2; try adding (?s) to the search expression.

>>or could I use a .p+div class in CSS ?
if you really want to change any <div> which follows a </p>, why not?
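
The "test it carefully" advice is worth taking literally. A lazy `.*?` only limits how far a match extends once it has started; the engine still starts at the first `<p` it can, so with dotall on the match can run across several paragraphs. The sketch below shows this with Python's `re` (a stand-in, not Sigil's engine) and made-up sample markup. The `tempered` pattern is a hypothetical alternative, not something posted in this thread: a negative lookahead stops the dot from stepping over a `</p>`.

```python
import re

# Made-up sample: three paragraphs followed by a div.
html = (
    '<p class="a">para 1</p>\n'
    '<p class="a">para 2 with <em>emphasis</em></p>\n'
    '<p class="a">para n</p>\n'
    '<div>'
)

# mmat1's pattern with (?s) so that . also matches newlines.
lazy = re.compile(r'(?s)(<p.*?</p>)(\s*?<div>)')

# Hypothetical variant: (?!</p>). matches any character that does not
# start a closing </p>, so one match can never span two paragraphs.
tempered = re.compile(r'(?s)(<p\b(?:(?!</p>).)*</p>)(\s*<div\b)')

lazy_hit = lazy.search(html).group(1)
tempered_hit = tempered.search(html).group(1)

print(lazy_hit.count('<p '))  # number of paragraphs swallowed by the lazy pattern
print(tempered_hit)           # only the paragraph directly before the div
```

The tempered form also copes with embedded tags such as `<em>`, since it only refuses to cross `</p>`.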

Timur 02-23-2012 04:47 PM

If your paragraphs are contained in single lines with newlines between them you can use your pattern with a slight modification:

Code:

<p class="whatever">([^\r\n]*)</p>\s*<div
Or you can upgrade to 0.5.1, in which .(dot) does not match newlines unless you choose "Regex Dotall" mode, and you can use your original pattern unmodified.
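
Both suggestions rest on the same idea: if the wildcard cannot cross a line break, a match cannot run over several one-line paragraphs. A small sketch in Python's `re`, where `.` also skips newlines by default and `re.DOTALL` plays the role of a "Regex Dotall" switch. This is an illustration under those assumptions, not a test of Sigil itself; the sample markup is made up and the patterns are written with `class=`.

```python
import re

# Made-up sample: each paragraph sits on its own line.
html = (
    '<p class="whatever">para 1</p>\n'
    '<p class="whatever">para 2 with <em>emphasis</em></p>\n'
    '<p class="whatever">para n</p>\n'
    '<div class="box">'
)

# Explicitly forbid line breaks inside the paragraph.
no_newlines = re.compile(r'<p class="whatever">([^\r\n]*)</p>\s*<div')
# Original-style pattern, dot not matching newlines (Python's default).
dot_default = re.compile(r'<p class="whatever">(.*)</p>\s*<div')
# Same pattern with dotall switched on.
dot_all = re.compile(r'<p class="whatever">(.*)</p>\s*<div', re.DOTALL)

no_newlines_text = no_newlines.search(html).group(1)
dot_default_text = dot_default.search(html).group(1)
dot_all_text = dot_all.search(html).group(1)

print(no_newlines_text)      # body of the last paragraph only
print(dot_default_text)      # same result
print('\n' in dot_all_text)  # dotall lets the match span lines
```

This only holds while every paragraph really is on a single line; a paragraph wrapped over two lines would be missed by the first two patterns.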

DiapDealer 02-23-2012 04:58 PM

It's pretty hard to fine-tune an expression's (non)greediness in 0.4.2 when the "Minimal Matching" check-box is the only method of control you have over it.

In 0.5.x and higher, I'd use something like:
Code:

<p(.*?)?>.*?</p>(?=(\s+)?<div)
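
The lookahead `(?=...)` checks that a `<div` follows without making it part of the match, which is handy when the found paragraph is then replaced. Below is a sketch in Python's `re` (a stand-in for Sigil's engine, with dotall off and one paragraph per line). The sample markup, the simplified `swap` pattern, and the class name `before-div` are all hypothetical.

```python
import re

# Made-up sample: two one-line paragraphs, then a div.
html = (
    '<p class="calibre2">para 1</p>\n'
    '<p class="calibre2">para n with <em class="calibre4">emphasis</em></p>\n'
    '<div class="box">next section</div>'
)

# DiapDealer's pattern, unchanged.
find = re.compile(r'<p(.*?)?>.*?</p>(?=(\s+)?<div)')
found = find.search(html).group(0)
print(found)  # the whole paragraph directly before the div

# Simplified variant for replacing: capture the body, keep the lookahead.
swap = re.compile(r'<p class="calibre2">(.*?)</p>(?=\s*<div)')
# "before-div" is a hypothetical class name.
result = swap.sub(r'<p class="before-div">\1</p>', html)
print(result)
```

Because the lookahead consumes nothing, the line break and the `<div` are left untouched by the replacement.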

cybmole 02-24-2012 03:12 AM

thanks all, esp. for explaining how 0.5.2 is better than 0.4.2. I am eventually going to have enough reason to upgrade.

I see that I'm going to have to add a couple of symbols to my limited regex repertoire!

so far I have muddled through without ? or ^

Timur 02-24-2012 04:43 AM

Sigil 0.5.2 search engine has some bugs while searching "all html files". Until 0.5.3 is released I suggest using 0.5.1 instead.

All Sigil 0.5 releases

theducks 02-24-2012 10:06 AM

Quote:

Originally Posted by Timur (Post 1978233)
Sigil 0.5.2 search engine has some bugs while searching "all html files". Until 0.5.3 is released I suggest using 0.5.1 instead.

All Sigil 0.5 releases

Seconded

If you need to add existing files, use File: New and not the right-click menu, which crashes instantly.

