02-23-2012, 02:37 PM | #1 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
regex puzzle: finding paragraph before...
Due to a badly formatted book, I was trying to construct a regex that would find any <p ...>...</p> section occurring immediately before a <div, so that I could then tweak the matched chunk.
But I could not do it. A find expression like <p class="whatever">(.*)</p>?\s*<div is too greedy: it grabbed a whole load of paragraphs, i.e. from
<p para 1...
<p para 2...
...
<p para n...
<div...
The above regex grabs all n paragraphs. Is there a way to grab only the nth one and replace its CSS class?
PS: I am still using the regex in 0.4.2.
Or could I use a .p+div class in CSS? Last edited by cybmole; 02-23-2012 at 02:39 PM. |
02-23-2012, 02:56 PM | #2 |
♫
Posts: 660
Karma: 506380
Join Date: Aug 2010
Location: Germany
Device: Kobo Aura / PB Lux 2 / Bookeen Frontlight / Kobo Mini / Nook Color
|
<p class="whatever">([^<]*?)</p>\s*<div
|
02-23-2012, 03:03 PM | #3 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Thanks. If I read that correctly, it blocks any extra instances of <. Will it cope with embedded inline tags like <em or <i inside the main <p>-tagged paragraphs?
E.g. some of the paragraphs have extra embedded styles like: <p class="calibre2">Without missing a beat, <em class="calibre4">High Wire</em> replies; “Without a job, I think I would head for the stars, to see what’s out there.”</p> |
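The concern above can be checked with a small sketch. It uses Python's `re` module as a stand-in, on the assumption that Sigil's engine treats these constructs the same way; the sample markup and the second pattern are illustrative additions, not something posted in the thread.

```python
import re

# Hypothetical sample: the paragraph right before the <div contains an <em>.
html = (
    '<p class="calibre2">First paragraph.</p>\n'
    '<p class="calibre2">Without missing a beat, '
    '<em class="calibre4">High Wire</em> replies.</p>\n'
    '<div class="x">...</div>'
)

# Pattern from post #2: the paragraph body may not contain any "<",
# so a paragraph with inline tags can never match.
no_tags = re.compile(r'<p class="calibre2">([^<]*?)</p>\s*<div')

# Illustrative alternative: the body may contain anything except "</p>",
# so inline tags are allowed but the match cannot run into the next paragraph.
tempered = re.compile(
    r'<p class="calibre2">((?:(?!</p>).)*)</p>\s*<div', re.DOTALL
)

print(no_tags.search(html))
print(tempered.search(html).group(1))
```

The first search finds nothing because of the embedded `<em>`; the second returns only the body of the last paragraph before the `<div`.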
02-23-2012, 03:36 PM | #4 | |
Berti
Posts: 1,196
Karma: 4985964
Join Date: Jan 2012
Location: Zischebattem
Device: Acer Lumiread
|
Quote:
Actually,
Code:
(<p.*?</p>)(\s*?<div>)
I'm not sure whether regex.dotall will work in 0.42; try adding (?s) to the search statement.
>>or could I use a .p+div class in CSS ?
If you really want to change any <div> that follows a </p>, why not? Last edited by mmat1; 02-23-2012 at 04:22 PM. |
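A quick sketch of how the pattern above behaves with and without dot-all, again using Python's `re` as a stand-in (an assumption; Sigil's engine may differ in details). The sample text is made up. The point it illustrates: a lazy `.*?` still starts at the leftmost `<p`, so once `.` is allowed to cross newlines it can stretch over several paragraphs.

```python
import re

# Hypothetical sample: one paragraph per line, followed by a bare <div>.
html = '<p>one</p>\n<p>two</p>\n<p>three</p>\n<div>'

pattern = r'(<p.*?</p>)(\s*?<div>)'

# Without dot-all, "." stops at each newline, so only the last line can match.
line_bound = re.search(pattern, html)

# With (?s), "." also matches newlines; the lazy quantifier keeps expanding
# from the first <p until a </p> followed by <div> is found.
dot_all = re.search('(?s)' + pattern, html)

print(line_bound.group(1))
print(dot_all.group(1))
```

With one paragraph per line, leaving dot-all off is what keeps the match to the last paragraph; with it on, the match begins at the first `<p`.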
|
02-23-2012, 03:47 PM | #5 |
Connoisseur
Posts: 54
Karma: 37363
Join Date: Aug 2011
Location: Istanbul
Device: EBW1150, Nook STR
|
If your paragraphs are contained in single lines with newlines between them you can use your pattern with a slight modification:
Code:
<p class="whatever">([^\r\n]*)</p>\s*<div
|
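A short sketch of the single-line assumption behind this pattern, run through Python's `re` (assumed here to behave like Sigil's engine for these constructs). Both sample strings are invented for illustration.

```python
import re

pattern = re.compile(r'<p class="whatever">([^\r\n]*)</p>\s*<div')

# Case 1: each paragraph sits on its own line.
one_per_line = (
    '<p class="whatever">a</p>\n'
    '<p class="whatever">b <em>i</em></p>\n'
    '<div>'
)

# Case 2: two paragraphs share a line, which breaks the assumption.
same_line = '<p class="whatever">a</p><p class="whatever">b</p>\n<div>'

first = pattern.search(one_per_line).group(1)
second = pattern.search(same_line).group(1)

print(first)
print(second)
```

In the first case the capture is just the last paragraph's body, inline tags included. In the second case the capture swallows both paragraphs, so the pattern is only safe when the file really has one paragraph per line.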
02-23-2012, 03:58 PM | #6 |
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
It's pretty hard to fine-tune an expression's (non)greediness in 0.4.2 when the "Minimal Matching" check-box is the only method of control you have over it.
In 0.5.x and higher, I'd use something like:
Code:
<p(.*?)?>.*?</p>(?=(\s+)?<div)
Last edited by DiapDealer; 02-23-2012 at 04:03 PM. |
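The lookahead idea can be combined with a replacement to change the class of the matched paragraph, which was the original goal. The following is a minimal sketch in Python's `re`, not a Sigil recipe: the function name `retag`, the class name `before-div`, and the sample markup are all hypothetical, and the body uses a "anything except `</p>`" construct rather than the plain `.*?` above so that it cannot span paragraphs.

```python
import re

# Body: any characters that do not start a "</p>", then the closing tag.
# The lookahead checks for a following <div without consuming it.
BEFORE_DIV = re.compile(
    r'<p class="calibre2">((?:(?!</p>).)*</p>)(?=\s*<div)', re.DOTALL
)


def retag(html: str) -> str:
    """Give the paragraph directly before a <div a different class."""
    # "before-div" is a made-up class name used only for this example.
    return BEFORE_DIV.sub(r'<p class="before-div">\1', html)


sample = (
    '<p class="calibre2">one</p>\n'
    '<p class="calibre2">two <em>x</em></p>\n'
    '<div class="c">d</div>\n'
    '<p class="calibre2">three</p>'
)

result = retag(sample)
print(result)
```

Because the `<div` is only looked at and never consumed, the replacement does not have to put it back.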
02-24-2012, 02:12 AM | #7 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Thanks all, especially for explaining how 0.5.2 is better than 0.4.2. Eventually I am going to have enough reason to upgrade.
I see that I'm going to have to add a couple of symbols to my limited regex repertoire! So far I have muddled through without ? or ^ |
02-24-2012, 03:43 AM | #8 |
Connoisseur
Posts: 54
Karma: 37363
Join Date: Aug 2011
Location: Istanbul
Device: EBW1150, Nook STR
|
The Sigil 0.5.2 search engine has some bugs when searching "all html files". Until 0.5.3 is released, I suggest using 0.5.1 instead.
All Sigil 0.5 releases |
02-24-2012, 09:06 AM | #9 | |
Well trained by Cats
Posts: 29,801
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
If you need to ADD Existing files, YOU need to use the File: New and not the Instant crash, right-click menu |
|
|