12-06-2012, 01:48 PM #1
cybmole
Wizard
Posts: 2,920
Karma: 1170716
Join Date: Sep 2010
Device: Kobo aura HD, Kobo Arc, Kindle Fire HDX 8.9 , Kindle for PC
Question: regex (.*) not liking hidden characters

Trying to fix a book where div has been used rather than p throughout.

So the book layout is thousands of paragraphs like this:

<div class="c3">
some body text beginning on a new line, followed by the closing div tag, also on a new line
</div>

I would expect this to work:
Find:
<div class="c3">(.*)</div>
Replace All:
<p class="c3">\1</p>

But I get no matches.

To get the regex to work, I have to carefully copy and paste in whatever hidden characters separate the div tags from the body text, i.e. whatever is causing the line breaks.

The (.*) then works as expected once it sits between those line-break characters.

So is this:
a) just a very badly formatted source,
b) some side effect of Pretty Print / Tidy settings, or
c) a bug in the regex engine or (more likely!) in my understanding of how it should work?

From limited testing, I think Pretty Print has no issues with one-line layouts like
<div> all on one line example </div>
so it is probably not option b)?
12-06-2012, 02:00 PM #2
kiwidude
calibre/Sigil Developer
Posts: 4,228
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
You need to tick the "DotAll" option, or add (?s) to the start of the expression. It is not a bug, it is just how PCRE works: by default . does not match line breaks, so a match can't span lines unless DotAll is on.
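kiwidude's fix can be tried outside Sigil with a minimal Python sketch. Python's re module is not PCRE, but for this case it behaves the same way: . skips newline characters unless the (?s) flag (re.DOTALL) is set. The sample string is invented for illustration.

```python
import re

# Invented sample laid out like the book: the text sits on its own line,
# so newline characters (the "hidden characters") separate it from the tags.
html = '<div class="c3">\nsome body text on its own line\n</div>'

pattern = r'<div class="c3">(.*)</div>'

# Without DotAll, "." refuses to match "\n", so the search finds nothing.
print(re.search(pattern, html))  # None

# With the inline (?s) flag at the start (same effect as re.DOTALL),
# "." also matches newlines and the whole block is captured.
m = re.search(r'(?s)' + pattern, html)
print(repr(m.group(1)))  # '\nsome body text on its own line\n'

# The original Find/Replace then works as intended.
converted = re.sub(r'(?s)' + pattern, r'<p class="c3">\1</p>', html)
print(converted)
```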
12-06-2012, 03:03 PM #3
WS64
Posts: 574
Karma: 505638
Join Date: Aug 2010
Location: Germany
Device: Kobo Aura / Bookeen Frontlight / Kobo Mini / Kindle 3 / Nook Color
And actually I would just replace <div with <p and let tidy do the rest
12-06-2012, 03:24 PM #4
DiapDealer
Grand Sorcerer
Posts: 9,312
Karma: 42858084
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by WS64
And actually I would just replace <div with <p and let tidy do the rest
Wow! You're a lot braver than I am. I don't let Tidy do anything for me.
Just as easy for me to replace:
Code:
<(/?)div([^>]*?)>
with:
Code:
<\1p\2>
But otherwise, yeah... what kiwidude said. PCRE's . won't match cr/lf characters unless you explicitly tell it to.

Last edited by DiapDealer; 12-06-2012 at 03:42 PM.
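DiapDealer's tag-only pattern can be checked outside Sigil with the Python sketch below (Python's re module standing in for Sigil's PCRE; the sample markup is invented). It also shows why this pattern needs no DotAll: it contains no ., and the character class [^>] matches line breaks anyway.

```python
import re

# Invented sample: two paragraphs marked up as divs, text on separate lines.
html = '<div class="c3">\nfirst paragraph\n</div>\n<div class="c3">\nsecond paragraph\n</div>'

# Group 1 captures the optional "/" of a closing tag; group 2 captures whatever
# follows "div" up to the next ">" (the attributes, if any).
tag_pattern = r'<(/?)div([^>]*?)>'

# Rebuild each tag with "p" in place of "div", keeping the slash and attributes.
result = re.sub(tag_pattern, r'<\1p\2>', html)
print(result)
```

The ? in [^>]*? is harmless but not strictly needed here, since [^>] can never run past the closing >.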
12-06-2012, 03:50 PM #5
Danger
Evangelist
Posts: 486
Karma: 1665031
Join Date: Nov 2010
Location: Vancouver Island, Nanaimo
Device: K2 (retired), Kobo Touch (passed to the wife), KGlo, Galaxy TabPro
This is one I use for most of my search and replace where there is a start and end tag or character (such as quotes):

Find: (?s)<div(.*?)</div>
Replace: <p\1</p>

Things I have learned from those more familiar with Regex and Sigil than myself:
(?s): search over multiple lines (. also matches line breaks)
(.*?): match as little as possible, stopping at the first instance of whatever comes after it. In the above, that means stopping at the very first </div> found. Without the ?, I have had matches that did not stop at the first instance and instead highlighted 2 or 3 paragraphs, sometimes the entire chapter.
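The greedy/lazy difference Danger describes shows up in a small Python sketch (Python's re standing in for Sigil's PCRE; the sample markup is invented): with DotAll on, a greedy (.*) runs to the last </div> in the text, while the lazy (.*?) stops at the first one.

```python
import re

# Invented sample: two div "paragraphs" in a row.
html = '<div class="c3">\nfirst\n</div>\n<div class="c3">\nsecond\n</div>'

# Greedy: (.*) grabs as much as possible, so one match spans both divs.
greedy = re.findall(r'(?s)<div(.*)</div>', html)

# Lazy: (.*?) stops at the first </div> after each opening tag.
lazy = re.findall(r'(?s)<div(.*?)</div>', html)

print(len(greedy), len(lazy))  # 1 2

# Danger's Find/Replace, applied with the lazy version.
converted = re.sub(r'(?s)<div(.*?)</div>', r'<p\1</p>', html)
print(converted)
```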
12-06-2012, 04:11 PM #6
DiapDealer
Grand Sorcerer
I used to use a similar F&R, Danger, but that approach burned me too many times when stuff was nested, as divs can be. Plus, I've learned to be appropriately afraid of relying too heavily on (.*?) to keep a match from growing past what I intended. So now, when the tags themselves are what needs replacing, I don't waste time trying to match/capture any-and-all text those tags might contain. I just match/capture/replace the tags themselves. To each their own though... that's the beauty of regex.
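Here is the nesting problem DiapDealer mentions, as a Python sketch (Python's re standing in for Sigil's PCRE; the nested sample is invented). The lazy (.*?) pairs the outer <div with the inner </div>, while the tag-only pattern rewrites each tag on its own and keeps opening and closing tags paired.

```python
import re

# Invented sample: an outer div wrapping an inner one.
html = '<div class="outer">\n<div class="inner">text</div>\nmore text\n</div>'

# Content-capturing approach: the first </div> reached is the INNER one,
# so the outer <div> becomes <p> but is closed by the inner div's end tag.
content_based = re.sub(r'(?s)<div(.*?)</div>', r'<p\1</p>', html)
print(content_based)
# <p class="outer">
# <div class="inner">text</p>
# more text
# </div>

# Tag-only approach: each tag is rewritten on its own, so pairs stay matched.
tag_based = re.sub(r'<(/?)div([^>]*?)>', r'<\1p\2>', html)
print(tag_based)
```

The tag-only result still nests a p inside a p, which isn't valid XHTML, so nested divs deserve a manual look either way.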
12-06-2012, 05:02 PM #7
Danger
Evangelist
Hmm, never thought of nested tags. Yah I can see where that would burn you if you don't pay attention or do a blind find/replace all. Something I very quickly learned NOT to do unless I am absolutely positive it will be ok.

Thanks for the heads-up. So far I haven't had any nested tags in the books I've recently been fixing up, but I do know I have some books with them that I will be fixing.

Always learning something here.
12-16-2012, 06:44 PM #8
Man Eating Duck
Addict
Posts: 253
Karma: 69784
Join Date: May 2006
Location: Oslo, Norway
Device: Kobo Aura, Sony PRS-650
Quote:
Originally Posted by Danger
Hmm, never thought of nested tags. Yah I can see where that would burn you if you don't pay attention or do a blind find/replace all. Something I very quickly learned NOT to do unless I am absolutely positive it will be ok.
This link is appropriate: The <center> cannot hold it is too late.

Version your files, and always do a visual inspection and validate immediately after a replace, even when you're sure it will be OK. Regexes are too useful not to be applied to HTML, even if you might invoke a few elder horrors.
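As one cheap check to run alongside the visual inspection, here is a rough Python sketch that uses the standard library's html.parser to flag end tags that don't close the most recently opened element. It only checks tag pairing; it is no substitute for a real validator such as epubcheck, and the class and function names are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "param", "source", "track", "wbr"}


class TagPairChecker(HTMLParser):
    """Hypothetical helper: records start/end tags that don't pair up."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open element names
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> need no end tag

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        elif self.open_tags:
            self.problems.append(
                f"</{tag}> found where </{self.open_tags[-1]}> was expected")
        else:
            self.problems.append(f"</{tag}> has no opening tag")


def check_tag_pairs(markup):
    """Return a list of pairing problems (empty if every tag is matched)."""
    checker = TagPairChecker()
    checker.feed(markup)
    checker.close()
    unclosed = [f"<{tag}> never closed" for tag in checker.open_tags]
    return checker.problems + unclosed


# Result of a content-capturing replace on nested divs: tags no longer pair up.
bad = '<p class="outer">\n<div class="inner">text</p>\nmore text\n</div>'
print(check_tag_pairs(bad))
# ['</p> found where </div> was expected', '<p> never closed']
```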