|  12-06-2012, 01:48 PM | #1 | 
| Wizard            Posts: 3,720 Karma: 1759970 Join Date: Sep 2010 Device: none |  regex (.*) not liking hidden characters 
			
			Trying to fix a book where div has been used rather than p throughout, so the book layout is thousands of lines/paragraphs like these: <div class="c3"> some body text beginning on a new line, followed by the closing div tag, also on a new line </div> I would expect this to work: find <div class="c3">(.*)</div> replace all <p class="c3">\1</p> but I get no matches. To get the regex to work, I have to carefully copy & paste in whatever hidden characters are separating the div tags from the body text, i.e. whatever is causing the line breaks. The (.*) regex then works as expected once it is within the line-break characters. So is this a) just a very badly formatted source, b) some side effect of pretty print / tidy settings, or c) a bug in the regex engine or (more likely!) in my understanding of how it should work? From limited testing, I think pretty print has no issues with <div> all on one line example </div> layouts, so it is probably not option b)? | 
|   |   | 
|  12-06-2012, 02:00 PM | #2 | 
| Calibre Plugins Developer            Posts: 4,735 Karma: 2197770 Join Date: Oct 2010 Location: Australia Device: Kindle Oasis | 
			
			You need to tick the "DotAll" option, or add (?s) to the start of the expression. It is not a bug, it is just how PCRE works for multiline expressions: by default the dot does not match a line break.
		 | 
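A minimal sketch of the DotAll behaviour described above, using Python's `re` module as a stand-in for the PCRE engine in the editor (an assumption; the default handling of the dot is the same in both). The sample markup is hypothetical and only mimics the layout from the first post.

```python
import re

# Hypothetical sample: the tags and the body text sit on separate lines.
html = '<div class="c3">\nsome body text\n</div>\n'

# Without DotAll, "." stops at the line break, so the pattern finds nothing.
no_flag = re.search(r'<div class="c3">(.*)</div>', html)

# With (?s), "." also matches line breaks and the original replace works.
fixed = re.sub(r'(?s)<div class="c3">(.*)</div>', r'<p class="c3">\1</p>', html)

print(no_flag)
print(fixed)
```

The greedy `(.*)` is only safe here because the sample holds a single div; with several divs in one file it would run to the last closing tag.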
|   |   | 
|  12-06-2012, 03:03 PM | #3 | 
| ♫            Posts: 661 Karma: 506380 Join Date: Aug 2010 Location: Germany Device: Kobo Aura / PB Lux 2 / Bookeen Frontlight / Kobo Mini / Nook Color | 
			
			And actually I would just replace <div with <p and let tidy do the rest
		 | 
|   |   | 
|  12-06-2012, 03:24 PM | #4 | |
| Grand Sorcerer            Posts: 28,866 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | Quote: 
  Just as easy for me to replace: Find: <(/?)div([^>]*?)> Replace: <\1p\2> Last edited by DiapDealer; 12-06-2012 at 03:42 PM. | |
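The tag-only replacement above can be tried out like this. It is a sketch in Python's `re` module rather than in the editor itself (an assumption), and the sample markup is made up for illustration.

```python
import re

# Hypothetical sample: one div with a class, one without, text on its own lines.
html = '<div class="c3">\nfirst\n</div>\n<div>\nsecond\n</div>\n'

# Group 1 captures the optional "/" of a closing tag, group 2 any attributes.
# Only the tags are matched, so line breaks in the body text never matter.
converted = re.sub(r'<(/?)div([^>]*?)>', r'<\1p\2>', html)

print(converted)
```

Because the pattern never has to span the text between the tags, no DotAll option is needed.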
|   |   | 
|  12-06-2012, 03:50 PM | #5 | 
| Evangelist            Posts: 490 Karma: 1665031 Join Date: Nov 2010 Location: Vancouver Island, Nanaimo Device: K2 (retired), Kobo Touch (passed to the wife), KGlo, Galaxy TabPro | 
			
			This is one I use for most of my search and replace where there is a start and end tag or character (such as quotes): Find: (?s)<div(.*?)</div> Replace: <p\1</p> Things I have learned from those more familiar with regex and Sigil than myself: (?s) lets the dot match line breaks, so the search can run over multiple lines. (.*?) matches as little as possible and stops at the first instance of whatever comes after it. In the above, it looks for </div> and stops the search at the very first one found. Without the ? I have had instances where it does not stop at the first instance found, and I have ended up with 2 or 3 paragraphs and sometimes the entire chapter highlighted. | 
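To see the difference the `?` makes, here is a small sketch in Python's `re` module (an assumption, standing in for the editor's engine) with a made-up two-paragraph sample.

```python
import re

# Hypothetical sample with two sibling divs.
html = '<div class="c3">\none\n</div>\n<div class="c3">\ntwo\n</div>\n'

# Greedy: (.*) runs to the LAST </div>, so both paragraphs become one match.
greedy = re.findall(r'(?s)<div(.*)</div>', html)

# Lazy: (.*?) stops at the FIRST </div>, giving one match per paragraph.
lazy = re.findall(r'(?s)<div(.*?)</div>', html)

converted = re.sub(r'(?s)<div(.*?)</div>', r'<p\1</p>', html)

print(len(greedy), len(lazy))
print(converted)
```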
|   |   | 
|  12-06-2012, 04:11 PM | #6 | 
| Grand Sorcerer            Posts: 28,866 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			I used to use similar F&R, Danger, but that approach burned me too many times when stuff was nested--as divs can be. Plus, I've learned to be appropriately afraid of relying too heavily on the potential greediness of (.*?). So now, when the actual tags are what is needing replaced, I don't waste time trying to match/capture any-and-all text those tags might contain. I just match/capture/replace the tags themselves. To each their own though... that's the beauty of regex.    | 
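A sketch of how nesting trips up the `(?s)<div(.*?)</div>` approach, again in Python's `re` module (an assumption) with hypothetical markup. The lazy match pairs the outer opening tag with the inner closing tag.

```python
import re

# Hypothetical sample: a div nested inside another div.
html = '<div class="outer">\n<div class="c3">\ntext\n</div>\n</div>\n'

converted = re.sub(r'(?s)<div(.*?)</div>', r'<p\1</p>', html)

# The outer <div becomes <p, but it is closed by the INNER </div>,
# leaving an unconverted inner <div> and a stray outer </div>.
print(converted)
```

Matching only the tags themselves avoids this pairing problem, since each tag is rewritten without reference to any other.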
|   |   | 
|  12-06-2012, 05:02 PM | #7 | 
| Evangelist            Posts: 490 Karma: 1665031 Join Date: Nov 2010 Location: Vancouver Island, Nanaimo Device: K2 (retired), Kobo Touch (passed to the wife), KGlo, Galaxy TabPro | 
			
			Hmm, never thought of nested tags. Yah I can see where that would burn you if you don't pay attention or do a blind find/replace all. Something I very quickly learned NOT to do unless I am absolutely positive it will be ok. Thanks for the heads up, so far I haven't had any nested tags in the books I've recently been fixing up but I do know I have some books that do have them that I will be fixing. Always learning something here   | 
|   |   | 
|  12-16-2012, 06:44 PM | #8 | |
| Addict            Posts: 254 Karma: 69786 Join Date: May 2006 Location: Oslo, Norway Device: Kobo Aura, Sony PRS-650 | Quote: 
 Version your files, and always do a visual inspection + validate immediately after a replace even though you're sure it will be OK. Regexes are too useful not to be applied to html, even if you might invoke a few elder horrors   | |
|   |   | 