|  04-01-2014, 03:17 AM | #1 | 
| Zealot  Posts: 104 Karma: 12 Join Date: Apr 2010 Location: Melbourne, Australia Device: Kobo Sage, Kobo Aura H2O, LG V20 | 
				
				Regex help anyone?
			 
			
			I had sort of got used to Sigil’s regex, but I’m finding myself a bit lost with Calibre’s different version. The book I’m editing has a lot of redundant tags like...
Code:
<div class="text">CONTENT</div>
Code:
<p class="normal-262-0-override"><span class="no-style-override-9">CONTENT</span></p>
...that I want to reduce to:
Code:
<p>CONTENT</p> |
|  04-01-2014, 03:26 AM | #2 | 
| Guru            Posts: 657 Karma: 64171 Join Date: Sep 2010 Location: Kent, England, Sol 3, ZZ9 plural Z Alpha Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin) | 
			
			If the div or p classes match your examples, you can use the following.
Search:
Code:
<div class="text">([^<]+)</div>
Code:
<p class="normal-262-0-override"><span class="no-style-override-9">([^<]+)</span></p>
Replace:
Code:
<p>\1</p>
Last edited by Perkin; 04-01-2014 at 03:32 AM. |
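To see which inputs these patterns can and cannot catch, here is a rough check using Python's standard `re` module as a stand-in for calibre's engine (an assumption: the engines differ in places, but `[^<]+` and the `\1` backreference behave the same way here). The sample strings and the `convert` helper are made up for illustration.

```python
import re

# Perkin's two search patterns. [^<]+ means "one or more characters that are not '<'",
# so the captured CONTENT can never contain a tag.
DIV_PATTERN = r'<div class="text">([^<]+)</div>'
SPAN_PATTERN = (r'<p class="normal-262-0-override">'
                r'<span class="no-style-override-9">([^<]+)</span></p>')
REPLACEMENT = r'<p>\1</p>'  # \1 = whatever the parentheses captured


def convert(html: str) -> str:
    """Hypothetical helper: apply both search/replace pairs in turn."""
    html = re.sub(DIV_PATTERN, REPLACEMENT, html)
    return re.sub(SPAN_PATTERN, REPLACEMENT, html)


plain = '<div class="text">Plain words only</div>'
with_italics = '<div class="text">Words with <i>italics</i> inside</div>'
br_only = ('<p class="normal-262-0-override">'
           '<span class="no-style-override-9"><br/></span></p>')

print(convert(plain))                         # <p>Plain words only</p>
print(convert(with_italics) == with_italics)  # True: unchanged, '<i>' blocks [^<]+
print(convert(br_only) == br_only)            # True: unchanged, content is only a tag
```

So the patterns only fire when the content between the tags is plain text; any inner tag, even a lone `<br/>`, makes the match fail.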
|  04-01-2014, 03:37 AM | #3 | 
| Zealot  Posts: 104 Karma: 12 Join Date: Apr 2010 Location: Melbourne, Australia Device: Kobo Sage, Kobo Aura H2O, LG V20 | 
			
			Thanks, Perkin. The para one works fine, but the div one gives “No matches were found”. Here’s an example. This one I can just delete, but the others (uselessly, as the class isn’t even defined) wrap a lot of text.
Code:
<div class="text">
<p class="normal-262-0-override"><span class="no-style-override-9"><br/></span></p>
<p style="page-break-after:always;"></p><div style="page-break-after:always"></div><p class="normal-262-0-override"><span class="no-style-override-9"><br/></span></p>
</div> |
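The div example runs into two separate problems, sketched here with Python's standard `re` module as a stand-in for calibre's engine (an assumption; details may differ). The sample is the example above, condensed. First, `[^<]+` cannot step over the inner `<p>` tags, so the div pattern finds nothing. Second, loosening it to a lazy `(.*?)` (with `re.DOTALL` so `.` can cross line breaks) does match, but it stops at the first `</div>`, which belongs to the nested page-break div rather than the outer one.

```python
import re

# Condensed version of the example: spans several lines and contains a nested <div>.
html = '''<div class="text">
<p class="normal-262-0-override"><span class="no-style-override-9"><br/></span></p>
<p style="page-break-after:always;"></p><div style="page-break-after:always"></div>
</div>'''

# 1. [^<]+ cannot get past the '<' of the inner tags, so there is no match at all.
strict = re.search(r'<div class="text">([^<]+)</div>', html)
print(strict)  # None

# 2. A lazy .*? with DOTALL matches, but ends at the FIRST </div>,
#    i.e. the nested page-break div, leaving the outer </div> behind.
lazy = re.search(r'<div class="text">(.*?)</div>', html, flags=re.DOTALL)
print(repr(html[lazy.end():]))  # '\n</div>'
```

Regex on its own cannot reliably pair up nested tags, which is why working from the inside out tends to be safer for wrappers like this one.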
|  04-01-2014, 03:44 AM | #4 | 
| Zealot  Posts: 104 Karma: 12 Join Date: Apr 2010 Location: Melbourne, Australia Device: Kobo Sage, Kobo Aura H2O, LG V20 | 
			
			Hmm... it seems there are some paras it won’t find either. I even copied the match text from the ones I can see in the doc, but it still won’t find them. |
|  04-01-2014, 04:43 AM | #5 | 
| Hedge Wizard            Posts: 802 Karma: 19999999 Join Date: May 2011 Location: UK/Philippines Device: Kobo Touch, Nook Simple | 
			
			The Sigil forum has a sticky thread of regex examples. As the Editor's version of regex is different, it may be helpful to have a similar sticky in the Editor forum.
		 | 
|  04-01-2014, 06:21 AM | #6 | 
| Color me gone            Posts: 2,089 Karma: 1445295 Join Date: Apr 2008 Location: Central Oregon Coast Device: PRS-300 | 
			
			One thing to watch out for is that the case-sensitive checkbox overrides what your pattern says: if it is NOT checked, [a-z] will find not only a-z but A-Z as well. Are you searching inside a table? It seems to have trouble doing that. Also, are you typing things into the search, or copying and pasting? Copying and pasting, then altering, will catch hidden characters, like non-breaking spaces, that might be present. |
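Both pitfalls can be reproduced with Python's standard `re` module. As an analogy (not a claim about calibre's internals), leaving "case sensitive" unticked behaves like adding `re.IGNORECASE`. The sample text is made up.

```python
import re

# Case: without IGNORECASE, [a-z] skips capitals; with it, capitals match too.
lower_only = re.findall(r'[a-z]+', 'Hello World')
both_cases = re.findall(r'[a-z]+', 'Hello World', flags=re.IGNORECASE)
print(lower_only)  # ['ello', 'orld']
print(both_cases)  # ['Hello', 'World']

# Hidden characters: U+00A0 (non-breaking space) looks like a space but is not one.
text = 'Chapter\u00a0One'
exact = re.search('Chapter One', text)        # typed with an ordinary space
flexible = re.search(r'Chapter\s+One', text)  # \s also covers U+00A0 in str patterns
print(exact)                 # None
print(flexible is not None)  # True
```

Using `\s+` instead of a literal space is one way to stay robust against whitespace you can't see.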
|  04-01-2014, 09:06 AM | #7 | 
| Zealot  Posts: 104 Karma: 12 Join Date: Apr 2010 Location: Melbourne, Australia Device: Kobo Sage, Kobo Aura H2O, LG V20 | 
			
			Case sensitive was not checked. I'm not searching in a table. I copied and pasted to test exactly that possibility (hidden characters), but it still could not find the copied text.
		 | 
|  04-01-2014, 09:22 AM | #8 | 
| Zealot  Posts: 104 Karma: 12 Join Date: Apr 2010 Location: Melbourne, Australia Device: Kobo Sage, Kobo Aura H2O, LG V20 | 
			
			Ah... for the paras, the cases where it works are ones where there are no other tags inside the paragraph. Is that a clue?
		 | 
|  04-01-2014, 10:02 AM | #9 | 
| Color me gone            Posts: 2,089 Karma: 1445295 Join Date: Apr 2008 Location: Central Oregon Coast Device: PRS-300 | 
			
			How about one step at a time? Do the inner ones first, and the outer ones after the first are gone. Also, some symbols, like ., have a special meaning in regex. If you suspect one is in your search text, put \ in front of it, like \. |
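A quick illustration of the escaping point, using Python's standard `re` module (the file names are hypothetical): an unescaped `.` matches any single character, while `\.` matches only a literal dot.

```python
import re

names = ['page.html', 'pagexhtml']  # hypothetical strings

# Unescaped: '.' means "any single character", so both names match.
loose = [n for n in names if re.fullmatch(r'page.html', n)]
# Escaped: '\.' means a literal dot, so only the real file name matches.
strict = [n for n in names if re.fullmatch(r'page\.html', n)]

print(loose)   # ['page.html', 'pagexhtml']
print(strict)  # ['page.html']
```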
|  04-01-2014, 10:06 AM | #10 | 
| Zealot  Posts: 104 Karma: 12 Join Date: Apr 2010 Location: Melbourne, Australia Device: Kobo Sage, Kobo Aura H2O, LG V20 | 
			
			Some paragraphs will always have tags in them, though. I can try, though. Either way I’ll get there in the end; it would just be quicker (and require less hunting down of stray </span>s) with regex.
		 | 
|  04-01-2014, 11:00 AM | #11 | 
| Color me gone            Posts: 2,089 Karma: 1445295 Join Date: Apr 2008 Location: Central Oregon Coast Device: PRS-300 | 
			
			The thing is, if you have trouble figuring out how to separate the two, how is a program or a regex search supposed to figure it out? The perfect pattern that gets everything you want always seems to sweep up others too. My solution is to aim for these sweep-everything patterns but step through the matches one at a time, ready to hit Replace and Find, but also ready to go up into the text to fix anything the regex would overfix. It is not as fast, but it is much faster than reading and fixing line by line, or than repairing the damage of an overactive replace. |
|  04-01-2014, 11:12 AM | #12 | 
| Zealot  Posts: 104 Karma: 12 Join Date: Apr 2010 Location: Melbourne, Australia Device: Kobo Sage, Kobo Aura H2O, LG V20 | 
			
			Well, that is more or less what I’ve been doing. Most of the formatting in this book is utterly useless anyway. I don’t have any trouble identifying the required sequences of characters; it’s just that the suggested regex doesn’t seem to work consistently. Exact sequences of characters copied from the document are not found with regex, though they are found with a normal search. This makes me wonder whether something in the syntax is preventing it from working, but since regex is a bit of a black art to me and I don’t even know what Calibre’s flavour of regex is called, I don’t really know what to look for. It doesn’t seem too unreasonable to want to search for [exact sequence A]SOME TEXT[first occurrence of exact sequence B]. |
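The "[exact sequence A]SOME TEXT[first occurrence of exact sequence B]" search can be sketched in Python's standard `re`: escape both literal sequences so none of their characters are read as regex syntax, and join them with a lazy `(.*?)`, which stops at the first `B`. The `between` helper is hypothetical, and calibre's own engine is assumed (not verified) to treat `.*?` the same way.

```python
import re


def between(start: str, end: str) -> "re.Pattern[str]":
    """Hypothetical helper: literal `start`, then as little text as possible,
    then the FIRST following occurrence of literal `end`."""
    # re.escape() neutralises any metacharacters (., ?, +, ( ...) in the copied text.
    return re.compile(re.escape(start) + r'(.*?)' + re.escape(end))


A = '<p class="normal-262-0-override"><span class="no-style-override-9">'
B = '</span></p>'
line = A + 'First' + B + A + 'Second' + B  # two paragraphs on one line

pattern = between(A, B)
print(pattern.findall(line))            # ['First', 'Second']
print(pattern.sub(r'<p>\1</p>', line))  # <p>First</p><p>Second</p>
```

In the editor itself there is no `re.escape()`, so the equivalent is putting `\` in front of any metacharacter in the pasted text by hand.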
|  04-01-2014, 11:42 AM | #13 | 
| Color me gone            Posts: 2,089 Karma: 1445295 Join Date: Apr 2008 Location: Central Oregon Coast Device: PRS-300 | 
			
			It seems somewhat inconsistent to me too, but my abilities are much like yours. There are various cheat sheets that cover it, but it is not easy to understand, so I don't know what I don't know. If you run the search again, it sometimes seems to catch things it did not scoop up the first time. It seems like they are the same, too. |
|  04-01-2014, 12:17 PM | #14 | 
| Grand Sorcerer            Posts: 28,854 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			I've found there's really not that much difference between calibre-edit's regex engine and Sigil's, and most of the differences are very "edge case" stuff: searching for Unicode code points and the like. Mostly, calibre-edit's regex engine just adds a few things that weren't in Sigil's, like variable-length lookbehinds and matching the start/end of words with \m and \M. If case matters... then case matters, and the appropriate box needs to be checked. |
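The extras mentioned above are easiest to see from an engine that lacks them. Python's standard `re` module accepts only fixed-width lookbehinds and has no `\m`/`\M`, so `\b` stands in for both. As far as I know, calibre's editor builds on the third-party Python `regex` module, which is where those extras come from; treat that attribution as an assumption. The sample strings are made up.

```python
import re

# Fixed-width lookbehind: '<b>' is always 3 characters, so stdlib re accepts it.
fixed = re.findall(r'(?<=<b>)\w+', '<b>bold</b> and <b>more</b>')
print(fixed)  # ['bold', 'more']

# Variable-width lookbehind: [^"]* can be any length, so stdlib re refuses it.
try:
    re.compile(r'(?<=<p class="[^"]*">)\w+')
    variable_ok = True
except re.error:
    variable_ok = False
print(variable_ok)  # False

# No separate start/end-of-word escapes here: \b marks a boundary on either side.
words = re.findall(r'\bcat\b', 'cat concat cat.')
print(words)  # ['cat', 'cat']
```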
|  04-01-2014, 11:21 PM | #15 | 
| Zealot  Posts: 104 Karma: 12 Join Date: Apr 2010 Location: Melbourne, Australia Device: Kobo Sage, Kobo Aura H2O, LG V20 | 
			
			Figured out that what I wanted was this...
Code:
<p class="normal-262-0-override"><span class="no-style-override-9">(.*)</span></p>
<p>\1</p> |
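One caveat with the final pattern: `(.*)` is greedy, so it grabs as much as it can and then backs up to the last `</span></p>` it can reach. In Python's standard `re` (used here as a stand-in for calibre's engine), `.` stops at line ends, so this only bites when two such paragraphs share a line, or when `.` is allowed to match newlines. The lazy `(.*?)` variant stops at the first `</span></p>` instead. The sample line is made up.

```python
import re

# The opening tags contain no regex metacharacters, so they can be used as-is.
OPEN = '<p class="normal-262-0-override"><span class="no-style-override-9">'
greedy = OPEN + r'(.*)</span></p>'  # the pattern from the post
lazy = OPEN + r'(.*?)</span></p>'   # same, but stops at the first </span></p>

line = OPEN + 'One</span></p>' + OPEN + 'Two</span></p>'  # two paragraphs, one line

greedy_result = re.sub(greedy, r'<p>\1</p>', line)
lazy_result = re.sub(lazy, r'<p>\1</p>', line)
print(greedy_result)  # one match spanning both paragraphs; stray tags left inside
print(lazy_result)    # <p>One</p><p>Two</p>
```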
|  Similar Threads | ||||
| Thread | Thread Starter | Forum | Replies | Last Post | 
| Need help with a regex | mobiuser | Workshop | 15 | 01-19-2014 05:57 PM | 
| Help with some regex | Chaos_Therum | Library Management | 1 | 12-28-2013 11:20 AM | 
| Regex | Gunnerp245 | Conversion | 5 | 03-05-2012 04:15 PM | 
| Help me with regex please. | eVrajka | Library Management | 5 | 08-15-2011 12:17 PM | 
| Regex | Faster | Sigil | 2 | 04-24-2011 09:08 PM |