#1
Zealot
Regex help anyone?
I’d sort of got used to Sigil’s regex, but I’m finding myself a bit lost with Calibre’s different version.
The book I’m editing has a lot of redundant tags like...
Code:
<div class="text">CONTENT</div>
Code:
<p class="normal-262-0-override"><span class="no-style-override-9">CONTENT</span></p>
...which I’d like to reduce to plain
Code:
<p>CONTENT</p>
#2
Guru
IF the div or p classes match your examples, you can use the following.
Search:
Code:
<div class="text">([^<]+)</div>
Code:
<p class="normal-262-0-override"><span class="no-style-override-9">([^<]+)</span></p>
Replace (for either search):
Code:
<p>\1</p>
Last edited by Perkin; 04-01-2014 at 03:32 AM.
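A quick way to sanity-check these search/replace pairs outside the editor is to run them through Python's standard `re` module. Treat this as a rough sketch: the editor has its own regex engine and options, so `re` is only an approximation, and the sample text is made up.

```python
import re

# The two search patterns above; each captures the inner text as group 1
DIV_PATTERN = r'<div class="text">([^<]+)</div>'
SPAN_PATTERN = (r'<p class="normal-262-0-override">'
                r'<span class="no-style-override-9">([^<]+)</span></p>')
REPLACEMENT = r'<p>\1</p>'  # \1 puts the captured text back

def simplify(html: str) -> str:
    """Apply both search/replace pairs, one after the other."""
    html = re.sub(DIV_PATTERN, REPLACEMENT, html)
    return re.sub(SPAN_PATTERN, REPLACEMENT, html)

sample = ('<div class="text">First line</div>\n'
          '<p class="normal-262-0-override">'
          '<span class="no-style-override-9">Second line</span></p>')
print(simplify(sample))
# <p>First line</p>
# <p>Second line</p>
```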
#3
Zealot
Thanks, Perkin. The para one works fine, but I get “No matches were found” for the div.
Here’s an example. This one I can just delete, but the others wrap a lot of text (uselessly, as the class isn’t even defined).
Code:
<div class="text"> <p class="normal-262-0-override"><span class="no-style-override-9"><br/></span></p> <p style="page-break-after:always;"></p><div style="page-break-after:always"></div><p class="normal-262-0-override"><span class="no-style-override-9"><br/></span></p> </div>
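Why the div search finds nothing can be reproduced with Python's standard `re` module, used here only as a stand-in for the editor's engine; the sample is a trimmed, made-up version of the div above. `[^<]+` refuses to cross any tag, and even a looser lazy pattern runs into the nested div.

```python
import re

# Trimmed version of the div above: it spans lines and contains
# another div, so its content is full of "<" characters
html = ('<div class="text">\n'
        '<p>one</p>\n'
        '<div style="page-break-after:always"></div>\n'
        '<p>two</p>\n'
        '</div>')

# [^<]+ cannot cross any tag, so the suggested pattern finds nothing
print(re.search(r'<div class="text">([^<]+)</div>', html))  # None

# A lazy .*? with DOTALL crosses tags and line breaks, but it stops at
# the FIRST </div>, which belongs to the inner page-break div
m = re.search(r'<div class="text">(.*?)</div>', html, re.DOTALL)
print(repr(m.group(1)))
```

Regex has no notion of which `</div>` closes which `<div>`, so for wrappers that contain other divs, checking each match by hand, or removing the inner elements first, is usually safer than a single replace-all.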
#4
Zealot
Hmm... it seems there are some paras it won’t find either.
I even copied the matching text from ones I can see in the document, but it still won’t find them.
#5
Hedge Wizard
The Sigil forum has a sticky thread for regex examples. As the Editor's version of regex is different, it may be helpful to have a similar sticky in the Editor forum.
#6
Color me gone
One thing to watch out for is that the case sensitive checkbox overrides the search pattern: if it is NOT checked, then [a-z] will match not only a-z but A-Z as well.
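The effect can be sketched with Python's standard `re` module, where the `re.IGNORECASE` flag plays the part of an unchecked case sensitive box. This is an analogy only, not how the editor is implemented, and the sample text is invented.

```python
import re

text = "Chapter one"

# Case-sensitive (the default in Python's re): [a-z] skips the capital C
print(re.findall(r"[a-z]+", text))                 # ['hapter', 'one']

# Case-insensitive, like leaving the box unchecked: [a-z] now covers A-Z too
print(re.findall(r"[a-z]+", text, re.IGNORECASE))  # ['Chapter', 'one']
```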
Are you searching inside a table? It seems to have trouble doing that. Also, are you typing things into the search box, or copying and pasting? Copying and pasting, then altering, will catch hidden characters, such as non-breaking spaces, that might be present.
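Hidden characters are easy to demonstrate with Python's standard `re` module standing in for the editor (the sample string is invented). Here a non-breaking space, U+00A0, sits where an ordinary space appears to be:

```python
import re

# "Chapter" and "One" are separated by a non-breaking space (U+00A0),
# which looks exactly like a normal space on screen
html = "<p>Chapter\u00a0One</p>"

# Typing an ordinary space into the pattern misses it
print(re.search(r"Chapter One", html))                 # None

# \s (any whitespace, which includes U+00A0 for str patterns) finds it
print(re.search(r"Chapter\sOne", html) is not None)    # True

# Listing both characters explicitly also works
print(re.search("Chapter[ \u00a0]One", html) is not None)  # True
```

If the file stores the character as an entity such as `&nbsp;` or `&#160;` rather than the raw character, the search has to match that text instead.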
#7
Zealot
Case sensitive was not checked, and I’m not searching in a table. I copied and pasted to test exactly that possibility (hidden characters), but it still couldn’t find the copied text.
#8
Zealot
Ah... for the paras, the cases where it works are the ones with no other tags inside the paragraph. Is that a clue?
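The clue points at `[^<]+`, which means “one or more characters that are not `<`”, so any tag inside the paragraph makes the whole match fail. A sketch in Python's standard `re` module (standing in for the editor; the sample paragraph is made up), with a lazy `(.*?)` as one possible looser alternative:

```python
import re

para = ('<p class="normal-262-0-override">'
        '<span class="no-style-override-9">A <i>very</i> short line</span></p>')

# [^<]+ stops at the <i> tag inside the span, so the search fails
strict = (r'<p class="normal-262-0-override">'
          r'<span class="no-style-override-9">([^<]+)</span></p>')
print(re.search(strict, para))  # None

# A lazy .*? accepts anything, tags included, up to the first </span></p>
relaxed = strict.replace('([^<]+)', '(.*?)')
print(re.sub(relaxed, r'<p>\1</p>', para))
# <p>A <i>very</i> short line</p>
```

The inner `<i>` survives because it is part of the captured group.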
#9
Color me gone
How about one step at a time? Do the inner tags first, then the outer ones once the inner ones are gone.
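The inner-first approach might look like this in Python's standard `re` module, used as a stand-in for the editor's search panel; the class names come from the thread and the sample paragraph is invented. Each pass corresponds to one search/replace:

```python
import re

html = ('<p class="normal-262-0-override">'
        '<span class="no-style-override-9">Some <b>bold</b> text</span></p>')

# Pass 1 (inner): unwrap the span, keeping whatever it contains
html = re.sub(r'<span class="no-style-override-9">(.*?)</span>', r'\1', html)

# Pass 2 (outer): with the span gone, swap the classed <p> for a bare one
html = re.sub(r'<p class="normal-262-0-override">(.*?)</p>', r'<p>\1</p>', html)

print(html)  # <p>Some <b>bold</b> text</p>
```

This assumes the span being removed has no span nested inside it; if it does, the lazy `(.*?)` in the first pass stops at the inner `</span>`.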
Also, some symbols, like ., have special meanings in regex. If a character you are searching for might be one of them, put \ in front of it, like \.
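Here is what escaping changes, sketched with Python's standard `re` module (sample text invented). Other characters that usually need the same treatment include `( ) [ ] { } ? * + | ^ $` and `\` itself.

```python
import re

text = "Version 365 and version 3.5"

# Unescaped, . means "any character", so 3.5 also matches 365
print(re.findall(r"3.5", text))    # ['365', '3.5']

# Escaped, \. only matches a literal full stop
print(re.findall(r"3\.5", text))   # ['3.5']

# re.escape adds the backslashes when text is copied in verbatim
print(re.escape("3.5"))            # 3\.5
```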
#10
Zealot
Some paragraphs will always have tags in them, though. I can try anyway. Either way I’ll get there in the end; it would just be quicker with regex (and involve less hunting down of stray </span>s).
#11
Color me gone
The thing is, if you have trouble figuring out how to separate the two, how is a program or a regex search supposed to figure it out? The "perfect" pattern that gets everything you want always seems to sweep up other things too.
My solution is to write these sweeping patterns anyway and step through the matches one at a time, ready to hit replace and find, but also ready to go up into the text and fix by hand anything the regex would over-fix. It is not fast, but it is much faster than reading and fixing line by line, or than repairing the damage of an overactive replace.
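A scripted cousin of that "broad pattern, check each hit" workflow is to pass Python's `re.sub` a function instead of a replacement string, so each match can be inspected before it is changed. The rule below (skip any match whose content contains another `<span`) and the helper name `unwrap_if_safe` are hypothetical, chosen only for illustration:

```python
import re

PATTERN = re.compile(r'<span class="no-style-override-9">(.*?)</span>')

def unwrap_if_safe(m: re.Match) -> str:
    """Unwrap the span unless its content holds another <span>, in which
    case the lazy match probably ended at the wrong </span>."""
    inner = m.group(1)
    if '<span' in inner:
        return m.group(0)  # leave the match untouched for manual review
    return inner

html = ('<span class="no-style-override-9">plain</span> '
        '<span class="no-style-override-9">outer <span class="x">inner</span> tail</span>')
print(PATTERN.sub(unwrap_if_safe, html))
```

The first span is unwrapped; the nested one is left exactly as it was.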
#12
Zealot
Well, that is what I’ve been doing more or less. Most of the formatting in this book is utterly useless anyway.
I don’t actually have any trouble identifying the required sequences of characters; it’s just that the suggested regex doesn’t seem to work consistently. Exact sequences of characters copied from the document are not found in regex mode, though they are found with a normal search. That makes me wonder whether something in the syntax is stopping it from working, but since regex is a bit of a black art to me, and I don’t even know what Calibre’s flavour of regex is called, I don’t really know what to look for. It doesn’t seem too unreasonable to want to search for [exact sequence A]SOME TEXT[first occurrence of exact sequence B].
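"[exact sequence A]SOME TEXT[first occurrence of exact sequence B]" maps onto a regex quite directly: escape A and B so they are treated as literal text, and join them with a lazy `(.*?)`, which stops at the first B rather than the last. A sketch using Python's standard `re` module; the `between` helper is a hypothetical name, and `re.DOTALL` is added on the assumption that SOME TEXT may span lines.

```python
import re

def between(start: str, end: str) -> re.Pattern:
    """Literal `start`, then as little as possible, then the first
    following occurrence of literal `end` (hypothetical helper)."""
    # re.escape makes both sequences exact text rather than regex syntax;
    # the lazy (.*?) stops at the first `end` it reaches
    return re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)

pattern = between('<span class="no-style-override-9">', '</span>')
html = ('<span class="no-style-override-9">one</span> and '
        '<span class="no-style-override-9">two</span>')
print(pattern.findall(html))  # ['one', 'two']
```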
#13
Color me gone
It seems somewhat inconsistent to me too, but my abilities are about the same as yours. There are various cheat sheets that cover it, but it isn't easy to understand, so I don't know what I don't know.
If you run the search again, it sometimes matches text it didn't pick up the first time, even though the two look the same to me.
#14
Grand Sorcerer
I've found there's really not much difference between calibre-edit's regex engine and Sigil's, and most of the differences are very "edge case" stuff: searching for Unicode code points and the like. Mostly, calibre-edit's regex engine just adds a few things that weren't in Sigil's, such as allowing variable-length lookbehinds and matching the start and end of words with \m and \M.
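To make "variable-length lookbehind" concrete: a lookbehind checks what comes just before the current position without consuming it. Python's standard `re` module, used here purely as a contrast, requires every lookbehind to have a fixed width, so a pattern whose alternatives differ in length is rejected. The `compiles` helper is a hypothetical name for this sketch.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's standard re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Lookbehind alternatives of different lengths: re rejects this
print(compiles(r'(?<=<p>|<p class="x">)Chapter'))  # False

# A fixed-width lookbehind is fine
print(compiles(r'(?<=<p>)Chapter'))                 # True
print(re.findall(r'(?<=<p>)\w+', '<p>Chapter</p>'))  # ['Chapter']
```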
If case matters ... then case matters. And if so, the appropriate box needs to be checked.
#15
Zealot
Figured out what I wanted was this...
Search:
Code:
<p class="normal-262-0-override"><span class="no-style-override-9">(.*)</span></p>
Replace:
Code:
<p>\1</p>
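The `(.*)` version works, with one caveat worth knowing: `.*` is greedy. Assuming, as in Python's standard `re` module by default, that `.` does not match line breaks, the greedy form only misbehaves when two of these paragraphs share one line of the file, but then it merges them. This sketch uses Python's `re` as a stand-in for the editor and a made-up sample line; the lazy `(.*?)` is a suggested alternative, not part of the original post.

```python
import re

# Two classed paragraphs that ended up on the same line of the file
line = ('<p class="normal-262-0-override"><span class="no-style-override-9">One</span></p>'
        '<p class="normal-262-0-override"><span class="no-style-override-9">Two</span></p>')

greedy = (r'<p class="normal-262-0-override">'
          r'<span class="no-style-override-9">(.*)</span></p>')
lazy = greedy.replace('(.*)', '(.*?)')

# Greedy (.*) runs to the LAST </span></p>, swallowing the second paragraph's tags
print(re.sub(greedy, r'<p>\1</p>', line))

# Lazy (.*?) stops at the FIRST </span></p>, so each paragraph is handled on its own
print(re.sub(lazy, r'<p>\1</p>', line))
# <p>One</p><p>Two</p>
```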