Old 04-01-2014, 03:17 AM   #1
seanos
Posts: 104
Karma: 12
Join Date: Apr 2010
Location: Melbourne, Australia
Device: Kobo Sage, Kobo Aura H2O, LG V20
Regex help anyone?

I had sort of got used to Sigil’s regex but I’m finding myself a bit lost with Calibre’s different version.

The book I’m editing has a lot of redundant tags like...

Code:
<div class="text">CONTENT</div>
...where I’d just like to remove the div or ...

Code:
<p class="normal-262-0-override"><span class="no-style-override-9">CONTENT</span></p>
...which I’d like to be...

Code:
<p>CONTENT</p>
Can any kind soul help out?
Old 04-01-2014, 03:26 AM   #2
Perkin
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
If the div or p classes match your examples exactly, you can use the following.

Search
Code:
<div class="text">([^<]+)</div>
or
Code:
<p class="normal-262-0-override"><span class="no-style-override-9">([^<]+)</span></p>
Replace
Code:
<p>\1</p>
Edit: This will only work if there are no <tags> in the CONTENT portion. If there are, we can do alternate searches by changing the ([^<]+) to (.+?)
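
To see the difference between the two capture groups, here is a minimal sketch in Python. It uses the standard `re` module as a stand-in for the Editor's engine (an assumption; the Editor's engine may differ in edge cases), and the sample strings are made up for illustration.

```python
import re

# Sample markup, made up for illustration.
plain = '<div class="text">Hello world</div>'
nested = '<div class="text">Hello <i>big</i> world</div>'

strict = re.compile(r'<div class="text">([^<]+)</div>')  # no "<" allowed inside
lazy = re.compile(r'<div class="text">(.+?)</div>')      # anything, as little as possible

plain_strict = strict.sub(r'<p>\1</p>', plain)
nested_strict = strict.sub(r'<p>\1</p>', nested)  # no match, so returned unchanged
nested_lazy = lazy.sub(r'<p>\1</p>', nested)

print(plain_strict)
print(nested_strict)
print(nested_lazy)
```

The `[^<]+` version refuses to cross any tag, which is why it finds nothing when the content holds an `<i>` or a `<br/>`. The `(.+?)` version crosses tags but, by default, not line breaks.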

Last edited by Perkin; 04-01-2014 at 03:32 AM.
Old 04-01-2014, 03:37 AM   #3
seanos
Thanks Perkin, the para one works fine but “No matches were found” for the div.

Here’s an example. This one I can just delete, but the others (uselessly as the class isn’t even defined) wrap a lot of text.

Code:
			<div class="text">
				<p class="normal-262-0-override"><span class="no-style-override-9"><br/></span></p>
				<p style="page-break-after:always;"></p><div style="page-break-after:always"></div><p class="normal-262-0-override"><span class="no-style-override-9"><br/></span></p>
			</div>
Old 04-01-2014, 03:44 AM   #4
seanos
Hmm...seems like there are some of the paras it won’t find either.

I even copied the match text from the ones I can see in the doc, but it still won’t find them.
Old 04-01-2014, 04:43 AM   #5
Thasaidon
Posts: 802
Karma: 19999999
Join Date: May 2011
Location: UK/Philippines
Device: Kobo Touch, Nook Simple
The Sigil forum has a sticky thread for regex examples. As the Editor's version of Regex is different, it may be helpful to have a similar sticky in the Editor forum.
Old 04-01-2014, 06:21 AM   #6
mrmikel
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
One thing to watch out for is that the case-sensitive check box overrides the search pattern. If it is NOT checked, then [a-z] will find not only a-z but A-Z as well.
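
A small sketch of that effect, using Python's standard `re` module as a stand-in for the Editor (an assumption), with the `re.IGNORECASE` flag playing the role of the unchecked box:

```python
import re

text = "abc ABC"

# Case-sensitive: [a-z] only matches lowercase letters.
sensitive = re.findall(r'[a-z]+', text)

# Case-insensitive (like leaving the box unchecked): [a-z] also matches A-Z.
insensitive = re.findall(r'[a-z]+', text, flags=re.IGNORECASE)

print(sensitive)
print(insensitive)
```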

Are you searching inside a table? It seems to have trouble doing that.

Also, are you typing things into the search box, or are you copying and pasting? Copying and pasting, then altering, will catch hidden characters, like non-breaking spaces, which might be present.
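
As a rough illustration of the hidden-character problem, the sketch below (Python's standard `re` module, with made-up sample strings) shows a typed search missing text that contains a non-breaking space, and one way to make the pattern tolerate it:

```python
import re

typed = "no style"          # ordinary space, as typed on the keyboard
in_book = "no\u00a0style"   # looks the same on screen, but holds a non-breaking space

# The typed text, searched literally, does not match the book text.
literal_hit = re.search(re.escape(typed), in_book)

# Allowing either kind of space makes the search succeed.
flexible_hit = re.search(r'no[ \xa0]style', in_book)

# ascii() reveals the hidden character as an escape sequence.
print(ascii(in_book))
```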
Old 04-01-2014, 09:06 AM   #7
seanos
Case sensitive was not checked. Not searching in a table. I copied and pasted to test just that possibility (hidden chars) but could not find the copied text.
Old 04-01-2014, 09:22 AM   #8
seanos
Ah...for the paras, the cases where it works are ones where there are no other tags in the paragraph. Is that a clue?
Old 04-01-2014, 10:02 AM   #9
mrmikel
How about one step at a time? Do the inner ones first and the outer ones after the first are gone.
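
The one-step-at-a-time idea might look like the sketch below. It uses Python's standard `re` module as a stand-in for the Editor (an assumption), and the sample paragraph is made up from the class names earlier in the thread.

```python
import re

html = ('<p class="normal-262-0-override">'
        '<span class="no-style-override-9">Some <i>text</i></span></p>')

# Pass 1: unwrap the inner span, keeping whatever it contains.
step1 = re.sub(r'<span class="no-style-override-9">(.*?)</span>', r'\1', html)

# Pass 2: with the span gone, simplify the outer paragraph tag.
step2 = re.sub(r'<p class="normal-262-0-override">', '<p>', step1)

print(step1)
print(step2)
```

Note that pass 1 stops at the first `</span>` it meets, so a span nested inside the one being removed would leave a stray closing tag behind.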

Also, some symbols like . have a special meaning in regex. If you want to match one of them literally, put \ in front of it, like \.
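
A quick sketch of why the backslash matters, again using Python's standard `re` module as a stand-in (an assumption) and a made-up sample string:

```python
import re

text = "v1.0 and v1x0"

unescaped = re.findall(r'v1.0', text)   # "." matches any character
escaped = re.findall(r'v1\.0', text)    # "\." matches only a literal dot

# re.escape() adds the backslashes for you when the search text is meant literally.
auto = re.escape("v1.0")
auto_hits = re.findall(auto, text)

print(unescaped)
print(escaped)
print(auto_hits)
```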
Old 04-01-2014, 10:06 AM   #10
seanos
Some paragraphs will always have tags in them, though. I can try anyway. Either way I’ll get there in the end; it would just be quicker (and require less hunting down stray </span>s) with regex.
Old 04-01-2014, 11:00 AM   #11
mrmikel
The thing is, if you have trouble figuring out how to separate the two, how is a program or regex search supposed to figure it out? The perfect one that will get everything you want always seems to sweep up others too.

My solution is to use the broad ones that sweep up everything, but step through the matches one at a time, ready to hit Replace and Find, and also ready to go up to the text and fix by hand anything the regex would overfix. It is not as fast, but it is much faster than reading and fixing line by line, and faster than repairing the damage of an overactive replace.
Old 04-01-2014, 11:12 AM   #12
seanos
Well, that is what I’ve been doing more or less. Most of the formatting in this book is utterly useless anyway.

I don’t actually have any trouble at all identifying the required sequences of characters; it’s just the suggested regex doesn’t seem to work consistently. The exact sequences of chars copied from the document are not found with regex though they are with a normal search. This leads me to wonder if there is something in the syntax that’s preventing it from working, but since it’s a bit of a black art to me and I don’t even know what the flavour of regex in Calibre is called, I don’t really know what to look for.

It doesn’t seem too unreasonable to want to search for [exact sequence A]SOME TEXT[first occurrence of exact sequence B].
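
For what it's worth, that kind of search can be sketched as follows. This uses Python's standard `re` module as a stand-in for the Editor (an assumption); `between` is a hypothetical helper name, and the sample markup is made up. The two ingredients are a lazy `(.*?)` so the match stops at the first occurrence of B, and the dot-matches-newline flag so the text in between may span several lines.

```python
import re

start = '<div class="text">'
end = '</div>'

def between(start, end, flags=re.DOTALL):
    """Hypothetical helper: literal `start`, as little text as possible, first `end`."""
    # re.escape() makes both sequences literal; re.DOTALL lets "." cross newlines.
    return re.compile(re.escape(start) + r'(.*?)' + re.escape(end), flags)

html = ('<div class="text">\n\t<p>One</p>\n\t<p>Two</p>\n</div>\n'
        '<div class="text">Three</div>')

pattern = between(start, end)
contents = pattern.findall(html)      # both divs, including the multi-line one
unwrapped = pattern.sub(r'\1', html)  # divs removed, contents kept

# Without the flag, "." stops at line breaks and the multi-line div is missed.
single_line_only = between(start, end, flags=0).findall(html)

print(contents)
print(single_line_only)
```

One caveat: "first occurrence of B" means exactly that, so if the div contains another div, the match ends at the inner `</div>`. A regex cannot balance nested tags, and those cases are best checked by hand.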
Old 04-01-2014, 11:42 AM   #13
mrmikel
It seems somewhat inconsistent to me too. But my abilities are kind of like yours. There are various cheatsheets that cover it, but it is not something that is easy to understand. So I don't know what I don't know.

If you run the search again, sometimes it seems to work where it did not scoop something up the first time, even though the two pieces of text look the same.
Old 04-01-2014, 12:17 PM   #14
DiapDealer
Posts: 28,856
Karma: 207000000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I've found there's really not that much difference between calibre-edit's regex engine and Sigil's. And most of those differences are very "edge case" kind of stuff: searching for Unicode code points, and the like. Mostly, calibre-edit's regex engine just adds a few things that weren't in Sigil's, like allowing variable-length lookbehinds, and matching the start/end of words with \m \M.
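
To give a feel for what "variable-length lookbehind" means, the sketch below uses Python's standard `re` module, which does not support it and so makes a handy contrast. As far as I know, the third-party `regex` module for Python accepts such patterns and also provides `\m` and `\M`; the sketch does not use it, so treat that part as an assumption.

```python
import re

# Fixed-width lookbehind: accepted by the standard `re` module.
fixed = re.compile(r'(?<=<p>)\w+')
first_word = fixed.search('<p>Hello world</p>').group(0)

# Variable-width lookbehind ("<p", any attributes, ">"): rejected by `re`.
try:
    re.compile(r'(?<=<p[^>]*>)\w+')
    variable_ok = True
except re.error:
    variable_ok = False

print(first_word)
print(variable_ok)
```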

If case matters ... then case matters. And if so, the appropriate box needs to be checked.
Old 04-01-2014, 11:21 PM   #15
seanos
Figured out what I wanted was this...

Code:
<p class="normal-262-0-override"><span class="no-style-override-9">(.*)</span></p>

<p>\1</p>
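
One thing worth checking with `(.*)` is that it is greedy: it grabs as much as it can on a line. A minimal sketch, using Python's standard `re` module as a stand-in for the Editor (an assumption) and a made-up line that holds two paragraphs, shows how it differs from the lazy `(.*?)`:

```python
import re

head = r'<p class="normal-262-0-override"><span class="no-style-override-9">'
tail = r'</span></p>'

# Two paragraphs on one line, made up for illustration.
line = (head + 'One' + tail) + (head + 'Two' + tail)

greedy = re.sub(head + r'(.*)' + tail, r'<p>\1</p>', line)
lazy = re.sub(head + r'(.*?)' + tail, r'<p>\1</p>', line)

print(greedy)  # one replacement swallowing both paragraphs
print(lazy)    # each paragraph replaced separately
```

When every paragraph sits on its own line the two behave the same, since `.` does not cross line breaks by default.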
All times are GMT -4.