Old 03-16-2019, 04:21 PM   #1
meghane_e
Connoisseur
meghane_e began at the beginning.
 
Posts: 75
Karma: 10
Join Date: Sep 2016
Device: Kindle
Regex: grabbing <h3><span> tag group

I need help with a regex. I search for <h3.*>(\w.+)</.*></h3> and replace it with <h3>\1</h3>
It worked and caught several of the headings I'm after, but it fails on the rest. One example it fails on is the snippet below, which looks identical to the first few (to me). I'm after the content "3. Escape". The complete snippet I'm looking at:
Quote:
<h3 class="calibre_5"><span class="calibre3"><span class="bold"><a href="http://www.noname.org/forums/story/david-mcleod/thetranslator/3" class="calibre2"><span class="calibre_1"><span class="underline">3. Escape</span></span></a>
</span></span></h3>
It's grabbing all of the following instead of stopping at the first '<' after the content:
Quote:
<h3 class="calibre_5"><span class="calibre3"><span class="bold"><a href="http://www.noname.org/forums/story/david-mcleod/thetranslator/3" class="calibre2"><span class="calibre_1"><span class="underline">3. Escape</span></span></a>
And the Replacement results in this. Note the </span>s and extra </h3>
Quote:
<h3>3. Escape</h3>
</span></span></h3>
Thanks!

Last edited by meghane_e; 03-16-2019 at 04:22 PM. Reason: format
Old 03-16-2019, 04:27 PM   #2
theducks
Well trained by Cats
Posts: 30,005
Karma: 57259778
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Why not use Diap's Toolbag first to delete spans with certain attributes (or naked spans, with no attributes) to make things simpler?
Old 03-16-2019, 06:26 PM   #3
Brett Merkey
Not Quite Dead
Posts: 194
Karma: 654170
Join Date: Jul 2015
Device: Paperwhite 4; Galaxy Tab
The following regex seems to work fine with the code snippet you gave:

Quote:
<h3.*?(\d+\. \w+).*?</h3>
All it does is look for a number, a period, a space and a word between the h3 tags; everything else is ignored.
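For anyone who wants to try this outside the Editor, here is a minimal Python sketch of the same find/replace. It assumes the heading spans two lines as in post #1, so it sets `re.DOTALL` to let `.` cross the line break (the assumed equivalent of ticking the Editor's "Dot all" option). The variable names are only for illustration.

```python
import re

# The heading from post #1; note the line break before the closing tags.
html = (
    '<h3 class="calibre_5"><span class="calibre3"><span class="bold">'
    '<a href="http://www.noname.org/forums/story/david-mcleod/thetranslator/3" class="calibre2">'
    '<span class="calibre_1"><span class="underline">3. Escape</span></span></a>\n'
    '</span></span></h3>'
)

# Lazy .*? skips the nested tags on either side of "number. word".
# re.DOTALL is an assumption here: without it, "." stops at the newline.
pattern = re.compile(r'<h3.*?(\d+\. \w+).*?</h3>', re.DOTALL)

cleaned = pattern.sub(r'<h3>\1</h3>', html)
print(cleaned)

# Same pattern without DOTALL: no match, so the text comes back unchanged.
unchanged = re.sub(r'<h3.*?(\d+\. \w+).*?</h3>', r'<h3>\1</h3>', html)
```

Note that `\w+` only captures a single word, so a title like "4. The Return" would be cut down to "4. The" with this pattern.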

Last edited by Brett Merkey; 03-16-2019 at 06:29 PM.
Old 03-16-2019, 07:11 PM   #4
meghane_e
Quote:
Originally Posted by theducks View Post
Why not use Diap's Toolbag first to delete spans with certain attributes (or naked spans, with no attributes) to make things simpler?
Thanks for the quick feedback! Stupid question, but if the tool you mention is the Plugins -> Edits Spans and Divs panel, how do I set it to strip nested spans? The ePub was generated from many Docx files (copied from a web browser; the source files are not all online anymore) that were converted to ePub files and then merged into one ePub file. As a result it ends up with tons of nested spans and CSS auto-generated by the editor.

Maybe my more basic problem is which workflow to use here. I keep flip-flopping between editing it in the Editor and writing a script/program to transform the HTMZ first. Is there a thread that discusses which workflows work better under various conditions?
Old 03-16-2019, 07:46 PM   #5
theducks
That is what it is called in Calibre (I also use it in Sigil)

Yes, it matches the tag pair for the conditions supplied.
It can be a bit tedious, but IMHO it is safer to do 1 condition at a time rather than use a wild card (which allows foot shooting)
span
style
calibre\d+ REGEX mode
Old 03-20-2019, 05:19 PM   #6
meghane_e
Hmm, thank you for the info. Not sure if I can use it right now, but I'll probably need it later.
Old 03-27-2019, 08:45 PM   #7
JoeBloe
Junior Member
JoeBloe began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Sep 2014
Device: Android Cool Reader
A great site for testing Regex is regex101.com.
Try this:
(<h3.*>)(\w[^<]+)(<.+>)(</h3>)
The section [^<]+ tells it to take anything that is NOT a <, so it stops at the first tag after the text, whereas a . is greedy: it will keep going and take everything up to the last </tag> before the </h3>
Of course, you can do what you want with the grouping. I just use it on that site to better see if each section is doing what I want.
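To see the difference between `.+` and `[^<]+` in isolation, here is a small Python sketch run on just the inner part of the heading. The two short patterns are simplified stand-ins made up for illustration, not the full expression above.

```python
import re

# Inner part of the heading from post #1.
fragment = '<span class="underline">3. Escape</span></span></a>'

# Greedy dot: runs to the end of the line, then backs up to the LAST "<".
greedy = re.search(r'>(\w.+)<', fragment).group(1)

# Negated class: cannot step over a "<", so it stops at the FIRST one.
bounded = re.search(r'>(\w[^<]+)<', fragment).group(1)

print(greedy)   # 3. Escape</span></span>
print(bounded)  # 3. Escape
```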
Old 03-28-2019, 04:38 AM   #8
meghane_e
regex101.com looks like the most helpful test site I've come across! Good layout and usability! Thanks Joe! Still working my problem out though.
Old 03-28-2019, 04:33 PM   #9
meghane_e
Thanks again everyone! Well, this gets me the content I want, at least on regex101.com and in the Editor:

Given code:
Code:
<h3 class="calibre_5"><span class="calibre3"><span class="bold">
<a href="http://www.noname.org/forums/story/david-mcleod/thetranslator/3" class="calibre2"><span class="calibre_1">
<span class="underline">3. Escape</span></span></a>
</span></span></h3>
The Find expression:
(<h3[^>]+>(<[^>]+>)+)([^<]+)(([^>]+>)+(.*<\/h3>))

Replace with \3

But that seems overly complicated?

Edit: Yay! (<h3.*>)(\w[^<]+)(<.+>)(<\/h3>) does work. There was a hidden newline in the generated code, and . does not match a newline by default. Once I found it and took it out, the expression matched.
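The hidden-newline effect can be reproduced with a short Python sketch. It assumes the two-line heading from post #1 and uses a replacement of `<h3>\2</h3>`, which is my choice for illustration (group 2 is the title text). It also shows `re.DOTALL` as an alternative to deleting the newline by hand.

```python
import re

pattern = r'(<h3.*>)(\w[^<]+)(<.+>)(</h3>)'

# Heading from post #1, with the line break before the closing tags.
two_lines = (
    '<h3 class="calibre_5"><span class="calibre3"><span class="bold">'
    '<a href="http://www.noname.org/forums/story/david-mcleod/thetranslator/3" class="calibre2">'
    '<span class="calibre_1"><span class="underline">3. Escape</span></span></a>\n'
    '</span></span></h3>'
)

# By default "." does not match "\n", so (<.+>) cannot reach </h3>.
no_match = re.search(pattern, two_lines)

# Option 1: remove the newline first, as described in the post.
one_line = two_lines.replace('\n', '')
fixed_by_removal = re.sub(pattern, r'<h3>\2</h3>', one_line)

# Option 2: keep the newline and let "." match it with DOTALL.
fixed_by_dotall = re.sub(pattern, r'<h3>\2</h3>', two_lines, flags=re.DOTALL)

print(no_match, fixed_by_removal, fixed_by_dotall)
```

With DOTALL the greedy `.*` and `.+` can run across several headings in one file, so it is probably safer on one heading at a time or with lazy quantifiers.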

Last edited by meghane_e; 03-28-2019 at 04:38 PM.