12-07-2024, 10:13 AM #1
foosion
Evangelist
Posts: 479
Karma: 41524
Join Date: Sep 2011
Device: Kobo Libra 2 & Clara BW
Replace <br/> in <h1>...<h1/>

I'd like to replace <br/>s with a space when they're between opening and closing h1 tags. However, something simple such as

Code:
<h1>(.*?)<br/>(.*?)<h1/>
<h1>\1 \2<h1/>
doesn't work well: some pairs of opening and closing h1 tags have no br between them, so this matches everything up to the next br and the closing h1 after it, which could be a lot. I just want to replace a br when it's between an opening and a closing h1. In other words, find an opening and closing h1 tag; then, if there's a br in that range, replace it with a space; otherwise, do nothing.

How to?

Last edited by foosion; 12-07-2024 at 11:10 AM. Reason: Should have posted <br/> not <br>
12-07-2024, 10:55 AM #2
enuddleyarbl
Guru
Posts: 776
Karma: 1538394
Join Date: Sep 2013
Device: Kobo Forma
Odd. In my Calibre editor that works just fine.

A couple of thoughts, though (could just be typos). First, you should close your <br> tags. IOW, <br/>, not just <br>. Second, are you sure all your <h1></h1> tags will be on the same line? Because, as written, your search string won't find multiple line <h1>s. And, third (possibly related to the problem you're seeing), do you have the "Dot All" box checked at the bottom of the editor screen? With it unchecked, it works as you want for single line <h1>s. With it checked, it'll pick up multiple line <h1>s, but also those <h1>s without a <br/> and stretch the selection to the next </h1>.
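The Dot All behaviour described above can be reproduced outside the editor. A minimal sketch, using Python's standard `re` module as a stand-in for calibre's search (an assumption: as far as I know the editor builds on the third-party `regex` module, which should behave the same for this pattern), with made-up sample HTML and the closing tag written as `</h1>`:

```python
import re

# Made-up sample: the first heading has no <br/>, the second has one,
# and both headings span several lines.
html = """<h1>
Prologue
</h1>
<p>Some text.</p>
<h1>
Chapter One<br/>The Beginning
</h1>"""

pattern = r"<h1>(.*?)<br/>(.*?)</h1>"

# Dot All unchecked: '.' cannot cross a newline, so no multi-line heading matches.
print(re.findall(pattern, html))

# Dot All checked: the match starts at the FIRST <h1> and the lazy group keeps
# expanding until it reaches the <br/> inside the SECOND heading.
m = re.search(pattern, html, flags=re.DOTALL)
print(m.group(1))
```

With Dot All off, `findall` returns an empty list; with it on, group 1 swallows the whole first heading and the paragraph after it. Making `.*?` lazy does not help, because the match still has to reach a `<br/>` somewhere.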
12-07-2024, 11:14 AM #3
foosion
<br> was a typo - I meant <br/>

The code I posted finds all of my <h1>s; the problem is that it also matches large stretches of text not within <h1>...<h1/> when a heading has no <br/> inside. I do have Dot All checked; otherwise it doesn't find the <h1>...<h1/>s at all, since none are on a single line.

So the question is how to just find <br/> within <h1>...<h1/> with Dot All checked? IOW, when there are <h1>s without a <br/>, how to avoid stretching the selection to the next </h1>?
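One way to do it in a single Find/Replace with Dot All checked is to match each `<br/>` on its own and add a lookahead that requires a closing `</h1>` to appear before any other opening `<h1` tag. This pattern is my own suggestion, not one posted in the thread; it assumes headings are never nested, and it is tested here with Python's `re` on made-up HTML:

```python
import re

# Suggested Find expression (an illustration, not from the thread).
# Matches a <br/> or <br /> plus surrounding whitespace, but only when a
# closing </h1> comes before the next opening <h1 tag, i.e. the <br/> is
# inside a heading. Needs Dot All so '.' can cross line breaks.
FIND = r"\s*<br\s*/?>\s*(?=(?:(?!<h1\b).)*?</h1>)"
REPLACE = " "

html = """<h1>
Chapter One<br/>The Beginning
</h1>
<p>Line one<br/>line two</p>
<h1 id="toc_1">Part<br />Two</h1>"""

result = re.sub(FIND, REPLACE, html, flags=re.DOTALL)
print(result)
```

In the editor this would mean Regex mode, Dot All checked, the FIND pattern above, and a single space in Replace. The `<br/>` inside the paragraph is left alone because the lookahead hits the next `<h1` before it finds a `</h1>`. It can misfire if literal `<h1` text (for example in a comment) sits inside a heading.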
12-07-2024, 11:37 AM #4
theducks
Well trained by Cats
Posts: 31,047
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
<h1/> is also a typo: h1 is not a self-closing tag.

Code:
<h1>(.*?)\s*<br />\s*(.*?)</h1>
I included zero or more spaces (\s*) on either side of the <br /> in my search, since you put one back in your replace.

FWIW I usually do the reverse. I insert a trailing space after \1.
Code:
\1 <br />\2
I use this on my small (not desktop) screen when chapter titles are more than just Chapter #. The advantage is that the TOC builder still works.
You can even resize the second line with a <span class=> without messing up its inclusion in the TOC.
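Why the space before the `<br />` helps: if a TOC entry is built from the heading's text with the tags dropped, a bare `<br />` glues the two lines together. A rough sketch in Python; `toc_text` is a hypothetical simplification of what a TOC builder does, not calibre's code, and `subtitle` is a made-up class name:

```python
import re

def toc_text(heading_html: str) -> str:
    """Hypothetical stand-in for a TOC builder: drop all tags, keep the text."""
    return re.sub(r"<[^>]+>", "", heading_html).strip()

inner = "Chapter 5<br />The Storm"
find = r"(.*?)\s*<br />\s*(.*)"

# Keep the line break but put a space before it, as in \1 <br />\2.
spaced = re.sub(find, r"\1 <br />\2", inner)

# Same, with the second line wrapped in a span for styling.
styled = re.sub(find, r'\1 <br /><span class="subtitle">\2</span>', inner)

print(toc_text(inner))   # the two lines run together
print(toc_text(spaced))
print(toc_text(styled))
```

The span only changes how the second line is rendered; the extracted text is the same as for the plain spaced version.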
12-07-2024, 12:16 PM #5
foosion
Quote:
Originally Posted by theducks
Code:
<h1>(.*?)\s*<br />\s*(.*?)</h1>
That seems to have the same issue if there's a <h1> </h1> pair without a <br/> inside - it stretches until the next h1.
12-07-2024, 12:24 PM #6
theducks
Did you have Dot All checked? That makes it greedier than you want.
12-07-2024, 12:43 PM #7
Quoth
Still reading
Posts: 14,016
Karma: 105092227
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
Sometimes a heading tag is for a multi-line heading, where a separate heading tag for each line would result in an erroneous auto-generated TOC.

Also, a <br /> is often used in a heading that is logically two lines, or that is too long for a single line and would look strange with an arbitrary word-wrap position at a big font size or on a smaller screen.

So while it's perfectly possible to find only the line breaks embedded in a heading and put a space instead, it may not be what is really needed.

Though, in an ebook, I'd only ever have <br /> in a heading, if anywhere, as extra space elsewhere is better done with paragraph CSS.
12-07-2024, 01:45 PM #8
theducks
I also do it as an attempt at (anti-ugly) control. Rather than getting an arbitrary break when
Chapter One Hundred Twenty-Seven
will not fit on the line, I would rather have:
Chapter
One Hundred Twenty-Seven
12-07-2024, 02:11 PM #9
enuddleyarbl
I, too, am always (well, often) adding <br/>s to my chapter headings just to make sure they fit decently on the page. And sorry I forgot to mention the closing </h1> tag. I saw it, but forgot.
12-07-2024, 02:39 PM #10
foosion
Quote:
Originally Posted by theducks
Did you have Dot All checked? That makes it greedier than you want.
If Dot all is unchecked I don't get any hits. With it checked it's greedier than I want.

I'm starting to think I need a function that first finds <h1>(.*?)</h1>, then searches \1 for <br/> and does the replace. I'm just not sure how to do this.
12-07-2024, 02:42 PM #11
foosion
Quote:
Originally Posted by Quoth
Though, in an ebook, I'd only ever have <br /> in a heading, if anywhere, as extra space elsewhere is better done with paragraph CSS.
Agreed, but this is a commercial book and it's using <br/> as a line break in many places.
12-07-2024, 02:47 PM #12
theducks
I do this kind of search all the time (e.g. to add a <span> around \2 to style it a bit differently).
There is something hidden in the text that is throwing the find off.
One thing I have found can help: Beautify all HTML. That gets rid of extra spaces (or use \s* in the pattern).
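The `\s*` point also covers how the tag itself is spelled. The pattern posted earlier writes `<br />` with a space, while the book uses `<br/>`, and a literal `<br />` will not match `<br/>`. A quick check with Python's `re` (assumed to behave like the editor for these simple patterns):

```python
import re

samples = ["<br/>", "<br />", "<br  />", "<br>"]

literal = re.compile(r"<br />")       # exact spelling only
flexible = re.compile(r"<br\s*/>")    # any whitespace before the slash
any_br = re.compile(r"<br\s*/?>")     # also accepts an unclosed <br>

for s in samples:
    print(f"{s!r:10} literal={bool(literal.fullmatch(s))} "
          f"flexible={bool(flexible.fullmatch(s))} any={bool(any_br.fullmatch(s))}")
```

Using `<br\s*/>` (or `<br\s*/?>`) in the Find box makes the search independent of whether Beautify has been run.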
12-07-2024, 02:57 PM #13
foosion
This seems to work as a regex function:

Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Assumes Find is <h1>(.*?)</h1> with Dot All checked; group(1) is the heading content.
    return '<h1>' + match.group(1).replace('<br/>', ' ') + '</h1>'
There are up to three <br/>s in some headings; one .replace() call handles all of them, because str.replace() replaces every occurrence.
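The editor calls a regex function once for each match of the Find expression, so the function can be tried outside calibre by driving it from `re.sub`. In this sketch the extra arguments (`number`, `file_name` and so on) are placeholders, not the objects calibre really passes, and the HTML is made up. `replace_keep_attrs` is a hypothetical variant that assumes the Find expression `(<h1[^>]*>)(.*?)(</h1>)`, so it keeps attributes such as the `id="toc_21"` values added by the TOC generator and also handles `<br />`:

```python
import itertools
import re

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Same idea as the function above; Find: <h1>(.*?)</h1>
    return '<h1>' + match.group(1).replace('<br/>', ' ') + '</h1>'

def replace_keep_attrs(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Hypothetical variant; Find: (<h1[^>]*>)(.*?)(</h1>)
    # Keep the opening tag (and its id) untouched and turn each <br/> or
    # <br />, plus any surrounding whitespace, into a single space.
    inner = re.sub(r'\s*<br\s*/?>\s*', ' ', match.group(2))
    return match.group(1) + inner + match.group(3)

def run(find, func, html):
    """Call func once per match; the extra arguments are placeholders."""
    counter = itertools.count(1)
    return re.sub(
        find,
        lambda m: func(m, next(counter), 'chapter.xhtml', None, None, {}, None),
        html,
        flags=re.DOTALL,
    )

html = ('<h1>Chapter One<br/>The Beginning</h1>\n'
        '<p>a<br/>b</p>\n'
        '<h1 id="toc_21">Part<br />Two</h1>')

plain = run(r'<h1>(.*?)</h1>', replace, html)
kept = run(r'(<h1[^>]*>)(.*?)(</h1>)', replace_keep_attrs, html)
print(plain)
print(kept)
```

The first function skips `<h1 id="toc_21">` because its Find expression only matches a bare `<h1>`, which matters if the TOC has already been generated; the variant fixes that heading as well and leaves the paragraph's `<br/>` alone in both cases.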

I used create TOC from headings, and it added ids throughout, including to blank ones, e.g.:

Code:
<h1 id="toc_21">Security measures adopted by Atlantis/Shanghai.</h1><h1 id="toc_22"></h1>
Is this normal?
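The thread does not say whether these empty headings are expected. Before deleting any, it is worth checking whether a TOC entry points at their ids. A minimal sketch in Python that lists the ids of `<h1>`s with no visible content; the first two headings come from the example above and `toc_23` is made up:

```python
import re

html = ('<h1 id="toc_21">Security measures adopted by Atlantis/Shanghai.</h1>'
        '<h1 id="toc_22"></h1>\n<h1 id="toc_23">  </h1>')

def empty_heading_ids(html):
    """Return the id (or None) of every <h1> that contains only whitespace."""
    ids = []
    for m in re.finditer(r'<h1([^>]*)>\s*</h1>', html):
        id_match = re.search(r'\bid="([^"]*)"', m.group(1))
        ids.append(id_match.group(1) if id_match else None)
    return ids

print(empty_heading_ids(html))
```

Headings containing only `&nbsp;` or a stray `<br/>` are not caught by `\s*` and would need a wider pattern.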
12-08-2024, 09:05 AM #14
Quoth
Quote:
Originally Posted by foosion
Agreed, but this is a commercial book and it's using <br/> as a line break in many places.
Some or all of those need to be replaced
/p> <br /> <p class="existing"
with
/p> <p class="revised"
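Quoth's replacement can be written as one regex that tolerates different spacing and either `<br/>` spelling. A sketch using Python's `re`; `existing` and `revised` are Quoth's placeholder class names, and the sample HTML is made up:

```python
import re

# Drop a bare <br/> sitting between two paragraphs and switch the following
# paragraph to a class whose CSS supplies the extra space instead.
FIND = r'</p>\s*<br\s*/?>\s*<p class="existing"'
REPLACE = '</p> <p class="revised"'

html = ('<p>First.</p> <br /> <p class="existing">Second.</p>\n'
        '<p>Third.</p><br/><p class="existing">Fourth.</p>')

result = re.sub(FIND, REPLACE, html)
print(result)
```

The `revised` class would then carry the extra top margin in the stylesheet, which is the "paragraph CSS" approach mentioned earlier in the thread.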
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.