Old 04-29-2014, 08:48 AM   #1
cager
Connoisseur
Posts: 59
Karma: 118112
Join Date: Jul 2013
Device: none
Help with poor chapter headings needed

I am having difficulty with a badly formatted EPUB.
It has few files, no TOC and no tagged chapter headings.
The chapter markers in the text are “*1*”, “*2*”, etc. (without the quotes), so there is nothing to hang a TOC on.
The code for the headings is:
Code:
<p class="calibre11"><b class="calibre10"><span lang="EN-US" class="calibre1">*1*</span></b></p>
I have managed to devise a regex to find all the chapter headings and tag them as h2, but now the problems arise.
Code:
Find:    (\*[0-9]+\*)
Replace: <h2>\1</h2>
In their natural state the headings are centred but using the calibre editor they become left aligned because the h2 tags are placed around the text only. This is what I get:
Code:
<p class="calibre11"><b class="calibre10"><span lang="EN-US" class="calibre1"><h2>*1*</h2></span></b></p>
If I manually find the headings and use the drop down list of tags I get the same code. If, however, I use Sigil and manually find the heading and use the h2 button the heading stays centred and sized as original and the code I get is:
Code:
<h2 class="calibre11"><b class="calibre10"><span class="calibre1" xml:lang="EN-US">*1*</span></b></h2>
I can see two possible solutions that keep the headings centred while using the calibre editor, neither of which I know how to do.
1. Edit the formatting of h2. How is that done?
2. Formulate the regex to include everything from <p to </p> and replace it with code like that produced by Sigil.
Any and all helpful suggestions will be appreciated.
The more I learn, the more I realise I don’t know.
Old 04-29-2014, 09:20 AM   #2
theducks
Well trained by Cats
Posts: 29,809
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by cager View Post
This is what I get:
Code:
<p class="calibre11"><b class="calibre10"><span lang="EN-US" class="calibre1"><h2>*1*</h2></span></b></p>
H# tags cannot be inside a P tag.

Why not just change the P tag pair to an H2?

Search:
Code:
<p (class="calibre11"><b class="calibre10"><span lang="EN-US" class="calibre1">\*\d+\*</span></b>)</p>
Replace:
Code:
<h2 \1</h2>
Note the location of the capture (

IMHO the Bold part is unnecessary, as H# tags default to bold.
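For anyone who wants to try the search and replace outside the editor first, here is a minimal Python sketch of the same idea. It assumes the heading markup is exactly the sample from the first post and that the closing asterisk appears exactly once; the name `retag_headings` is made up for this illustration.

```python
import re

# Sample heading copied from the first post.
sample = ('<p class="calibre11"><b class="calibre10">'
          '<span lang="EN-US" class="calibre1">*1*</span></b></p>')

# The capture group opens right after "<p " so the class attribute and
# everything up to (but not including) "</p>" is reused in the replacement.
pattern = re.compile(
    r'<p (class="calibre11"><b class="calibre10">'
    r'<span lang="EN-US" class="calibre1">\*\d+\*</span></b>)</p>'
)

def retag_headings(html: str) -> str:
    """Turn matching <p ...>...</p> chapter markers into <h2 ...>...</h2>."""
    return pattern.sub(r'<h2 \1</h2>', html)

print(retag_headings(sample))
```

Because the class attribute is inside the capture, the new h2 keeps class "calibre11", which is what should keep the heading centred.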
Old 04-29-2014, 09:20 AM   #3
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
It could be easier to try to find the front, change it to a simple <h2> for however many chapters, one at a time, then go through it finding the back and changing them to your </h2>, one at a time. No way to go wrong in case of duplicated formatting, if you do them one at a time, and it shouldn't take over 5 minutes.
Old 04-30-2014, 05:14 AM   #4
cager
Connoisseur
Posts: 59
Karma: 118112
Join Date: Jul 2013
Device: none
Thank you, theducks, it worked fine on this book.

Thank you, mrmikel, this will be a better way on another book where the classes are not consistent from heading to heading.

All books are quite old and it seems later ebooks have neater and easier to follow formatting.
Old 04-30-2014, 08:46 AM   #5
theducks
Well trained by Cats
Posts: 29,809
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by cager View Post
Thank you, theducks, it worked fine on this book.

Thank you, mrmikel, this will be a better way on another book where the classes are not consistent from heading to heading.

All books are quite old and it seems later ebooks have neater and easier to follow formatting.
When doing this kind of change, remember: 'Every book is unique'

You will have to look at each case and twist and tweak your exact solution to fit.

I disagree with @mrmikel in that I prefer to always do a complete Tag pair before leaving a section (file).
'Leave no broken code in a file before departing, lest thou get the terrible pink/grey box later'

In 90% of cases I ONLY hand-step: no Replace All unless the match count exactly equals what I expected. Even then, I SAVE first and hand-step a few replacements (replace, LOOK, find, LOOK...) before Replace All gets any attention.
One thing about modern computers: You can make a LOT of garbage... Fast
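The "check the count before Replace All" habit can be sketched in Python as below. This is only an illustration under assumptions: the helper name `replace_all_checked` and the tiny `book` string are invented here, and the editor's own match counter does the same job interactively.

```python
import re

def replace_all_checked(text, pattern, repl, expected):
    """Replace every match, but only if the hit count equals `expected`."""
    found = sum(1 for _ in re.finditer(pattern, text))
    if found != expected:
        # Refuse to touch the text when the count is a surprise.
        raise ValueError(f"expected {expected} matches, found {found}")
    return re.sub(pattern, repl, text)

# Hypothetical two-chapter snippet used only for this demonstration.
book = "<p>*1*</p><p>text</p><p>*2*</p>"
print(replace_all_checked(book, r"<p>(\*\d+\*)</p>", r"<h2>\1</h2>", expected=2))
```

If the count is off, nothing is changed and the mismatch is reported, which is the point at which to go back and look at the individual hits by hand.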

Last edited by theducks; 04-30-2014 at 08:55 AM.
Old 04-30-2014, 09:08 AM   #6
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
I agree that it is better practice to do a section at a time. Easier to catch mistakes and misunderstandings.

It can be better to NOT start with something calibre-ized so you don't have to get rid of all the calibre stuff. If it is in some sort of HTML, it is better to bring it in that way. Calibre's code works, but it complicates things if you have to edit, since it can be hard to understand why calibre added any particular class.
Old 05-06-2014, 08:44 AM   #7
cager
Connoisseur
Posts: 59
Karma: 118112
Join Date: Jul 2013
Device: none
theducks and mrmikel --Thanks again for more useful thoughts.

I was fortunate with the book in question that the "text" used would not appear in any normal context. Also I had looked all through the html pages and knew a search and replace would work.

I take on board the suggestion that a full search and replace is potentially a disaster and that a softly, softly approach is more certain. That is now my preferred approach.

Thanks again for all the help both on this thread and on many others.
Old 05-06-2014, 10:00 AM   #8
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
You're welcome. The best way to look through all the html pages is to use the find button for the phrase in question. Even after working on books for a while I still get surprised when suddenly my find phrase takes in something that I forgot existed.

I have mentioned on another forum that I had been trying to capitalize each ATIS, an acronym for a WWII US Army office, in a book, but lo and behold, I ended up with sATISfaction.
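One common guard against this kind of accident is the word-boundary anchor `\b`. The Python sketch below uses an invented sentence to show the difference; it is not the search that was actually run on that book.

```python
import re

# Invented sentence containing both the acronym and an ordinary word.
text = "The atis report gave much satisfaction."

# Naive replacement also hits the letters inside "satisfaction".
naive = re.sub(r"atis", "ATIS", text)

# \b anchors the match to word boundaries, so only the standalone word changes.
careful = re.sub(r"\batis\b", "ATIS", text)

print(naive)    # The ATIS report gave much sATISfaction.
print(careful)  # The ATIS report gave much satisfaction.
```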
Old 05-10-2014, 03:51 AM   #9
toothpicz
Junior Member
Posts: 6
Karma: 610
Join Date: Aug 2011
Device: Calibre, FBreader(android)
Hi all,

I have a similar problem. My chapter headings are "1.", "2." etc., also without any heading tags. I want to create a ToC by giving the headings tags and then using those tags to build the Contents.

My problem:
I've managed to use regex to capture the headings, but do not know how to replace it, because I still need to keep the "5." intact, and use replace all to change all the 100+ headings.

original code:
Code:
<p class="calibre2"><span class="bold">5.</span></p>
My regex to find:
Code:
<p class="calibre2"><span class="bold">\s*[0-9]+.*?</span></p>
My regex to replace:
Code:
<p class="calibre2"><span class="bold"><h3>???</h3></span></p>
Where ??? is a mystery to me. Please help!
Old 05-10-2014, 06:28 AM   #10
davidfor
Grand Sorcerer
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Quote:
Originally Posted by toothpicz View Post
original code:
Code:
<p class="calibre2"><span class="bold">5.</span></p>
My regex to find:
Code:
<p class="calibre2"><span class="bold">\s*[0-9]+.*?</span></p>
Try:
Code:
<p class="calibre2"><span class="bold">(\s*[0-9]+\..*?)</span></p>
But if the sample is accurate, I would use:
Code:
<p class="calibre2"><span class="bold">(\d+\.)</span></p>
Quote:
My regex to replace:
Code:
<p class="calibre2"><span class="bold"><h3>???</h3></span></p>
The replace is:
Code:
<p class="calibre2"><span class="bold"><h3>\1</h3></span></p>
But, I am pretty sure you can't have a heading within a paragraph. What I would use is:
Code:
<h3 class="calibre2">\1</h3>
I would check that the class "calibre2" didn't remove the bold from the usual heading definition.
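As a rough check of the capture-and-backreference idea, here is a small Python sketch using the sample line from the question. The function name `to_h3` is made up for the example; the pattern and replacement are the ones suggested above.

```python
import re

# Sample line copied from the question.
sample = '<p class="calibre2"><span class="bold">5.</span></p>'

# (\d+\.) captures the chapter number plus a literal dot; the backslash
# matters, because an unescaped "." would match any character.
pattern = re.compile(r'<p class="calibre2"><span class="bold">(\d+\.)</span></p>')

def to_h3(html: str) -> str:
    """Replace the whole paragraph with an h3 that reuses the captured number."""
    return pattern.sub(r'<h3 class="calibre2">\1</h3>', html)

print(to_h3(sample))  # <h3 class="calibre2">5.</h3>
```

The `\1` in the replacement is what fills the "???" slot: it is whatever the first pair of parentheses matched, so each of the 100+ headings keeps its own number.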
Old 05-10-2014, 08:18 AM   #11
toothpicz
Junior Member
Posts: 6
Karma: 610
Join Date: Aug 2011
Device: Calibre, FBreader(android)
Quote:
The replace is:
Code:
<p class="calibre2"><span class="bold"><h3>\1</h3></span></p>
But, I am pretty sure you can't have a heading within a paragraph. What I would use is:
Code:
<h3 class="calibre2">\1</h3>
I would check that the class "calibre2" didn't remove the bold from the usual heading definition.
I'm using the first, and it works. Thanks!
As for the second, since everything in that book starts with the p class and span tags, I don't think I'll want to risk it and change anything too major. But thanks anyway for an alternative solution.
Old 05-10-2014, 08:31 AM   #12
JLius
Village idiot
Posts: 157
Karma: 519566
Join Date: Mar 2014
Location: Belgium
Device: sony PRS T-1
When the class is specific to the chapters, am I correct in suggesting that you can use (.*) to match everything inside that class, be it digits and letters and...?

If my chapters are like this: 1. My first chapter
then I can find it with (.*). But what do I use in the replace? /1 doesn't do the job.
Old 05-10-2014, 09:17 AM   #13
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
You mean \1 ?
Old 05-10-2014, 09:54 AM   #14
theducks
Well trained by Cats
Posts: 29,809
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by JLius View Post
When the class is specific to the chapters, am I correct in suggesting that you can use (.*) to match everything inside that class, be it digits and letters and...?

If my chapters are like this: 1. My first chapter
then I can find it with (.*). But what do I use in the replace? /1 doesn't do the job.
As others have pointed out: \1, not /1
All escapes are \
All tags close with / (even self closing, like <br /> )


Unique is the keyword. As long as you always have an exact match.

Yes, the non-greedy wild card capture (.+?) can be used

BTW bold is no longer needed when changing from P to H#
By default: H# tags are already BOLD
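The difference between the greedy (.*) and the non-greedy (.+?) only shows up when two headings sit on the same line, so here is a small Python sketch of that case. The class name "chap" and the sample line are invented for the illustration.

```python
import re

# Invented line with two chapter headings side by side.
line = ('<p class="chap">1. My first chapter</p>'
        '<p class="chap">2. My second chapter</p>')

# Greedy: (.*) runs to the LAST </p>, swallowing both headings in one match.
greedy = re.findall(r'<p class="chap">(.*)</p>', line)

# Non-greedy: (.+?) stops at the FIRST </p>, giving one match per heading.
lazy = re.findall(r'<p class="chap">(.+?)</p>', line)

print(len(greedy), len(lazy))  # 1 2

# The replacement reuses the captured text with \1 (backslash, not slash).
fixed = re.sub(r'<p class="chap">(.+?)</p>', r'<h2 class="chap">\1</h2>', line)
print(fixed)
```

When each heading is on its own line the two forms usually behave the same, since "." does not cross line breaks by default, but the non-greedy form is the safer habit.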
Old 05-10-2014, 10:51 AM   #15
JLius
Village idiot
Posts: 157
Karma: 519566
Join Date: Mar 2014
Location: Belgium
Device: sony PRS T-1
I meant, \1, that was a typo.
All times are GMT -4.