04-29-2014, 08:48 AM | #1 |
Connoisseur
Posts: 59
Karma: 118112
Join Date: Jul 2013
Device: none
|
Help with poor chapter headings needed
I'm having trouble with a badly formatted epub book.
It has few files, no TOC and no tagged chapter headings. The chapter headings are just the text "*1*", "*2*" etc. (without the quotes), so there is nothing to hang a TOC on. The code for the headings is:
Code:
<p class="calibre11"><b class="calibre10"><span lang="EN-US" class="calibre1">*1*</span></b></p>
I used this Find / Replace:
Code:
Find:    (\*[0-9]+\*)
Replace: <h2>\1</h2>
which gave:
Code:
<p class="calibre11"><b class="calibre10"><span lang="EN-US" class="calibre1"><h2>*1*</h2></span></b></p>
whereas Sigil's own heading command produces:
Code:
<h2 class="calibre11"><b class="calibre10"><span class="calibre1" xml:lang="EN-US">*1*</span></b></h2>
1. How do I edit the formatting of the h2?
2. How do I write the regex so that it takes in everything from <p to </p> and replaces it with code like that produced by Sigil?
Any and all helpful suggestions will be appreciated. The more I learn, the more I realise I don't know.
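To see why the first attempt produced the nested markup, the same Find/Replace can be run outside the editor. This is a minimal Python sketch using the standard `re` module, which uses the same `\1` replacement syntax as the post; the sample line is copied from the post and the variable names are illustrative only.

```python
import re

# Heading paragraph exactly as it appears in the book (copied from the post).
line = ('<p class="calibre11"><b class="calibre10">'
        '<span lang="EN-US" class="calibre1">*1*</span></b></p>')

# First attempt: capture only the "*1*" text and wrap it in <h2>.
naive = re.sub(r'(\*[0-9]+\*)', r'<h2>\1</h2>', line)
print(naive)
# <p class="calibre11"><b class="calibre10"><span lang="EN-US" class="calibre1"><h2>*1*</h2></span></b></p>
```

Because only the `*1*` text is matched, the surrounding `<p>`, `<b>` and `<span>` stay where they were, leaving the heading inside a paragraph (not valid XHTML). That is why the pattern has to take in the whole paragraph from `<p` to `</p>`.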
04-29-2014, 09:20 AM | #2 | |
Well trained by Cats
Posts: 29,809
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Why not just change the P tag pair to an H2?
Search:
Code:
<p (class="calibre11"><b class="calibre10"><span lang="EN-US" class="calibre1">\*\d+\*</span></b>)</p>
Replace:
Code:
<h2 \1</h2>
IMHO the bold part is unnecessary, as H# headings default to bold.
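The suggestion above keeps the attributes and inner markup in a capture group and only swaps the outer tag names. Here is a minimal Python sketch applying that idea to the sample line from the first post. It uses a single trailing `\*`; a `\**` (zero or more asterisks) would match the same text here. `re.subn` also returns how many replacements were made, which is handy for checking the count.

```python
import re

line = ('<p class="calibre11"><b class="calibre10">'
        '<span lang="EN-US" class="calibre1">*1*</span></b></p>')

# Capture everything between "<p " and "</p>"; \1 carries the class
# attribute and the inner <b>/<span> markup over unchanged.
pattern = (r'<p (class="calibre11"><b class="calibre10">'
           r'<span lang="EN-US" class="calibre1">\*\d+\*</span></b>)</p>')
fixed, count = re.subn(pattern, r'<h2 \1</h2>', line)
print(count)  # 1
print(fixed)
# <h2 class="calibre11"><b class="calibre10"><span lang="EN-US" class="calibre1">*1*</span></b></h2>
```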
04-29-2014, 09:20 AM | #3 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
It might be easier to find the front part of each heading and change it to a simple <h2>, one chapter at a time, then go through again finding the back part and changing each one to your </h2>, again one at a time. If you do them one at a time, duplicated formatting elsewhere can't trip you up, and it shouldn't take more than 5 minutes.
|
04-30-2014, 05:14 AM | #4 |
Connoisseur
Posts: 59
Karma: 118112
Join Date: Jul 2013
Device: none
|
Thank you, theducks, it worked fine on this book.
Thank you, mrmikel, this will be a better way on another book where the classes are not consistent from heading to heading. All these books are quite old, and it seems later ebooks have neater, easier-to-follow formatting.
04-30-2014, 08:46 AM | #5 | |
Well trained by Cats
Posts: 29,809
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
You will have to look at each case and twist and tweak your exact solution to fit.
I disagree with @mrmikel in that I prefer to always complete a tag pair before leaving a section (file). 'Leave no broken code in a file before departing, lest thou get the terrible pink/grey box later.'
In 90% of cases I ONLY hand step (no Replace All unless Count All exactly matches what was expected, and even then I SAVE first and hand step a few: Replace, LOOK, Find, LOOK... before Replace All gets any attention).
One thing about modern computers: you can make a LOT of garbage... fast.
Last edited by theducks; 04-30-2014 at 08:55 AM.
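The rule of thumb above (Replace All only when the match count is exactly what you expected, and save first) can be written as a small guard. This is a Python sketch under stated assumptions: `replace_all_if_expected` is a hypothetical helper, the two-chapter `book` string is made up, and writing the result back to disk is left out.

```python
import re

def replace_all_if_expected(text, pattern, replacement, expected):
    """Hypothetical helper: replace every match only if the number of
    matches equals `expected`; otherwise raise ValueError and change nothing.
    """
    found = len(re.findall(pattern, text))
    if found != expected:
        raise ValueError(f"expected {expected} matches, found {found}; "
                         "step through them by hand instead")
    return re.sub(pattern, replacement, text)


# Made-up book text: two chapter headings and one ordinary paragraph.
book = ('<p class="calibre11">*1*</p>\n'
        '<p>Some text.</p>\n'
        '<p class="calibre11">*2*</p>\n')

# Two chapters, so exactly two matches are expected.
print(replace_all_if_expected(book, r'<p (class="calibre11">\*\d+\*)</p>',
                              r'<h2 \1</h2>', expected=2))
```

Calling it with `expected=3` on the same text raises `ValueError` instead of replacing anything, which is the point where you would fall back to hand stepping.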
04-30-2014, 09:08 AM | #6 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
I agree that it is better practice to do a section at a time. Easier to catch mistakes and misunderstandings.
It can be better NOT to start with something calibre-ized, so you don't have to get rid of all the calibre stuff. If the book is available in some sort of HTML, it is better to bring it in that way. Calibre's code works, but it complicates editing, since it can be hard to understand why calibre added any particular class. |
05-06-2014, 08:44 AM | #7 |
Connoisseur
Posts: 59
Karma: 118112
Join Date: Jul 2013
Device: none
|
theducks and mrmikel --Thanks again for more useful thoughts.
I was fortunate with the book in question that the "text" used would not appear in any normal context. Also I had looked all through the html pages and knew a search and replace would work. I take on board the suggestion that a full search and replace is potentially a disaster and that a softly, softly approach is more certain. That is now my preferred approach. Thanks again for all the help both on this thread and on many others. |
05-06-2014, 10:00 AM | #8 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
You're welcome. The best way to look through all the html pages is to use the Find button for the phrase in question. Even after working on books for a while, I am still surprised when my find phrase suddenly catches something I forgot existed.
As I mentioned in another forum, I was trying to capitalize every ATIS, an acronym for a WWII US Army office, in a book, but lo and behold, I ended up with sATISfaction. |
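The sATISfaction surprise comes from a search that matches the letters anywhere, including inside longer words. In regex mode, the word-boundary anchor `\b` restricts a match to whole words. A short Python illustration; the sentence is invented, and it assumes a case-insensitive search, since the aim was to capitalize every occurrence.

```python
import re

text = "The atis report gave him great satisfaction."

# Plain case-insensitive search: also hits the letters inside "satisfaction".
plain = re.sub(r'atis', 'ATIS', text, flags=re.IGNORECASE)

# \b matches only at a word boundary, so "atis" must be a whole word.
bounded = re.sub(r'\batis\b', 'ATIS', text, flags=re.IGNORECASE)

print(plain)    # The ATIS report gave him great sATISfaction.
print(bounded)  # The ATIS report gave him great satisfaction.
```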
05-10-2014, 03:51 AM | #9 |
Junior Member
Posts: 6
Karma: 610
Join Date: Aug 2011
Device: Calibre, FBreader(android)
|
Hi all,
I have a similar problem. My chapter headings are "1.", "2." etc., also without any heading tags. I want to give the headings heading tags and then use those tags to build the Contents. My problem: I've managed to capture the headings with a regex, but I don't know how to write the replacement, because I still need to keep the "5." intact, and I want to use Replace All to change all 100+ headings.
Original code:
Code:
<p class="calibre2"><span class="bold">5.</span></p>
Search:
Code:
<p class="calibre2"><span class="bold">\s*[0-9]+.*?</span></p>
Replace:
Code:
<p class="calibre2"><span class="bold"><h3>???</h3></span></p> |
05-10-2014, 06:28 AM | #10 | ||
Grand Sorcerer
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Search:
Code:
<p class="calibre2"><span class="bold">(\s*[0-9]+\..*?)</span></p>
or, simpler:
Code:
<p class="calibre2"><span class="bold">(\d+\.)</span></p>
Replace:
Code:
<p class="calibre2"><span class="bold"><h3>\1</h3></span></p>
or, replacing the whole paragraph with the heading:
Code:
<h3 class="calibre2">\1</h3> |
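The two suggestions above can be checked against the sample line from the question. This is a minimal Python sketch. The `bold_year` line is a made-up non-heading paragraph added to show why escaping the dot matters: in the original search an unescaped `.` matches any character, so a bold number without a full stop can be caught too.

```python
import re

line = '<p class="calibre2"><span class="bold">5.</span></p>'

# Capture one or more digits followed by a literal full stop.
search = r'<p class="calibre2"><span class="bold">(\d+\.)</span></p>'

# Replace option 1: keep the wrappers and put the heading inside them.
kept = re.sub(search,
              r'<p class="calibre2"><span class="bold"><h3>\1</h3></span></p>',
              line)
# Replace option 2: turn the whole paragraph into the heading.
swapped = re.sub(search, r'<h3 class="calibre2">\1</h3>', line)

print(kept)     # <p class="calibre2"><span class="bold"><h3>5.</h3></span></p>
print(swapped)  # <h3 class="calibre2">5.</h3>

# Hypothetical bold paragraph that is not a chapter heading.
bold_year = '<p class="calibre2"><span class="bold">2014</span></p>'
loose = r'<p class="calibre2"><span class="bold">\s*[0-9]+.*?</span></p>'
print(bool(re.search(loose, bold_year)))   # True: the unescaped "." matched the "4"
print(bool(re.search(search, bold_year)))  # False: \. requires a real full stop
```

The first replacement leaves the `<h3>` nested inside `<p>` and `<span>`, which is not valid XHTML (a heading cannot sit inside a paragraph); the second produces a standalone heading.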
||
05-10-2014, 08:18 AM | #11 | |
Junior Member
Posts: 6
Karma: 610
Join Date: Aug 2011
Device: Calibre, FBreader(android)
|
As for the second, since everything in that book is wrapped in those same p class and span tags, I don't want to risk changing anything too major. But thanks anyway for an alternative solution. |
|
05-10-2014, 08:31 AM | #12 |
Village idiot
Posts: 157
Karma: 519566
Join Date: Mar 2014
Location: Belgium
Device: sony PRS T-1
|
When the class is specific to the chapters, am I correct in thinking that you can use (.*) to match everything inside that class, be it digits, letters or anything else?
If my chapters look like this: 1. My first chapter, then I can find them with (.*). But what do I use in the replacement? /1 doesn't do the job. |
05-10-2014, 09:17 AM | #13 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
You mean \1?
|
05-10-2014, 09:54 AM | #14 | |
Well trained by Cats
Posts: 29,809
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
All escapes use \ ; all closing tags use / (even self-closing ones, like <br />).
Unique is the keyword: as long as you always get an exact match, yes, the non-greedy wildcard capture (.+?) can be used.
BTW, the bold is no longer needed when changing from P to H#; by default, H# tags are already BOLD.
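Here is why the non-greedy form is preferred. If two chapter paragraphs end up on the same line of HTML, a greedy `(.*)` runs from the first opening tag to the last `</p>`. This Python sketch uses a hypothetical `chapter` class and made-up headings. It also shows that in the replacement `\1` inserts the captured text, while `/1` is inserted literally.

```python
import re

# Two hypothetical chapter paragraphs that sit on the same line of HTML.
html = ('<p class="chapter">1. My first chapter</p>'
        '<p class="chapter">2. My second chapter</p>')

greedy = re.findall(r'<p class="chapter">(.*)</p>', html)
lazy = re.findall(r'<p class="chapter">(.+?)</p>', html)
print(greedy)  # one match running from the first heading to the last </p>
print(lazy)    # ['1. My first chapter', '2. My second chapter']

# In the replacement, \1 inserts whatever group 1 captured...
headings = re.sub(r'<p class="chapter">(.+?)</p>', r'<h2>\1</h2>', html)
# ...while /1 is just the two literal characters "/" and "1".
literal = re.sub(r'<p class="chapter">(.+?)</p>', r'<h2>/1</h2>', html)
print(headings)  # <h2>1. My first chapter</h2><h2>2. My second chapter</h2>
print(literal)   # <h2>/1</h2><h2>/1</h2>
```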
|
05-10-2014, 10:51 AM | #15 |
Village idiot
Posts: 157
Karma: 519566
Join Date: Mar 2014
Location: Belgium
Device: sony PRS T-1
|
I meant \1; that was a typo.