MobileRead Forums > E-Book Software > Calibre > Editor
02-07-2021, 03:45 PM   #1
413Michele
Enthusiast
Posts: 44
Karma: 10
Join Date: Jan 2021
Location: Italy
Device: Kobo Libra 2, Kindle Paperwhite (1st gen)
How to combine chapter titles and subtitles

Hello, I have a problem with chapter detection in the calibre editor.

I have a book in which each chapter is defined like this:
Code:
<h2>Chapter XX</h2>
<h3>Title of the Chapter</h3>
I can't for the life of me find a way to combine both headings into a single chapter entry in the ToC. I tried some XPath functions and operators with no success; the best I can manage is to have the title indented under the "Chapter XX" entry.

Do you have any idea how to do this?

Thank you
02-07-2021, 03:57 PM   #2
JSWolf
Resident Curmudgeon
Posts: 73,998
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Save the eBook first so you have it as is. Then do a regex search (with the "Dot all" box ticked on the Mode line) for:

Code:
<h2>(.*?)</h2>.<h3>(.*?)</h3>
and replace with...

Code:
<h2>\1: \2</h2>
Then generate the ToC. Once that's done, copy the contents of the NCX ToC and close the eBook without saving it. Then edit the eBook again and paste the NCX you copied over the one that's there. Done.
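A minimal Python sketch of that search/replace, assuming calibre's regex mode behaves like Python's `re` for a pattern this simple. It uses the simplified markup from the first post with proper `</h2>`/`</h3>` closing tags; the ToC copy-and-revert steps still happen in the editor:

```python
import re

# Simplified markup from the first post, with proper closing tags.
html = "<h2>Chapter XX</h2>\n<h3>Title of the Chapter</h3>"

# Lazy groups keep a "Dot all" match from running across several chapters;
# the lone "." stands for exactly one character (here the newline) between
# the two headings.
pattern = re.compile(r"<h2>(.*?)</h2>.<h3>(.*?)</h3>", re.DOTALL)

merged = pattern.sub(r"<h2>\1: \2</h2>", html)
print(merged)  # <h2>Chapter XX: Title of the Chapter</h2>
```

Because of that single ".", the pattern only matches when exactly one character (normally the newline) separates the two headings.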

Last edited by JSWolf; 02-07-2021 at 04:05 PM.
02-07-2021, 04:18 PM   #3
413Michele
Enthusiast
It didn't work, but that could be my fault: the code in my first post was a simplified example, not the exact markup. Here is an actual extract:
Code:
<h2 id="sigil_toc_id_1">CAPITOLO I.</h2>

  <h3 id="sigil_toc_id_2">Come Candido è allevato in un bel castello e come n'è cacciato via.</h3>
Edit: to be more precise, your regex found no matches.
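A quick Python check, on the assumption that calibre's regex mode treats these patterns like Python's `re`, shows why the first pattern found nothing here: the tags carry `id` attributes, and there is more than one character between `</h2>` and `<h3`. The `tolerant` pattern is only an illustration of allowing for both, not one posted in the thread:

```python
import re

# Actual extract: headings carry ids, and the <h3> follows a blank line
# and two spaces of indentation.
extract = (
    '<h2 id="sigil_toc_id_1">CAPITOLO I.</h2>\n'
    "\n"
    '  <h3 id="sigil_toc_id_2">Come Candido è allevato in un bel castello '
    "e come n'è cacciato via.</h3>"
)

# First suggestion (closing tags corrected): bare tags, one character between.
strict = re.compile(r"<h2>(.*?)</h2>.<h3>(.*?)</h3>", re.DOTALL)
print(strict.search(extract))  # None: "<h2>" never appears without attributes

# Illustrative variant: any attributes inside the tags, any whitespace between.
tolerant = re.compile(r"<h2[^>]*>(.*?)</h2>\s*<h3[^>]*>(.*?)</h3>", re.DOTALL)
m = tolerant.search(extract)
print(m.group(1))  # CAPITOLO I.
print(m.group(2))
```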
02-07-2021, 04:52 PM   #4
DNSB
Bibliophagist
Posts: 35,461
Karma: 145525534
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by 413Michele
It didn't work, but that could actually be my fault, as the code I put in the first post was a simplified example and is not exactly the same. I'll put an actual extract:
Code:
<h2 id="sigil_toc_id_1">CAPITOLO I.</h2>

  <h3 id="sigil_toc_id_2">Come Candido è allevato in un bel castello e come n'è cacciato via.</h3>
Edit: To be more precise, your code found no matches
Find:
<h2 (.*?)>(.*?)</h2>{add EOL and spaces here}<h3 (.*?)>(.*?)</h3>

Replace:
<h2 title="\2: \4">\2<br />\4</h2>
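Here is that Find/Replace as a Python sketch, with `\s*` (any run of whitespace, including newlines) standing in for the pasted end-of-line and spaces; that substitution is an assumption, not what was typed in Sigil:

```python
import re

extract = (
    '<h2 id="sigil_toc_id_1">CAPITOLO I.</h2>\n'
    "\n"
    '  <h3 id="sigil_toc_id_2">Come Candido è allevato in un bel castello '
    "e come n'è cacciato via.</h3>"
)

# Find: the {add EOL and spaces here} placeholder replaced by \s*.
find = re.compile(r"<h2 (.*?)>(.*?)</h2>\s*<h3 (.*?)>(.*?)</h3>")
# Replace: both parts in the visible heading (split by a line break)
# and, joined by a colon, in a title attribute.
replace = r'<h2 title="\2: \4">\2<br />\4</h2>'

result = find.sub(replace, extract)
print(result)
```

The original ids (groups 1 and 3) are captured but not reused, so they disappear from the merged heading.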
02-07-2021, 05:20 PM   #5
413Michele
Enthusiast
Quote:
Originally Posted by DNSB
Find:
<h2 (.*?)>(.*?)</h2>{add EOL and spaces here}<h3 (.*?)>(.*?)</h3>

Replace:
<h2 title="\2: \4">\2<br />\4</h2>
It doesn't find anything. Sorry, I don't know much about regex: are \r and \s the sequences I should be using?
02-07-2021, 06:13 PM   #6
DNSB
Bibliophagist
Quote:
Originally Posted by 413Michele
I can't find anything, sorry but I don't know much about regex, are \r and \s the sequences I should be using?
I was using Sigil when I tested that, and simply copied and pasted everything from </h2> to <h3 into the Find field.
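On the \r/\s question: `\s` matches a single whitespace character (space, tab, `\r`, `\n` and so on), so `\s*` matches any run of them, including none, whereas `\r` matches only a carriage return. A small Python comparison, assuming calibre's regex mode treats `\s` as Python's `re` does, shows why `\s*` is more forgiving than whitespace pasted from one file:

```python
import re

# The gap between the two headings, laid out three different ways.
variants = [
    "</h2>\n\n  <h3>",  # blank line plus two-space indent, as in the extract
    "</h2>\r\n<h3>",    # Windows line ending, no indent
    "</h2><h3>",        # no whitespace at all
]

# Pasting the whitespace from one file bakes in that exact layout...
pasted = re.compile(re.escape("</h2>\n\n  <h3>"))
# ...whereas \s* accepts any run of spaces, tabs, \r and \n (including none).
flexible = re.compile(r"</h2>\s*<h3>")

for gap in variants:
    print(repr(gap), "pasted:", bool(pasted.search(gap)), "\\s*:", bool(flexible.search(gap)))
```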
02-07-2021, 06:48 PM   #7
davidfor
Grand Sorcerer
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
What I often do for this is switch to styled paragraphs for the visible heading, plus a separate heading for the ToC that is not displayed. So I would have:
Code:
<h2 class="hidden">Chapter XX: Title of the Chapter</h2>
<p class="chapterNumber">Chapter XX</p>
<p class="chapterTitle">Title of the Chapter</p>
The stylesheet would need the appropriate classes to get the desired look, including:

Code:
.hidden {
     display: none
}
An alternative is:
Code:
<h2>Chapter XX <br/><span class="chapterSubtitle">Title of the Chapter</span></h2>
That has the advantage of having the actual title only once. But I don't like that I need a space before the br tag so that the ToC entry calibre generates doesn't run the two parts together. That space upsets the centring of the heading (someday I'll get around to looking at the code that does this, as I think the br should be replaced by a space). The first version is good if you want the ToC to differ from what is displayed: for example, you might not want the word "Chapter" in the ToC, and you can put a colon or dash between the parts. It is also good if you are using any graphics in the chapter title.
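The rule suggested above (treat the `br` as a space when building the ToC text) can be sketched in Python. `heading_text` is a hypothetical helper built on the standard library's `xml.etree.ElementTree`; calibre's actual ToC generation may work quite differently:

```python
import xml.etree.ElementTree as ET


def heading_text(xhtml: str) -> str:
    """Flatten a heading into one line of ToC text, treating <br/> as a space.

    Hypothetical helper for illustration; not calibre's own ToC code.
    """
    parts = []

    def walk(el):
        # Strip any "{namespace}" prefix so plain and XHTML-namespaced <br> both match.
        if el.tag.rsplit("}", 1)[-1] == "br":
            parts.append(" ")
        if el.text:
            parts.append(el.text)
        for child in el:
            walk(child)
            if child.tail:
                parts.append(child.tail)

    walk(ET.fromstring(xhtml))
    # Collapse whitespace runs, so "Chapter XX <br/>" doesn't give a double space.
    return " ".join("".join(parts).split())


heading = '<h2>Chapter XX<br/><span class="chapterSubtitle">Title of the Chapter</span></h2>'
print(heading_text(heading))  # Chapter XX Title of the Chapter
```

With a rule like this, the markup would not need the extra space before `<br/>` that upsets the centring.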

The regex search I would use is:

Code:
<h2>(.*?)</h2>\s*<h3>(.*?)</h3>
The replace for the first version is:
Code:
<h2 class="hidden">\1: \2</h2><p class="chapterNumber">\1</p><p class="chapterTitle">\2</p>
The replace for the second version is:

Code:
<h2>\1 <br/><span class="chapterSubtitle">\2</span></h2>
I tested the above using https://regex101.com/, not the editor. If the heading tags have classes or ids, the regex needs to accommodate them; the versions that @DNSB posted come close. But calibre doesn't use the "title" attribute for the ToC (or didn't the last time I tried).
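Python's `re` can also stand in for regex101 here. A minimal sketch applying the search and both replacements (the second with its `</span></h2>` closing tags) to the simplified markup from the first post; real headings with ids or classes would need a looser pattern:

```python
import re

# Simplified markup from the first post, with proper closing tags.
html = "<h2>Chapter XX</h2>\n<h3>Title of the Chapter</h3>"

# The search: two bare headings separated by any amount of whitespace.
find = re.compile(r"<h2>(.*?)</h2>\s*<h3>(.*?)</h3>")

# First version: a hidden heading for the ToC, styled paragraphs for display.
version1 = find.sub(
    r'<h2 class="hidden">\1: \2</h2>'
    r'<p class="chapterNumber">\1</p><p class="chapterTitle">\2</p>',
    html,
)

# Second version: one heading, the title in a span after a line break.
version2 = find.sub(r'<h2>\1 <br/><span class="chapterSubtitle">\2</span></h2>', html)

print(version1)
print(version2)
```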

And I'll go on record: I think that the method that @JSWolf suggested is a really, really bad idea.
02-07-2021, 07:31 PM   #8
JSWolf
Resident Curmudgeon
Use search/replace to remove the ids from the <h2> and <h3> tags. Also remove all blank lines and any spaces in front of the heading tags, and then my search/replace will work.
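Those clean-up steps followed by the merge, sketched in Python on the extract from post #3. The clean-up patterns are illustrative choices (any equivalent regex would do), and the merge pattern is written with `</h2>`/`</h3>` closing tags:

```python
import re

extract = (
    '<h2 id="sigil_toc_id_1">CAPITOLO I.</h2>\n'
    "\n"
    '  <h3 id="sigil_toc_id_2">Come Candido è allevato in un bel castello '
    "e come n'è cacciato via.</h3>"
)

# 1. Remove the ids from <h2> and <h3>.
text = re.sub(r'<(h[23]) id="[^"]*">', r"<\1>", extract)
# 2. Remove blank lines.
text = re.sub(r"\n\s*\n", "\n", text)
# 3. Remove spaces in front of the heading tags.
text = re.sub(r"(?m)^[ \t]+(?=<h[23])", "", text)

# The merge now finds its single newline between </h2> and <h3>.
merged = re.sub(r"<h2>(.*?)</h2>.<h3>(.*?)</h3>", r"<h2>\1: \2</h2>", text, flags=re.DOTALL)
print(merged)  # a single <h2> holding "CAPITOLO I.: Come Candido ..."
```

Removing the ids presumably breaks any existing links that point at them (such as `sigil_toc_id_1`), so this only makes sense if the ToC is regenerated afterwards.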
02-08-2021, 09:42 AM   #9
413Michele
Enthusiast
First of all, thank you everyone for your help. I feel like I'll never learn to use regex properly without a starting expression to edit; it's too broad and confusing for me.

I found it odd that there wasn't already a thread from someone with this problem, so I'm being detailed in case someone needs this in the future.

In the end I took davidfor's first idea and modified it a bit to keep as much of the original tagging as possible. Here's the Find string I used (I added the part that matches the ids of the tags):
Code:
<h2 id="sigil_toc_id_\d{1,2}">(.*?)</h2>\s*<h3 id="sigil_toc_id_\d{1,2}">(.*?)</h3>
And here's the string to replace with:
Code:
<h2 class="hidden">\1 \2</h2><h2 class="ChapterTitle">\1</h2><h3 class="ChapterSubtitle">\2</h3>
After this I used the ToC generation tool with XPath expressions for multiple levels:
  • h1 for the first level
  • h2 with class=hidden for the second level
and it worked like a charm!
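A Python sketch of this final approach on the same extract: the Find/Replace strings are the ones above, and the last step uses `xml.etree.ElementTree`'s limited XPath to list what a second-level expression selecting `h2` elements with `class="hidden"` would pick up. In calibre's ToC tool the expression itself would use the `h:` namespace prefix, something like `//h:h2[@class="hidden"]`; treat that exact form as an assumption:

```python
import re
import xml.etree.ElementTree as ET

# Extract from the book, as posted earlier in the thread.
extract = (
    '<h2 id="sigil_toc_id_1">CAPITOLO I.</h2>\n'
    "\n"
    '  <h3 id="sigil_toc_id_2">Come Candido è allevato in un bel castello '
    "e come n'è cacciato via.</h3>"
)

# Find/Replace strings from the post above.
find = re.compile(
    r'<h2 id="sigil_toc_id_\d{1,2}">(.*?)</h2>\s*'
    r'<h3 id="sigil_toc_id_\d{1,2}">(.*?)</h3>'
)
replace = (
    r'<h2 class="hidden">\1 \2</h2>'
    r'<h2 class="ChapterTitle">\1</h2><h3 class="ChapterSubtitle">\2</h3>'
)
body = find.sub(replace, extract)

# What a second-level expression selecting h2 elements with class="hidden"
# would pick up (ElementTree's limited XPath, no namespaces here).
root = ET.fromstring("<body>" + body + "</body>")
entries = [h.text for h in root.findall(".//h2[@class='hidden']")]
print(entries)
```

Using `\d+` instead of `\d{1,2}` would also cover ids from `sigil_toc_id_100` upwards.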
02-08-2021, 09:59 AM   #10
413Michele
Enthusiast
OK, now it seems the calibre viewer doesn't like the new ToC: the links to the hidden elements don't work. I tried two other viewers (Adobe Digital Editions and Google Play Books) and there they work without a problem. Strange.
02-08-2021, 09:19 PM   #11
davidfor
Grand Sorcerer
Quote:
Originally Posted by 413Michele
First of all, thank you everyone for your help. I feel like I'll never learn to use regex properly without a starting expression to edit; it's too broad and confusing for me.

I found it odd that there wasn't already a thread from someone with this problem, so I'm being detailed in case someone needs this in the future.

In the end I took davidfor's first idea and modified it a bit to keep as much of the original tagging as possible. Here's the Find string I used (I added the part that matches the ids of the tags):
Code:
<h2 id="sigil_toc_id_\d{1,2}">(.*?)</h2>\s*<h3 id="sigil_toc_id_\d{1,2}">(.*?)</h3>
And here's the string to replace with:
Code:
<h2 class="hidden">\1 \2</h2><h2 class="ChapterTitle">\1</h2><h3 class="ChapterSubtitle">\2</h3>
After this I used the ToC generation tool with XPath expressions for multiple levels:
  • h1 for the first level
  • h2 with class=hidden for the second level
and it worked like a charm!
Shouldn't the replace string have "p" instead of the second "h2" and the "h3"? Or have you just copied the wrong version?

Edit:

The ToC generation is using the h2 with the hidden class for the second level, so it looks like it should work. Can you show us the generated NCX code, and the actual code at the top of the matching chapter? That might give a hint as to what is wrong.

And that is an interesting idea. What I do is easy if the chapter headings already use well-defined classes. But if they are bare heading tags, I have to add classes. With that, I could rely on the existing stylesheet a bit more.

Last edited by davidfor; 02-08-2021 at 09:25 PM.
02-09-2021, 05:52 AM   #12
413Michele
Enthusiast
Quote:
Originally Posted by davidfor
The ToC generation is using the h2 with the hidden class for the second level, so it looks like it should work. Can you show us the generated NCX code, and the actual code at the top of the matching chapter? That might give a hint as to what is wrong.
Yeah sure, here's the NCX for Chapter 1:
Code:
<navPoint id="num_5" playOrder="5">
        <navLabel>
          <text>CAPITOLO I. Come Candido è allevato in un bel castello e come n'è cacciato via.</text>
        </navLabel>
        <content src="Text/Section0004.xhtml#toc_1"/>
      </navPoint>
And here's the corresponding part in the file:
Code:
<h2 class="hidden" id="toc_1">CAPITOLO I. Come Candido è allevato in un bel castello e come n'è cacciato via.</h2>
One finding: in the calibre book editor, clicking the ToC entry highlights the right line in the editor view, but doesn't go to the right position in the preview (it only opens the file at the beginning).

That suggests to me that the problem is in how calibre handles the display: none rule, as the link works in the source HTML but not in the rendered file. I might be wrong though.

Quote:
And that is an interesting idea. What I do is easy if the chapter headings already use well-defined classes. But if they are bare heading tags, I have to add classes. With that, I could rely on the existing stylesheet a bit more.
Sorry, what are you referring to in particular? I'm not sure.
02-09-2021, 06:32 AM   #13
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by 413Michele
Yeah sure, here's the NCX for Chapter 1:
Code:
<navPoint id="num_5" playOrder="5">
        <navLabel>
          <text>CAPITOLO I. Come Candido è allevato in un bel castello e come n'è cacciato via.</text>
        </navLabel>
        <content src="Text/Section0004.xhtml#toc_1"/>
      </navPoint>
And here's the corresponding part in the file:
Code:
<h2 class="hidden" id="toc_1">CAPITOLO I. Come Candido è allevato in un bel castello e come n'è cacciato via.</h2>
You do not need "#toc_1" because all it does is slow down using the ToC. With the unnecessary id, the reader has to load the chapter and then find the id before displaying it, instead of just loading and displaying the chapter.

Also, it would be easier to find what's in what file if you renamed your files. So if Section0004.xhtml is chapter one, name it chapter01.xhtml.
02-09-2021, 06:53 AM   #14
davidfor
Grand Sorcerer
Quote:
Originally Posted by 413Michele
Yeah sure, here's the NCX for Chapter 1:
Code:
<navPoint id="num_5" playOrder="5">
        <navLabel>
          <text>CAPITOLO I. Come Candido è allevato in un bel castello e come n'è cacciato via.</text>
        </navLabel>
        <content src="Text/Section0004.xhtml#toc_1"/>
      </navPoint>
And here's the corresponding part in the file:
Code:
<h2 class="hidden" id="toc_1">CAPITOLO I. Come Candido è allevato in un bel castello e come n'è cacciato via.</h2>
One finding: in the calibre book editor, clicking the ToC entry highlights the right line in the editor view, but doesn't go to the right position in the preview (it only opens the file at the beginning).

That suggests to me that the problem is in how calibre handles the display: none rule, as the link works in the source HTML but not in the rendered file. I might be wrong though.
Nothing there looks wrong, and a couple of tests I have done work fine. I think we need to see more code. What is around the heading? Is it at the top of the file or somewhere further down? The fact that calibre added an id means it isn't the first heading of that level in the file.

If the book is under copyright, you can use the ScrambleBook plugin to produce a version that can be posted here.
Quote:
Sorry, what are you referring to in particular? I'm not sure
If the existing code is '<h2 class="chapterClass">', then changing the tag to a "p" will generally keep the desired formatting. But if it is just '<h2>', I have to change it to '<p class="chapterClass">' and create a 'chapterClass' class that displays the text correctly. With your version, no changes or new classes are needed.
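That conversion can be sketched as two regex passes in Python. `heading_to_paragraph` is a hypothetical helper, and it assumes each `<h2>` has either no attributes or a single `class` attribute:

```python
import re


def heading_to_paragraph(html: str, default_class: str = "chapterClass") -> str:
    """Turn <h2> headings into <p> elements while keeping their styling.

    Hypothetical helper: assumes each <h2> has either no attributes or a
    single class attribute.
    """
    # <h2 class="x">...</h2> -> <p class="x">...</p>: the existing class is reused.
    html = re.sub(r'<h2 class="([^"]*)">(.*?)</h2>', r'<p class="\1">\2</p>', html)
    # Bare <h2>...</h2> -> <p class="default_class">...</p>: the stylesheet then
    # needs a new rule for that class.
    html = re.sub(r"<h2>(.*?)</h2>", rf'<p class="{default_class}">\1</p>', html)
    return html


print(heading_to_paragraph('<h2 class="chapterClass">Chapter XX</h2>'))
# <p class="chapterClass">Chapter XX</p>
print(heading_to_paragraph("<h2>Chapter XX</h2>"))
# <p class="chapterClass">Chapter XX</p>
```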
02-09-2021, 06:56 AM   #15
davidfor
Grand Sorcerer
Quote:
Originally Posted by JSWolf
You do not need "#toc_1" because all it does is slow down using the ToC. With the unnecessary id, the reader has to load the chapter and then find the id before displaying it, instead of just loading and displaying the chapter.
That is impossible for you to know. If the heading is at the top of the file, the id is not needed. If it is further down, and there are multiple tags of the same level, it is needed. And it was added by calibre during the ToC generation.
Quote:
Also, it would be easier to find what's in what file if you renamed your files. So if Section0004.xhtml is chapter one, name it chapter01.xhtml.
That might be a good idea, but it has nothing to do with the problem. And the current file names might already match how the book is laid out.
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Where do displayed chapter titles come from? | JJ Johnson | ePub | 11 | 10-17-2020 07:15 PM
Chapter #s or Chapter Titles? | bmcox | Writers' Corner | 33 | 02-01-2013 07:03 AM
Titles, subtitles & alternative "display" titles | jigme | ePub | 2 | 08-31-2011 05:19 PM
Ebook chapter titles: with or without chapter number? | amoroso | Writers' Corner | 16 | 06-14-2011 06:35 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.