Using a TOC to create Chapter headings

meghane_e · 12-07-2025, 12:05 AM

I have an EPUB file that uses only images for the chapter names. I'm trying to shrink the file, since all the graphics make it very large. The TOC is correct and usable.

I've written some Regex-Func functions for simpler stuff but this is harder for me to wrap my brain around.

Here's a sample from the TOC:
<p class="toc1"><a href="part0013.html#CCNA1-f66f6b0d51c44ea49012bf2fe61db1ae" class="toc_text"><strong class="calibre1">9. </strong> The Chickens Draw First Blood</a></p>
<p class="toc1"><a href="part0014.html#DB7S1-f66f6b0d51c44ea49012bf2fe61db1ae" class="toc_text"><strong class="calibre1">10. </strong> My Singing Makes Things Worse, and Everyone Is Totally Shocked</a></p>

#part0013.html:
<body class="calibre">
<div class="fullimage" id="DB7S1-f66f6b0d51c44ea49012bf2fe61db1ae"><img alt="" src="../images/00021.jpeg" class="calibre3"/></div>

I can see how to do this by brute force: unzip the EPUB and use Python directly on the HTML files (I'm capable of that). But surely there are other tools available.
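For reference, a minimal sketch of what that brute-force route could look like, using only the standard zipfile module. The names rewrite_epub and transform are made up for illustration, and it assumes the HTML files are UTF-8 encoded; it is a starting point, not a tested converter.

```python
import zipfile

def rewrite_epub(src, dst, transform):
    """Copy the EPUB at src to dst, passing each HTML file's text through
    transform(name, text). src and dst may be paths or file-like objects."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, 'w') as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.lower().endswith(('.html', '.xhtml', '.htm')):
                # Assumption: the book's HTML files are UTF-8
                text = data.decode('utf-8')
                data = transform(info.filename, text).encode('utf-8')
            # The 'mimetype' entry of an EPUB should be stored uncompressed
            if info.filename == 'mimetype':
                compress = zipfile.ZIP_STORED
            else:
                compress = zipfile.ZIP_DEFLATED
            zout.writestr(info.filename, data, compress_type=compress)
```

The entries are written in the same order they are read, so a 'mimetype' entry that comes first in the source stays first in the copy. The transform callback is where the image-to-heading replacement would go.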

Thank you for suggestions or pointers.

[EDIT]
Here's what I have so far (which is missing most of it I know):
[EDIT2]
I'm pretty sure the code I've started below is not going in the right direction.

Code:

from bs4 import BeautifulSoup

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):

    soup = BeautifulSoup(html, "html.parser")

    string = soup.a.text.split('.')
    chap_num = string[0].strip()
    chap_title = string[1].strip()
    id = soup.a['href'].split('#')

    # not sure how to go to/check the next item in the TOC

    # not sure how to place the output onto the right page
    return f'<h2>{chap_num} {chap_title}</h2>'

meghane_e · 12-07-2025, 03:36 PM

So... I found this pre-written function change_title_of_page_to_chapter_name.

Is this a step in the right direction?

Code:

# Use expression: <(h[123]) [^<>]* id=['"]([^'"]+)['"][^<>]*>([^<>]+)

from calibre import replace_entities
from calibre.ebooks.oeb.polish.toc import TOC, toc_to_html
from calibre.gui2.tweak_book import current_container
from calibre.ebooks.oeb.base import xml2str

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if match is None:
        # All matches found, output the resulting Table of Contents.
        # The argument metadata is the metadata of the book being edited
        if 'toc' in data:
            toc = data['toc']
            root = TOC()
            for (file_name, tag_name, anchor, text) in toc:
                parent = root.children[-1] if tag_name == 'h2' and root.children else root
                parent.add(text, file_name, anchor)
            toc = toc_to_html(root, current_container(), 'toc.html', 'Table of Contents for ' + metadata.title, metadata.language)
            print (xml2str(toc))
        else:
            print ('No headings to build ToC from found')
    else:
        # Add an entry corresponding to this match to the Table of Contents
        if 'toc' not in data:
            # The entries are stored in the data object, which will persist
            # for all invocations of this function during a 'Replace All' operation
            data['toc'] = []
        tag_name, anchor, text = match.group(1), replace_entities(match.group(2)), replace_entities(match.group(3))
        data['toc'].append((file_name, tag_name, anchor, text))
        return match.group()  # We don't want to make any actual changes, so return the original matched text

# Ensure that we are called once after the last match is found so we can
# output the ToC
replace.call_after_last_match = True
# Ensure that when this function is run over multiple files,
# the files are processed in the order in which they appear in the book
replace.file_order = 'spine'

JSWolf · 12-07-2025, 03:46 PM

You can then do a search/replace regex to replace the graphic chapter headers with text.

meghane_e · 12-07-2025, 04:28 PM

I wish I could understand what you mean lol. So the problem is that the chapter pages have no text, only the HTML for a graphic:

<div class="fullimage" id="DB7S1-f66f6b0d51c44ea49012bf2fe61db1ae"><img alt="" src="../images/00021.jpeg" class="calibre3"/></div>

The only place in the book that has the text is the TOC (page0003.html) with html like this:

Code:

<p class="toc1"><a href="part0013.html#CCNA1-f66f6b0d51c44ea49012bf2fe61db1ae" class="toc_text"><strong class="calibre1">9. </strong> The Chickens Draw First Blood</a></p>
<p class="toc1"><a href="part0014.html#DB7S1-f66f6b0d51c44ea49012bf2fe61db1ae" class="toc_text"><strong class="calibre1">10. </strong> My Singing Makes Things Worse, and Everyone Is Totally Shocked</a></p>

I can do a search on the following to get the ID

Code:

<div class="fullimage" id="([^"]+)"><img alt="" src="../images/000\d+.jpeg" class="calibre3"/></div>

But I'm not sure how to get the data from the toc.

I definitely want the function to return

Code:

return f'<h2 class="chapter-heading">{new_chapter_title}</h2>'

where the 'new_chapter_title' contains the chapter number and the chapter title

UPDATE:
It seems like this line needs to be adjusted since I'm not using (<h[1|2|3]) in the Search?

Code:

tag_name, anchor, text = match.group(1), replace_entities(match.group(2)), replace_entities(match.group(3))

lomkiri · 12-08-2025, 06:43 AM

Quote:

But I'm not sure how to get the data from the toc.

You can create a persistent dict (e.g. mydata) that survives from one pass of the regex-function to the next.
You'll need two distinct parts in your function: the first gets the titles from the TOC, and the second makes the changes.

Then run the function twice, with a different regex on different files:
-- 1st pass (get information from the TOC):

Code:

Displayed file : your toc
find : <p class="toc1"><a href="([^#]+)#([^"]+)" class="toc_text"><strong class="calibre1">(\d+)[^>]+>\s?([^<]+)
scope : current file

You'll get, for each occurrence:
match[1] -> file name
match[2] -> tag (the id after the #)
match[3] -> chap-number
match[4] -> chap title
Store this in your dict mydata. Use the tag as the key, since the tag is what the second regex captures; the value is a dict, e.g. (file name, chap-num, title)
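To check the first-pass regex outside calibre, here is a small sketch using Python's plain re module on the two sample TOC lines from the first post. The helper name collect_titles is made up for illustration, and the dict is keyed by the id after the #, which is an assumption about how you want to look entries up later.

```python
import re

# The 1st-pass pattern, split over two lines for readability
TOC_PATTERN = re.compile(
    r'<p class="toc1"><a href="([^#]+)#([^"]+)" class="toc_text">'
    r'<strong class="calibre1">(\d+)[^>]+>\s?([^<]+)'
)

# The two sample TOC lines quoted in the thread
toc_html = (
    '<p class="toc1"><a href="part0013.html#CCNA1-f66f6b0d51c44ea49012bf2fe61db1ae" '
    'class="toc_text"><strong class="calibre1">9. </strong> '
    'The Chickens Draw First Blood</a></p>\n'
    '<p class="toc1"><a href="part0014.html#DB7S1-f66f6b0d51c44ea49012bf2fe61db1ae" '
    'class="toc_text"><strong class="calibre1">10. </strong> '
    'My Singing Makes Things Worse, and Everyone Is Totally Shocked</a></p>'
)

def collect_titles(html):
    """Return {anchor_id: {'file', 'num', 'title'}} for every TOC entry found."""
    titles = {}
    for m in TOC_PATTERN.finditer(html):
        titles[m[2]] = {'file': m[1], 'num': m[3], 'title': m[4].strip()}
    return titles

mydata = collect_titles(toc_html)
for anchor, chap in mydata.items():
    print(anchor, '->', chap['num'], chap['title'])
```

If a chapter is missing from the printed list, the TOC line for it probably differs slightly from the sample (a different class name, for instance), and the pattern needs loosening.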

Then, on the second pass, you fill in your headers:

Code:

find : <div class="fullimage" id="([^"]+)".+?</div>
scope : all text files

The skeleton of your function will be something like this (not tested) :

Code:

mydata = {}    # This dict will survive between 2 passes of the same function
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):

    # first pass: get info and fill mydata, keyed by the tag (the id after the #)
    # adapt this: name of the file holding the toc (include its folder path if it has one)
    if file_name == 'page0003.html':
        mydata[match[2]] = {'file': match[1], 'num': match[3], 'title': match[4]}
        # you can check the values with print(mydata)
        return match[0]

    # second pass: replace headers (here match[1] is the id of the image div)
    if match[1] in mydata:
        chap = mydata[match[1]]
        header = chap['num'] + ' – ' + chap['title']
        return f'<h1>{header}</h1>'    # adapt this
    else:
        print(f'title not found for file {file_name}, tag {match[1]}')
        return match[0]
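The second pass can also be tried outside calibre with plain re.sub and a callback. This is only a sketch: the dict below is filled in by hand to stand in for what the first pass would collect, the names IMG_RE and to_heading are made up, and the heading markup is the h2 form asked for earlier in the thread. The pattern ends in a lazy .+?</div>, which is an assumption chosen so that it matches the sample div.

```python
import re

# Assumed to have been filled by the first pass, keyed by the id after the '#'
mydata = {
    'DB7S1-f66f6b0d51c44ea49012bf2fe61db1ae': {
        'num': '10',
        'title': 'My Singing Makes Things Worse, and Everyone Is Totally Shocked',
    },
}

# Lazy '.+?</div>' so each match stops at the first closing div
IMG_RE = re.compile(r'<div class="fullimage" id="([^"]+)".+?</div>')

def to_heading(match):
    """Swap an image div for a text heading; leave it alone if the id is unknown."""
    chap = mydata.get(match[1])
    if chap is None:
        return match[0]
    return f'<h2 class="chapter-heading">{chap["num"]}. {chap["title"]}</h2>'

# One div whose id is in the TOC data, and one that is not
page = (
    '<body class="calibre">\n'
    '<div class="fullimage" id="DB7S1-f66f6b0d51c44ea49012bf2fe61db1ae">'
    '<img alt="" src="../images/00021.jpeg" class="calibre3"/></div>\n'
    '<div class="fullimage" id="unknown-id">'
    '<img alt="" src="../images/00022.jpeg" class="calibre3"/></div>'
)

new_page = IMG_RE.sub(to_heading, page)
print(new_page)
```

Full-page images that are not chapter headings keep their div, because their id is not in the dict. The title text comes straight from the TOC's HTML, so any entities in it are already escaped.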

Doitsu · 12-08-2025, 07:14 AM

@meghane_e you might find KevinH's TOCSaver Sigil plugin helpful. It'll insert hidden heading tags based on an existing NCX TOC. After running the plugin you can then regenerate the TOC with Sigil via Tools > Table of Contents > Generate Table of Contents.

DNSB · 12-08-2025, 06:04 PM

One note on Doitsu's message is that you will need to use Sigil to edit the ePub to run this plugin. OTOH, I've used it several times and it works quite well.

meghane_e · 12-09-2025, 04:48 PM

Y'all are awesome, this is all really useful!! If I can get it to work, I'll post the solution.

I don't use Sigil very often, but will check that out too.
