Old 03-19-2025, 09:58 AM   #1
NL77
Junior Member
NL77 began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Mar 2025
Device: android
Incremental numbers instead of page numbers in the Index

Hi,
I have several InDesign exports in which the authors' names in the index link to page numbers.
Like this:
First Author 18, 25.
Second Author 123, 259, 368. etc.
Code:
<p class="inp">First Author <a href="epi1.xhtml#idx104">18</a>, <a href="epi2.xhtml#idx624">25</a></p>
<p class="inp">Second Author <a href="epi1.xhtml#idx057">123</a>, <a href="epi1.xhtml#idx178">259</a>, <a href="epi1.xhtml#idx241">368</a></p>
I need to replace these page numbers in the e-book with ascending numbers that restart at 1 for each index entry.
Like this:
First Author 1, 2.
Second Author 1, 2, 3. etc.

Code:
<p class="inp">First Author <a href="epi1.xhtml#idx104">1</a>, <a href="epi2.xhtml#idx624">2</a></p>
<p class="inp">Second Author <a href="epi1.xhtml#idx057">1</a>, <a href="epi1.xhtml#idx178">2</a>, <a href="epi1.xhtml#idx241">3</a></p>
I wrote a simple Python script that does the job:

Code:
import re

i = 0

def IncrementalNumbers(m):
    global i
    i+=1
    return str(i)

PageNumbers = r'(\d+)(?=</a>)'

with open("index.xhtml", "r", encoding="utf-8") as fp, open("index_renumbered.xhtml", "w", encoding="utf-8") as out:
    # process the file one line at a time, restarting the count on each line
    for line in fp:
        i = 0
        l = re.sub(PageNumbers, IncrementalNumbers, line)
        out.write(l)
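The per-line reset can also be written without a global variable. This is only a minimal sketch, assuming the same layout of one index entry per line; `renumber_line` and `PAGE_NUMBER` are illustrative names, not part of the script above:

```python
import itertools
import re

# Same pattern as in the script: digits that sit directly before a closing </a>.
PAGE_NUMBER = re.compile(r'\d+(?=</a>)')

def renumber_line(line):
    """Replace each link's page number with 1, 2, 3, ... within one line."""
    counter = itertools.count(1)  # fresh counter per call, so no global state
    return PAGE_NUMBER.sub(lambda m: str(next(counter)), line)

sample = ('<p class="inp">Second Author <a href="epi1.xhtml#idx057">123</a>, '
          '<a href="epi1.xhtml#idx178">259</a></p>')
print(renumber_line(sample))
```

Because every call creates its own counter, the numbering cannot leak from one line into the next.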
It would be much faster if I could run this script as a Sigil plugin, but I have never written a plugin before and don't know enough to do it.
My first attempt was only half successful: the plugin counts globally, not line by line.
Code:
import re
import sys
import sigil_bs4
from bs4 import BeautifulSoup

text_type = str

i = 0

def IncrementalNumbers(m):
    global i
    i+=1
    return str(i)

PageNumbers = r'(\d+)(?=</a>)'
#RefSymbol = '←'

def run(bk):
    for (id, href) in bk.text_iter():
        print('Start %s:' % href)
        html = bk.readfile(id)
        soup = sigil_bs4.BeautifulSoup(html)

    html_orig = html

    html = re.sub(PageNumbers, IncrementalNumbers, html)


    if not html == html_orig:
        print("Modified File --> ", id)
        bk.writefile(id, html)

    return 0


def main():
    print("I reached main when I should not have\n")
    return -1

if __name__ == "__main__":
    sys.exit(main())
This is the result: Second Author 3, 4, 5 instead of Second Author 1, 2, 3
Code:
<p class="inp">First Author <a href="epi1.xhtml#idx104">1</a>, <a href="epi2.xhtml#idx624">2</a></p>
<p class="inp">Second Author <a href="epi1.xhtml#idx057">3</a>, <a href="epi1.xhtml#idx178">4</a>, <a href="epi1.xhtml#idx241">5</a></p>
Can someone help me make the plugin restart the count for each line?
Thank you
Old 03-19-2025, 11:36 AM   #2
Haudek
Member
Haudek knows the difference between a duck.
 
Posts: 24
Karma: 111614
Join Date: Mar 2025
Location: Poland
Device: Kindle Voyage
Try this.

Code:
import re
import sys
from sigil_bs4 import BeautifulSoup

def run(bk):
    PageNumbers = r'(\d+)(?=</a>)'

    for (id, href) in bk.text_iter():
        print('Processing:', href)
        html = bk.readfile(id)
        soup = BeautifulSoup(html, 'html.parser')

        modified = False

        for p in soup.find_all('p', class_='inp'):
            i = 0  # HERE RESET

            def IncrementalNumbers(m):
                nonlocal i
                i += 1
                return str(i)

            new_html = re.sub(PageNumbers, IncrementalNumbers, str(p), count=0)

            if new_html != str(p):
                modified = True
                p.replace_with(BeautifulSoup(new_html, 'html.parser'))

        if modified:
            print("Modified File -->", id)
            bk.writefile(id, str(soup))

    return 0

def main():
    print("I reached main when I should not have\n")
    return -1

if __name__ == "__main__":
    sys.exit(main())


Look at the line marked "HERE RESET". The counter is reset there, so each paragraph is numbered separately.
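The per-paragraph reset can also be tried outside Sigil with `re` alone. This is a rough sketch, not the plugin itself: it assumes every index entry is a `<p class="inp">` element with exactly that opening tag and no nested `<p>`, and the names `renumber_index`, `INDEX_PARAGRAPH` and `PAGE_NUMBER` are made up for illustration:

```python
import itertools
import re

# Assumption: index entries look exactly like <p class="inp">...</p>.
INDEX_PARAGRAPH = re.compile(r'<p class="inp">.*?</p>', re.DOTALL)
PAGE_NUMBER = re.compile(r'\d+(?=</a>)')

def renumber_index(html):
    """Renumber link texts 1, 2, 3, ... separately inside each index paragraph."""
    def renumber_paragraph(paragraph_match):
        counter = itertools.count(1)  # reset for every paragraph
        return PAGE_NUMBER.sub(lambda m: str(next(counter)),
                               paragraph_match.group(0))
    return INDEX_PARAGRAPH.sub(renumber_paragraph, html)

html = (
    '<p class="inp">First Author <a href="epi1.xhtml#idx104">18</a>, '
    '<a href="epi2.xhtml#idx624">25</a></p>\n'
    '<p class="inp">Second Author <a href="epi1.xhtml#idx057">123</a>,\n'
    '<a href="epi1.xhtml#idx178">259</a>, <a href="epi1.xhtml#idx241">368</a></p>\n'
    '<p><a href="epi1.xhtml#x">7</a></p>'
)
print(renumber_index(html))
```

Inside `run(bk)` the function could be applied to the string returned by `bk.readfile(id)` and the result passed to `bk.writefile(id, ...)`. Links outside `p.inp` paragraphs are left untouched, and a paragraph that wraps over several lines is still counted as one entry.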
Old 03-19-2025, 01:24 PM   #3
NL77
Thank you, it works beautifully, I am very grateful.

I don't know how common this problem is, but I'm working on a number of textbooks with indexes, and you've saved me precious minutes.