
MobileRead Forums > E-Book Software > Sigil

03-19-2025, 09:58 AM   #1
NL77
Junior Member
 
Posts: 2
Karma: 10
Join Date: Mar 2025
Device: android
Incremental numbers instead of page numbers in the Index

Hi,
I have several InDesign exports in which the authors’ names in the index are followed by linked page numbers, like this:
First Author 18, 23.
Second Author 123, 259, 368. etc.
Code:
<p class="inp">First Author <a href="epi1.xhtml#idx104">18</a>, <a href="epi2.xhtml#idx624">23</a></p>
<p class="inp">Second Author <a href="epi1.xhtml#idx057">123</a>, <a href="epi1.xhtml#idx178">259</a>, <a href="epi1.xhtml#idx241">368</a></p>
In the e-book, I need to replace these page numbers with ascending numbers that restart at 1 for each author, like this:
First Author 1, 2.
Second Author 1, 2, 3. etc.

Code:
<p class="inp">First Author <a href="epi1.xhtml#idx104">1</a>, <a href="epi2.xhtml#idx624">2</a></p>
<p class="inp">Second Author <a href="epi1.xhtml#idx057">1</a>, <a href="epi1.xhtml#idx178">2</a>, <a href="epi1.xhtml#idx241">3</a></p>
I wrote a simple Python script that does the job:

Code:
import re

i = 0

def IncrementalNumbers(m):
    global i
    i+=1
    return str(i)

PageNumbers = r'(\d+)(?=</a>)'

with open("index.xhtml", encoding="utf-8") as fp, open("index_renumbered.xhtml", "w", encoding="utf-8") as out:
    # process the file one line (= one index entry) at a time
    for line in fp:
        i = 0  # restart the numbering for every line
        l = re.sub(PageNumbers, IncrementalNumbers, line)
        out.write(l)
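A variant of the same idea without the global variable, as a minimal sketch: like the script above, it assumes each index entry sits on its own line. `renumber_line` and `renumber_file` are illustrative names, not part of any library; `itertools.count(1)` gives every line its own counter, so there is nothing to reset by hand.

```python
import itertools
import re

# Same pattern as the script above: one or more digits immediately followed by </a>
PAGE_NUMBER = re.compile(r'\d+(?=</a>)')


def renumber_line(line):
    """Replace each linked page number in one index line with 1, 2, 3, ..."""
    counter = itertools.count(1)  # a fresh counter for every call, so no global reset is needed
    return PAGE_NUMBER.sub(lambda m: str(next(counter)), line)


def renumber_file(src, dst):
    """Apply renumber_line to every line of src and write the result to dst (UTF-8 assumed)."""
    with open(src, encoding="utf-8") as fp, open(dst, "w", encoding="utf-8") as out:
        for line in fp:
            out.write(renumber_line(line))
```

For example, `renumber_file("index.xhtml", "index_renumbered.xhtml")` should give the same result as the script above for a UTF-8 file.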
It would be much faster to run this as a Sigil plugin, but I have never written one before and don't know enough to do it properly.
My first attempt was only half successful: the counter is never reset, so the plugin counts globally instead of line by line.
Code:
import re
import sys

i = 0

def IncrementalNumbers(m):
    global i
    i+=1
    return str(i)

PageNumbers = r'(\d+)(?=</a>)'

def run(bk):
    for (id, href) in bk.text_iter():
        print('Start %s:' % href)
        html = bk.readfile(id)
        html_orig = html

        # NOTE: i is never reset, so the numbering runs on across entries (and files)
        html = re.sub(PageNumbers, IncrementalNumbers, html)

        if not html == html_orig:
            print("Modified File --> ", id)
            bk.writefile(id, html)

    return 0


def main():
    print("I reached main when I should not have\n")
    return -1

if __name__ == "__main__":
    sys.exit(main())
This is the result (Second Author 3, 4, 5 instead of Second Author 1, 2, 3):
Code:
<p class="inp">First Author <a href="epi1.xhtml#idx104">1</a>, <a href="epi2.xhtml#idx624">2</a></p>
<p class="inp">Second Author <a href="epi1.xhtml#idx057">3</a>, <a href="epi1.xhtml#idx178">4</a>, <a href="epi1.xhtml#idx241">5</a></p>
Can someone help me change the plugin so that the counter restarts for each entry?
Thank you
03-19-2025, 11:36 AM   #2
Haudek
Member
 
Posts: 24
Karma: 111614
Join Date: Mar 2025
Location: Poland
Device: Kindle Voyage
Try this.

Code:
import re
import sys  # needed for sys.exit() at the bottom
from sigil_bs4 import BeautifulSoup

def run(bk):
    PageNumbers = r'(\d+)(?=</a>)'

    for (id, href) in bk.text_iter():
        print('Processing:', href)
        html = bk.readfile(id)
        soup = BeautifulSoup(html, 'html.parser')

        modified = False

        for p in soup.find_all('p', class_='inp'):
            i = 0  # RESET HERE: restart the counter for each paragraph

            def IncrementalNumbers(m):
                nonlocal i
                i += 1
                return str(i)

            new_html = re.sub(PageNumbers, IncrementalNumbers, str(p), count=0)

            if new_html != str(p):
                modified = True
                p.replace_with(BeautifulSoup(new_html, 'html.parser'))

        if modified:
            print("Modified File -->", id)
            bk.writefile(id, str(soup))

    return 0

def main():
    print("I reached main when I should not have\n")
    return -1

if __name__ == "__main__":
    sys.exit(main())


Look at the line marked "RESET HERE": the counter is reset there, so each paragraph is counted separately.
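For comparison, here is a sketch of the same plugin without BeautifulSoup, assuming every index entry is a `<p class="inp">...</p>` paragraph with the class attribute written exactly like that. The regex runs once per paragraph with a fresh counter (the equivalent of the "RESET HERE" line), and only the digits inside the links change, so the rest of each file is written back as it was instead of being re-serialized by a parser. `renumber_entry` and `renumber_index` are hypothetical helper names; `bk.text_iter()`, `bk.readfile()` and `bk.writefile()` are the same Sigil calls used in the plugins above.

```python
import itertools
import re

# Assumption: each index entry is <p class="inp">...</p>; DOTALL lets an entry span several lines
ENTRY = re.compile(r'(<p class="inp">)(.*?)(</p>)', re.DOTALL)
# Same idea as PageNumbers above: digits immediately followed by </a>
PAGE_NUMBER = re.compile(r'\d+(?=</a>)')


def renumber_entry(m):
    """Renumber the links inside one matched index paragraph, starting at 1."""
    counter = itertools.count(1)  # new counter per paragraph = the "RESET HERE" step
    body = PAGE_NUMBER.sub(lambda n: str(next(counter)), m.group(2))
    return m.group(1) + body + m.group(3)


def renumber_index(html):
    """Renumber every index paragraph in one file; other markup is left unchanged."""
    return ENTRY.sub(renumber_entry, html)


def run(bk):
    for (file_id, href) in bk.text_iter():
        html = bk.readfile(file_id)
        new_html = renumber_index(html)
        if new_html != html:
            print("Modified File -->", href)
            bk.writefile(file_id, new_html)
    return 0
```

The usual `main()` stub from the plugins above can stay at the bottom of plugin.py. Like the other versions, this only touches digits directly followed by `</a>`, and only inside `inp` paragraphs, so links elsewhere in the book keep their numbers.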
03-19-2025, 01:24 PM   #3
NL77
Junior Member
 
Posts: 2
Karma: 10
Join Date: Mar 2025
Device: android
Thank you, it works beautifully. I am very grateful.

I don’t know how common this problem is, but I’m working on a number of textbooks with indexes, and you’ve saved me precious minutes.