MobileRead Forums - View Single Post - Is there a way to see all HTML tags used in an ebook?

lomkiri · 04-02-2025, 09:51 PM

What about a search/replace on the whole epub, using a regex function?

Find: <(\w+)
Replace: the function below
Do a "Replace all" and you'll get all the tags of the epub.
The number of replacements shown in the dialog box is the total number of tags, but nothing is changed in the epub, because the function returns each match as is.

Code:

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    
    # final call, made once after the last match (match is None)
    if match is None:
        if not data:
            print('No tag found')
        else:
            print(f'Found a total of {number} tags, with {len(data)} different tags\n')
            for key in sorted(data):
                print(f'{key}: {data[key]}')
        return
    
    # normal call: count the tag name and leave the text unchanged
    tag = match[1]
    data[tag] = data.get(tag, 0) + 1
    return match[0]

replace.call_after_last_match = True    # Ask for one extra call after the last match

The result will look like this:

Code:

Debug output from __count tags

Found a total of 12605 tags, with 22 different tags

a: 6
body: 78
br: 14
div: 143
em: 45
figure: 2
h1: 7
h2: 64
[etc.]
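
If you want the same count outside the editor, here is a minimal sketch in plain Python. It assumes the epub is an ordinary zip archive and that the text files end in .html, .xhtml or .htm; the names count_tags and count_tags_in_epub are made up for this example and are not part of any tool. It uses the same pattern as the "find" field, so it shares the same limits (for instance, it would also count something that looks like a tag inside a comment).

```python
import re
import zipfile
from collections import Counter

# Same pattern as the "find" field: "<" followed by a tag name.
# Closing tags ("</p>") are not matched, since "/" is not a word character.
TAG_RE = re.compile(r'<(\w+)')


def count_tags(text, counts=None):
    """Count opening tag names in one string of markup."""
    counts = Counter() if counts is None else counts
    counts.update(m.group(1) for m in TAG_RE.finditer(text))
    return counts


def count_tags_in_epub(path):
    """Count tags across every (X)HTML file in an epub (assumed to be a zip)."""
    counts = Counter()
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            # Assumption: text files are identified by their extension only.
            if name.lower().endswith(('.html', '.xhtml', '.htm')):
                text = zf.read(name).decode('utf-8', errors='replace')
                count_tags(text, counts)
    return counts
```

To get a report like the one above, call count_tags_in_epub on the path of your book, then print sum(counts.values()) for the total and loop over sorted(counts) for the per-tag lines.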