04-03-2025, 07:48 AM   #10
lomkiri
@Karellen: In your other thread, I see this:
Quote:
I am surprised that you think there is too little use for a tag report, as users can very quickly spot tags that need investigating. Personally, I have been caught out so many times- that tag which is used once, and makes you wonder why it is even there.
In that case, you can easily print only the tags that have at most x entries, or use an exclusion list for common tags such as body, div, p, and so on.

Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    """
    Count the number of occurrences of every html tag in an epub.
    May be filtered by tag name and by max number of occurrences.
    search regex: <(\w+)
    """
    # final call, made once after the last match
    if match is None:

        # Exclusions:
        # excl = ('html', 'meta', 'body', 'title', 'div', 'p')    # or () for no exclusion
        # max_it = 5    # no print if more occ than this. None or 0 for no limit
        excl = ()
        max_it = None

        my_tags = {k: v for k, v in data.items() if k not in excl and (not max_it or v <= max_it)}
        print(f'Found a total of {number} tags, with {len(data)} different tags')
        if len(my_tags) < len(data):
            print(f'Selected a total of {sum(my_tags.values())} tags, with {len(my_tags)} different tags')
        for key in sorted(my_tags):
            print(f'{key}: {my_tags[key]}')
        return

    # normal call: one per match, count the tag name
    tag = match[1]
    data[tag] = data.get(tag, 0) + 1
    return match[0]

replace.call_after_last_match = True    # Ask for one extra call after the last match
Note: You could add a "min_it" filter in the same way, although I don't see a use for it.
Note: I've removed the "if not data" error check, since a valid epub must contain at least one html tag.
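If you want to try the filtering outside calibre, or see what a "min_it" filter could look like, here is a minimal standalone sketch. It assumes the search regex is `<(\w+)`. The helper names `count_tags` and `filter_tags` and the sample markup are made up for illustration; they are not part of calibre or of the function above.

```python
import re
from collections import Counter

# Matches the name of each opening tag; closing tags like </p> are skipped
# because "/" is not a word character.
TAG_RE = re.compile(r'<(\w+)')


def count_tags(html):
    """Count opening tags by name in a string of markup (hypothetical helper)."""
    return Counter(TAG_RE.findall(html))


def filter_tags(counts, excl=(), min_it=None, max_it=None):
    """Keep tags not listed in excl whose count is within [min_it, max_it].

    None or 0 disables the corresponding bound (hypothetical helper).
    """
    return {
        tag: n for tag, n in counts.items()
        if tag not in excl
        and (not min_it or n >= min_it)
        and (not max_it or n <= max_it)
    }


# Made-up sample markup, only for the demonstration
sample = '<html><body><div><p>a</p><p>b</p><p>c</p><span>d</span></div></body></html>'
counts = count_tags(sample)

# Show only tags used at most once, ignoring html and body
rare = filter_tags(counts, excl=('html', 'body'), max_it=1)
for tag in sorted(rare):
    print(f'{tag}: {rare[tag]}')
```

With this sample, only div and span are printed, each with a count of 1. Passing min_it=2 instead would keep only p.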

Last edited by lomkiri; 04-04-2025 at 07:14 AM. Reason: max_it can be 0 or None for no limit of occurrences