@Karellen : From your other thread, I see this :
Quote:
I am surprised that you think there is too little use for a tag report, as users can very quickly spot tags that need investigating. Personally, I have been caught out so many times: that tag which is used once, and makes you wonder why it is even there.
In that case, you can easily print only the tags that have no more than x entries, or keep an exclusion list for common tags such as body, div, p, and so on.
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    """
    Count the number of occurrences of every html tag in an epub.
    May be filtered by tag name and by max number of occurrences.
    Search regex: <(\w+)
    """
    # Last passage: extra call with match set to None, after the last match
    if match is None:
        # Exclusions:
        # excl = ('html', 'meta', 'body', 'title', 'div', 'p')  # or () for no exclusion
        # max_it = 5  # no print if more occurrences than this. None or 0 for no limit
        excl = ()
        max_it = None
        my_tags = {k: v for k, v in data.items() if k not in excl and (not max_it or v <= max_it)}
        print(f'Found a total of {number} tags, with {len(data)} different tags')
        if len(my_tags) < len(data):
            print(f'Selected a total of {sum(my_tags.values())} tags, with {len(my_tags)} different tags')
        for key in sorted(my_tags):
            print(f'{key}: {my_tags[key]}')
        return
    # Normal passage: count the tag and put the matched text back unchanged
    tag = match[1]
    data[tag] = data.setdefault(tag, 0) + 1
    return match[0]

replace.call_after_last_match = True  # Ask for last passage
Note : You could add a "min_it" filter in the same way, although I don't see a use for it.
Note : I've removed the "if not data" error test, since a valid epub must contain at least one html tag.
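Note : If someone does want "min_it", here is a minimal sketch of how it could sit next to "max_it", written as a standalone function so it can be tried outside calibre. The name select_tags and the sample counts are made up for the illustration; inside replace() the same test would be applied to data.

```python
def select_tags(counts, excl=(), min_it=None, max_it=None):
    """Return the tags to report (hypothetical helper, not part of calibre).

    counts : dict mapping tag name -> number of occurrences
    excl   : tag names to leave out of the report
    min_it : drop tags with fewer occurrences than this (None or 0 = no limit)
    max_it : drop tags with more occurrences than this (None or 0 = no limit)
    """
    return {
        tag: n
        for tag, n in counts.items()
        if tag not in excl
        and (not min_it or n >= min_it)
        and (not max_it or n <= max_it)
    }


# Made-up counts, for illustration only
sample = {'html': 1, 'body': 1, 'p': 120, 'span': 3, 'u': 1}
print(select_tags(sample, excl=('html', 'body'), max_it=5))
# prints {'span': 3, 'u': 1}
```

With both limits left at None and an empty excl, every tag is kept, which matches the defaults in the function above.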