As begun in this thread (moved from another thread with a misleading title), I have been working on a plug-in to remove those 'calibre#' classes that usually just restate the CSS defaults for the tag, and also redundant/unnecessary <meta.../> tags.
My (working) code so far is:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, os, re
from sigil_bs4 import BeautifulSoup

def run(bk):
    # get all html files
    for (html_id, href) in bk.text_iter():
        deletec = []  # (tag name, class list) pairs whose class attribute was removed
        deletem = []  # (tag name, name attribute) pairs for removed <meta> tags
        file_name = os.path.basename(href)
        html = bk.readfile(html_id)
        # convert html to soup
        soup = BeautifulSoup(html, 'html.parser')
        orig_html = str(soup)
        # get all i,b,small,sup,sub,br,a,li tags with class='*calibre*' (containing all those characters, in fact)
        tags = soup.find_all(['i', 'b', 'small', 'sub', 'sup', 'br', 'a', 'li'], class_=re.compile("calibre"))
        for tag in tags:
            theclass = tag['class']  # list under 'html.parser': can be multivalued
            if len(theclass) == 1:  # not a multivalued class
                # remove class attribute
                deletec.append((tag.name, theclass))
                del tag['class']
            # else: # remove the *calibre* style from class? How?
        # this clears the plug-in console: add after
        metas = soup.find_all('meta', attrs={'name': True})
        for meta in metas:
            # exclude 'calibre:cover', remove others
            if 'cover' not in meta['name']:  # works here: string, and NOT above: list
                deletem.append((meta.name, meta['name']))
                meta.decompose()  # all previous print statements lost here
        # update file with changes
        if str(soup) != orig_html:
            bk.writefile(html_id, str(soup))
            # write a list of changes for checking
            print(deletec)
            print(deletem)
            print(file_name, 'updated')
    print('Done')
    return 0

def main():
    print("I reached main when I should not have")
    return -1

if __name__ == "__main__":
    sys.exit(main())
When run, it prints the full list of removals in the plug-in window, so that one can check that nothing unintended is affected. It removes the calibre classes from i, b, small, sup, sub, br, a and li tags, plus every <meta/> tag that has a name attribute. Tags with composite (multi-valued) classes and any <meta/> whose name contains 'cover' are left untouched.
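For the composite classes that are currently skipped, one possible approach is to filter the class list instead of deleting the whole attribute. The following is only a sketch: the helper name strip_calibre_classes and the pattern CALIBRE_CLASS are hypothetical, and the pattern assumes the unwanted names look like 'calibre' followed by optional digits, which is narrower than the re.compile("calibre") search used in the plug-in. It works on a plain attribute dictionary, so inside the loop it could be called as strip_calibre_classes(tag.attrs).

```python
import re

# Assumption: unwanted names are 'calibre', 'calibre1', 'calibre23', ...
CALIBRE_CLASS = re.compile(r"calibre\d*")


def strip_calibre_classes(attrs, pattern=CALIBRE_CLASS):
    """Remove matching class names from an attribute dict, in place.

    `attrs` is a mapping such as a BeautifulSoup tag's `.attrs`.
    Returns the list of class names that were removed.
    """
    classes = attrs.get("class")
    if not classes:
        return []
    if isinstance(classes, str):
        # Some parsers give a plain string instead of a list.
        classes = classes.split()
    removed = [c for c in classes if pattern.fullmatch(c)]
    if not removed:
        return []
    kept = [c for c in classes if not pattern.fullmatch(c)]
    if kept:
        attrs["class"] = kept      # keep the other classes
    else:
        del attrs["class"]         # nothing left: drop the attribute
    return removed
```

With this, a tag such as <i class="calibre3 note"> would keep class="note", and the returned list can be added to the removal report.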
Now I would appreciate further help from the experts in improving the plug-in, so that it offers these options:
* edit the tag list
* include/exclude the removal of the metas
* include/exclude showing the (huge!) list of removals (i.e., a 'test mode')
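Independently of which GUI is used, the three options could be carried through the plug-in in one small settings object whose defaults reproduce the current behaviour. This is a sketch only: the class name Options and its field names are hypothetical, not part of the Sigil framework.

```python
from dataclasses import asdict, dataclass, field, fields

# The tag list currently hard-coded in the plug-in.
DEFAULT_TAGS = ("i", "b", "small", "sub", "sup", "br", "a", "li")


@dataclass
class Options:
    """Hypothetical settings holder for the three requested options."""
    tags: list = field(default_factory=lambda: list(DEFAULT_TAGS))
    remove_metas: bool = True     # include/exclude the removal of the metas
    show_removals: bool = True    # include/exclude the list of removals

    def to_dict(self):
        """Plain dict, e.g. for saving as JSON between runs."""
        return asdict(self)

    @classmethod
    def from_dict(cls, data):
        """Build Options from a dict, ignoring keys it does not know."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})
```

In run(bk), opts.tags would replace the hard-coded list passed to find_all, the meta loop would sit under "if opts.remove_metas:", and the two list prints under "if opts.show_removals:".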
As the Sigil_Plugin_Framework_rev12.epub lacks information on GUIs, I have two questions:
- For my needs, would Tkinter or PyQt5 be the simpler approach?
- Can some helpful body provide a very simple template (no bells or whistles) that works in Sigil?
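For a dialog with one text field and two checkboxes, Tkinter is probably the lighter choice, since it is part of Python's standard library. The following is a bare-bones sketch, assuming tkinter is available in the Python interpreter that Sigil runs the plug-in with. The function names parse_tag_list and ask_options, the window title and the returned keys are all hypothetical.

```python
import re


def parse_tag_list(text):
    """Turn 'i, B  small,i' into ['i', 'b', 'small'] (lowercase, no repeats)."""
    tags = []
    for name in re.split(r"[,\s]+", text.strip()):
        name = name.lower()
        if name and name not in tags:
            tags.append(name)
    return tags


def ask_options(default_tags="i, b, small, sub, sup, br, a, li"):
    """Show a small dialog; return a dict of options, or None if cancelled."""
    # Imported here so that parse_tag_list can be used without a display.
    import tkinter as tk

    result = {}
    root = tk.Tk()
    root.title("Remove calibre classes")

    tk.Label(root, text="Tags (comma-separated):").grid(
        row=0, column=0, sticky="w", padx=8, pady=4)
    tags_var = tk.StringVar(master=root, value=default_tags)
    tk.Entry(root, textvariable=tags_var, width=40).grid(
        row=0, column=1, padx=8, pady=4)

    metas_var = tk.BooleanVar(master=root, value=True)
    tk.Checkbutton(root, text="Remove <meta name=...> tags",
                   variable=metas_var).grid(
        row=1, column=0, columnspan=2, sticky="w", padx=8)

    show_var = tk.BooleanVar(master=root, value=True)
    tk.Checkbutton(root, text="Show the list of removals",
                   variable=show_var).grid(
        row=2, column=0, columnspan=2, sticky="w", padx=8)

    def on_ok():
        # Read the widgets before the window is destroyed.
        result["tags"] = parse_tag_list(tags_var.get())
        result["remove_metas"] = metas_var.get()
        result["show_removals"] = show_var.get()
        root.destroy()

    tk.Button(root, text="OK", command=on_ok).grid(row=3, column=0, pady=8)
    tk.Button(root, text="Cancel", command=root.destroy).grid(
        row=3, column=1, pady=8)

    root.mainloop()               # blocks until OK, Cancel or window close
    return result or None
```

In the plug-in, run(bk) could start with "opts = ask_options()" and return 0 immediately when the result is None, i.e. when the user cancelled.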
* Improvements to the code are welcome (I am learning)
** If somebody is interested, I can provide the plug-in in its present working state... for use only if those classes are in your way: maybe some 'user agents' require them?