04-08-2020, 06:38 AM   #1
carmenchu
Groupie
 
Posts: 192
Karma: 266070
Join Date: Dec 2010
Location: Spain
Device: Win10,Win11,Ubuntu,PockbookLux44
Removing calibre classes...

As begun in this thread (moved from another one with a misleading title), I have been working on a plug-in to remove those 'calibre#' classes that usually just implement the CSS defaults for the tag, as well as redundant or unnecessary <meta.../> tags.
My (working) code so far is:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, os, re

from sigil_bs4 import BeautifulSoup


def run(bk):

    # get all html files
    for (html_id, href) in bk.text_iter():

        deletec = []
        deletem = []

        file_name = os.path.basename(href)
        html = bk.readfile(html_id)

        # convert html to soup
        soup = BeautifulSoup(html, 'html.parser')
        orig_html = str(soup)

        # get all i, b, small, sup, sub, br, a, li tags whose class contains 'calibre'
        tags = soup.find_all(['i', 'b', 'small', 'sub', 'sup', 'br', 'a', 'li'], class_=re.compile("calibre"))
        for tag in tags:
            theclass = tag['class']  # a list under 'html.parser': class can be multivalued
            if len(theclass) == 1:  # not a multivalued class
                # remove the class attribute
                deletec.append((tag.name, theclass))
                del tag['class']
            # else: remove only the *calibre* class from the list? How?

        # note (from testing): print() output issued before this loop was lost
        # from the plug-in console, so all results are printed at the end
        metas = soup.find_all('meta', attrs={'name': True})
        for meta in metas:
            # keep 'calibre:cover' (any name containing 'cover'), remove the others
            if 'cover' not in meta['name']:  # a string here, not a list as with class above
                deletem.append((meta.name, meta['name']))
                meta.decompose()

        # update file with changes
        if str(soup) != orig_html:
            bk.writefile(html_id, str(soup))
            # write a list of changes for checking
            print(deletec)
            print(deletem)
            print(file_name, 'updated')

    print('Done')
    return 0


def main():
    print("I reached main when I should not have")
    return -1


if __name__ == "__main__":
    sys.exit(main())

- When run, it lists every removal in the plug-in window, so that one can check that nothing unintended is affected, and it removes the calibre classes from i, b, small, sup, sub, br, a and li tags, plus every <meta> tag that has a name attribute. Multivalued (composite) classes and <meta name="...cover..."/> tags are left untouched.
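On the "How?" left in the code comment: with 'html.parser' the class attribute is a list, so the calibre entries can be filtered out and whatever remains written back. A minimal sketch, assuming (as in the code above) that any class name containing 'calibre' should go; `split_classes` and `clean_tag_classes` are hypothetical helper names, not part of Sigil or BeautifulSoup:

```python
def split_classes(classes):
    """Split a class value into (kept, removed) lists.

    Mirrors the test used above: any class name containing 'calibre' is removed.
    Accepts either a list (BeautifulSoup's default for 'class') or a plain string.
    """
    if isinstance(classes, str):
        classes = classes.split()
    kept = [c for c in classes if 'calibre' not in c]
    removed = [c for c in classes if 'calibre' in c]
    return kept, removed


def clean_tag_classes(tag):
    """Drop calibre classes from a tag in place and return the removed names.

    Works on a BeautifulSoup Tag (or any dict-like object): the remaining
    classes are written back, and the attribute is deleted if none remain.
    """
    kept, removed = split_classes(tag.get('class') or [])
    if removed:
        if kept:
            tag['class'] = kept
        else:
            del tag['class']
    return removed
```

Inside the `for tag in tags:` loop, `removed = clean_tag_classes(tag)` could then replace the `len(theclass) == 1` test, with `deletec.append((tag.name, removed))` keeping the log of changes.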
Now I would appreciate further help from experts in improving the plug-in so that it offers these options:
* edit the tag list
* include/exclude the metas removal
* include/exclude showing the (huge!) list of removals (i.e., a 'test mode')
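One way those three options could be wired in, sketched under assumptions: the option names and defaults in `DEFAULT_OPTIONS`, and the `clean_html(html, options)` callback (which would hold the BeautifulSoup code above and return the new HTML plus a list of changes), are hypothetical. Only `bk.text_iter()`, `bk.readfile()` and `bk.writefile()` come from the plug-in above. Sigil's plugin API also has `bk.getPrefs()` and `bk.savePrefs()` for remembering settings between runs, which could feed `merge_options()`; check the framework documentation for their exact behaviour.

```python
# Hypothetical defaults for the three options asked for above.
DEFAULT_OPTIONS = {
    'tags': ['i', 'b', 'small', 'sub', 'sup', 'br', 'a', 'li'],  # tags to clean
    'remove_metas': True,   # also remove <meta name=...> tags (except covers)
    'show_changes': True,   # print the list of removals ('test mode')
}


def merge_options(saved):
    """Return DEFAULT_OPTIONS overridden by any known keys in `saved`."""
    options = dict(DEFAULT_OPTIONS)
    for key, value in (saved or {}).items():
        if key in options:  # ignore stale or unknown keys
            options[key] = value
    return options


def process_book(bk, options, clean_html):
    """Run clean_html over every text file and return the hrefs that changed.

    clean_html(html, options) must return (new_html, changes), where
    `changes` is an empty list when nothing was removed.
    """
    updated = []
    for html_id, href in bk.text_iter():
        new_html, changes = clean_html(bk.readfile(html_id), options)
        if not changes:
            continue  # leave untouched files alone
        bk.writefile(html_id, new_html)
        updated.append(href)
        if options['show_changes']:
            print(href, changes)
    print('Done,', len(updated), 'file(s) updated')
    return updated
```

Deciding "changed" from the list of changes, rather than comparing strings, avoids rewriting files that BeautifulSoup merely re-serialized differently.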

As the Sigil_Plugin_Framework_rev12.epub lacks information on GUIs, please:
- For my needs, would Tkinter or PyQt5 be the simpler approach?
- Can someone helpful provide a very simple template (no bells or whistles) that works in Sigil?
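On the Tkinter vs PyQt5 question: for a dialog with one text field and two checkboxes, Tkinter is probably the lighter option, since it is part of the standard Python library. This assumes the Python interpreter Sigil runs the plug-in with includes tkinter. A bare-bones sketch; `ask_options()` and `parse_tag_list()` are hypothetical names, and the option keys are assumptions:

```python
def parse_tag_list(text):
    """Turn 'i, b, small' into ['i', 'b', 'small'], ignoring empty entries."""
    return [t.strip().lower() for t in text.split(',') if t.strip()]


def ask_options(defaults):
    """Show a minimal Tkinter dialog; return the chosen options, or None on cancel."""
    import tkinter as tk  # imported here so the helpers above work without a display

    result = {}
    root = tk.Tk()
    root.title('Remove calibre classes')

    tags_var = tk.StringVar(value=', '.join(defaults['tags']))
    metas_var = tk.BooleanVar(value=defaults['remove_metas'])
    show_var = tk.BooleanVar(value=defaults['show_changes'])

    tk.Label(root, text='Tags (comma separated):').pack(anchor='w', padx=8, pady=(8, 0))
    tk.Entry(root, textvariable=tags_var, width=40).pack(padx=8)
    tk.Checkbutton(root, text='Remove <meta name=...> tags', variable=metas_var).pack(anchor='w', padx=8)
    tk.Checkbutton(root, text='Show list of removals', variable=show_var).pack(anchor='w', padx=8)

    def on_ok():
        result.update(tags=parse_tag_list(tags_var.get()),
                      remove_metas=metas_var.get(),
                      show_changes=show_var.get())
        root.destroy()

    tk.Button(root, text='OK', command=on_ok).pack(side='left', padx=8, pady=8)
    tk.Button(root, text='Cancel', command=root.destroy).pack(side='right', padx=8, pady=8)
    root.lift()  # try to bring the dialog in front of the Sigil window
    root.mainloop()
    return result or None  # empty dict means Cancel or the window was closed
```

Calling `ask_options(...)` at the start of `run(bk)` and stopping early when it returns None would let the user back out without touching the book.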

* Improvements to the code are welcome (I am learning).
** If anybody is interested, I can provide the plug-in in its present working state... to be used only if those classes get in your way: maybe some 'user agents' require them?

Last edited by carmenchu; 04-08-2020 at 08:46 AM. Reason: improve code