04-06-2020, 10:16 AM   #2
carmenchu
Quote:
Originally Posted by Doitsu View Post
If you have basic programming skills, you could also write an ad-hoc Sigil plugin using the BeautifulSoup library, which is bundled with Sigil, to manipulate tags. (The Sigil API documentation is here.)...
Thanks, this is very useful for what I am trying to do as a plugin.
I only need a little help with the syntax to make this modified code work:
Spoiler:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, os
from sigil_bs4 import BeautifulSoup

def run(bk):

    # get all html files
    for (html_id, href) in bk.text_iter():
        file_name = os.path.basename(href)
        html = bk.readfile(html_id)

        # convert html to soup
        soup = BeautifulSoup(html, 'html.parser')
        orig_html = str(soup)

        # get all i tags
        italics = soup.find_all('i') # how for 'i', 'b', 'small', 'br', 'h1/2/3...'
        for i in italics:
            if 'class' in i.attrs:
                print(file_name, 'found') # finds
                if 'calibre' in i['class']:
                    # remove class attribute
                    print(file_name, 'found attrib') # doesn't find "calibre3"
                    del i['class']
                    # # change <span> to <b>
                    # span.name = 'b'
                # else:
                    # # delete <span> tags with other classes
                    # span.unwrap()
            # else:
                # # delete <span> tags w/o classes
                # span.unwrap()

        # update file with changes
        if str(soup) != orig_html:
            bk.writefile(html_id, str(soup))
            print(file_name, 'updated')

    print('Done')
    return 0

1. How do I pass a list of tags as the argument to soup.find_all()?
2. How do I rework
Code:
if 'calibre' in tag['class']
so that it matches a substring of a class name, e.g. 'calibre15'?
3. Would the code also work for selecting a <meta ... /> tag by its 'name' attribute and deleting it? If so, how?
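For what it's worth, going by the stock BeautifulSoup 4 documentation, the three points might be handled roughly as in the sketch below. It assumes sigil_bs4 behaves like plain bs4 for these calls, which I have not verified inside Sigil. The sample markup, the clean() helper and its parameter names are made up purely for illustration.

```python
try:
    from sigil_bs4 import BeautifulSoup  # copy bundled with Sigil
except ImportError:
    from bs4 import BeautifulSoup  # stock bs4, assumed to behave the same here

# Hypothetical sample markup, only for illustration
SAMPLE = (
    '<html><head><meta name="generator" content="calibre"/>'
    '<meta name="author" content="x"/></head>'
    '<body><h1 class="calibre3">T</h1>'
    '<p><i class="calibre15">a</i> <b class="bold">b</b> <i>c</i></p>'
    '</body></html>'
)

def clean(html, tag_names=('i', 'b', 'small', 'h1', 'h2', 'h3'),
          class_fragment='calibre', meta_name='generator'):
    """Hypothetical helper: returns the cleaned markup as a string."""
    soup = BeautifulSoup(html, 'html.parser')

    # 1. find_all() accepts a list of tag names
    for tag in soup.find_all(list(tag_names)):
        # 2. tag['class'] is a list of whole class names, so
        #    "'calibre' in tag['class']" only matches a class that is
        #    exactly 'calibre'; test each class name for the substring
        classes = tag.get('class', [])
        if any(class_fragment in c for c in classes):
            del tag['class']

    # 3. select <meta> tags by their 'name' attribute and remove them;
    #    attrs={...} is needed because 'name' is already the keyword
    #    find_all() uses for the tag name
    for meta in soup.find_all('meta', attrs={'name': meta_name}):
        meta.decompose()

    return str(soup)
```

Inside the plugin, the body of clean() would presumably replace the part of run() between creating the soup and writing the file back, with the tag list and the meta name adjusted as needed.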
Maybe it's trivial, but I am green: Python 2.x for GIMP is the farthest I have gone, and I couldn't make anything of your link.
Thanks!

* Sorry for the delay: too many irons in the fire...
** Is this getting off topic? (Perhaps it belongs in the plug-ins forum.)