11-15-2016, 02:57 AM   #1
slowsmile
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
Problems using sigil_bs4 and gumbo_bs4.parse

(Moved from the Sigil user forum, with apologies)

I'm currently trying to write an HTML conversion plugin for Sigil that runs on either Python 3.4 (external) or Sigil's bundled Python 3.5+ (internal). As part of the HTML sanitizing process I currently use bs4 with Python 3.4, and this works fine.

But when I use sigil_bs4 or gumbo_bs4.parse from the bundled Python, I don't get the same results as with bs4; it simply doesn't work.

This code, using bs4 on my Python 3.4, works fine:

Code:
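from bs4 import BeautifulSoup as bs

# Read the source HTML as UTF-8 text
with open(file, 'rt', encoding='utf-8') as f:
    html = f.read()
soup = bs(html, 'html.parser')

# soup() is shorthand for soup.find_all(): it returns every tag
for tag in soup():
    for attribute in ["lang", "id", "dir", "name", "link"]:
        del tag[attribute]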
But when I write this code using sigil_bs4 or gumbo_bs4.parse with the bundled Python (3.5+) switched on, it doesn't do the job and doesn't give any specific errors either.

Code:
from sigil_bs4 import BeautifulSoup as bs
# Alternative: import sigil_gumbo_bs4_adapter as gumbo_bs4

with open(file, 'rt', encoding='utf-8') as f:
    html = f.read()
soup = bs(html, 'html.parser')
# Alternative: soup = gumbo_bs4.parse(html)

for tag in soup():
    for attribute in ["lang", "id", "dir", "name", "link"]:
        del tag[attribute]
I'm using Sigil 0.9.7 on Windows 8.
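
A side note on the attribute list: Python concatenates adjacent string literals, so a missing comma between two entries (for example between "name" and "link") silently produces a single "namelink" entry, and neither attribute is removed. This is plain Python behaviour, independent of which parser is used. A small sketch (the variable names are only for illustration):

```python
# Adjacent string literals are joined into one string when the code is compiled.
without_comma = ["lang", "id", "dir", "name" "link"]
with_comma = ["lang", "id", "dir", "name", "link"]

print(len(without_comma), without_comma[-1])   # 4 namelink
print(len(with_comma), with_comma[-2:])        # 5 ['name', 'link']
```

This would not explain a difference between bs4 and sigil_bs4 on its own, since the same list is used in both listings, but it is worth ruling out.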

It seems that neither sigil_bs4 nor gumbo_bs4.parse produces a callable BS object (one that takes no arguments and returns a list of all HTML tags), which is what I need for the above code to work. I've also used sigil_bs4 quite successfully throughout my plugin as a line-by-line parser for other formatting (but not as a callable object as above).
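
To make the "callable" behaviour concrete, here is a minimal sketch of the same attribute stripping written with an explicit find_all(True) call instead of soup(). It assumes stock bs4 is installed; the sample HTML and the UNWANTED name are made up for illustration, and I have not verified that sigil_bs4 or the gumbo adapter behaves the same way.

```python
from bs4 import BeautifulSoup

# Made-up sample input, only for illustration.
html = '<p id="a" lang="en" class="x">Hi <b dir="ltr" name="n">there</b></p>'
soup = BeautifulSoup(html, "html.parser")

UNWANTED = ["lang", "id", "dir", "name", "link"]  # hypothetical name

# In stock bs4, find_all(True) matches every tag, just as calling soup() does.
for tag in soup.find_all(True):
    for attribute in UNWANTED:
        # tag.attrs is a dict in stock bs4; pop() with a default skips missing keys.
        tag.attrs.pop(attribute, None)

# The changes only show up once the tree is serialized back to text.
cleaned = str(soup)
print(cleaned)
```

If the explicit find_all(True) form works under the bundled Python while soup() does not, that would narrow the problem down to the callable shortcut rather than the attribute deletion.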

Any further suggestions to make this code work for sigil_bs4 or gumbo_bs4.parse would be greatly appreciated.

This is my first Python plugin (or major Python app of any note).

Last edited by slowsmile; 11-15-2016 at 03:11 AM.