View Single Post
Old 01-24-2016, 05:14 PM   #23
KevinH
Sigil Developer
KevinH ought to be getting tired of karma fortunes by now.KevinH ought to be getting tired of karma fortunes by now.KevinH ought to be getting tired of karma fortunes by now.KevinH ought to be getting tired of karma fortunes by now.KevinH ought to be getting tired of karma fortunes by now.KevinH ought to be getting tired of karma fortunes by now.KevinH ought to be getting tired of karma fortunes by now.KevinH ought to be getting tired of karma fortunes by now.KevinH ought to be getting tired of karma fortunes by now.KevinH ought to be getting tired of karma fortunes by now.KevinH ought to be getting tired of karma fortunes by now.
 
Posts: 7,651
Karma: 5433388
Join Date: Nov 2009
Device: many
You use the gumbo_bs4_adapter.py not soup directly. I will run a test to see how it handles it.

Here is an example of using it:
Code:
    # examples for using the bs4/gumbo parser to process xhtml
    import sigil_bs4
    import sigil_gumbo_bs4_adapter as gumbo_bs4

    samp = """
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" 
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml/" xml:lang="en" lang="en-US">
<head><title>testing & entities</title></head>
<body>
  <p class="first second">this is the <i><b>copyright</i></b> symbol "©"</p>
  <p xmlns:xlink="http://www.w3.org/xlink" class="second" xlink:href="http://www.ggogle.com">this used to test atribute namespaces</p>
</body>
</html>
"""

    soup = gumbo_bs4.parse(samp)
    for node in soup.find_all(attrs={'class':'second'}):
        print(node)
    print(soup.serialize_xhtml());”

Last edited by KevinH; 01-24-2016 at 05:28 PM.
KevinH is offline   Reply With Quote