11-06-2025, 03:01 PM   #2
KevinH
Sigil Developer
No, but since the gumbo parser is available for plugin use, you should be able to easily create an equivalent that mends xhtml files. Basically, you feed the gumbo parser your code and ask for xhtml (or pretty-printed xhtml) out, and it will generate it. Watch out for how you decide to handle xml headers and any numeric entities, since gumbo will strip both out.
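
For the plugin route, a minimal sketch of the loop might look like the following. It assumes the container object passed to run() offers text_iter(), readfile(), and writefile() as in Sigil's plugin framework, and that readfile() returns text for xhtml files. The helper names mend_text_files and gumbo_mend are hypothetical, chosen only for this illustration.

```python
def mend_text_files(bk, mend):
    """Run mend() over every xhtml text file in the book.

    bk is assumed to expose text_iter(), readfile(), and writefile();
    mend is any callable that takes and returns a string.
    Returns the list of manifest ids whose contents changed.
    """
    changed = []
    for manifest_id, href in bk.text_iter():
        source = bk.readfile(manifest_id)
        result = mend(source)
        if result != source:
            bk.writefile(manifest_id, result)
            changed.append(manifest_id)
    return changed


def gumbo_mend(source):
    # Imported lazily because the adapter is only importable inside Sigil.
    import sigil_gumbo_bs4_adapter as gumbo_bs4
    return gumbo_bs4.parse(source).serialize_xhtml()


def run(bk):
    # Entry point Sigil calls for the plugin; returning 0 signals success.
    mend_text_files(bk, gumbo_mend)
    return 0
```

Keeping the mend step as a separate callable means the loop can be exercised outside Sigil with a stand-in book object and any string function.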

But why not just use an automation list and have it run Mend for all files and then launch your plugin? Or vice versa. That is what automation was designed for.

Here is a snippet that should mend and prettify the contents of an xhtml file passed in as a string, called "samp" in this example:

Code:
import sigil_gumbo_bs4_adapter as gumbo_bs4

# Sample input with problems for the parser to mend
# (a bare ampersand in the title and mis-nested <i>/<b> tags).
samp = """
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en-US">
<head><title>testing & entities</title></head>
<body>
  <p class="first second">this&nbsp;is*the*<i><b>copyright</i></b> symbol "&copy;"</p>
  <p xmlns:xlink="http://www.w3.org/xlink" class="second" xlink:href="http://www.ggogle.com">this used to test atribute namespaces</p>
</body>
</html>
"""

# Parse the string, then serialize it back out as pretty-printed xhtml.
soup = gumbo_bs4.parse(samp)
newsamp = soup.prettyprint_xhtml()

If you only want to mend without prettifying, the last line should be:

newsamp = soup.serialize_xhtml()

You will still have to convert any characters you want as numeric entities back yourself, since all numeric entities are replaced by their Unicode equivalents when parsed.
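
One way to handle that post-processing is sketched below, using only the Python standard library. The function names restore_xml_declaration and to_numeric_entities are hypothetical, and the choice of which characters to convert is left to the caller. Treat it as a starting point under those assumptions, not as anything built into Sigil.

```python
import re

# Matches a leading XML declaration such as <?xml version="1.0" encoding="utf-8"?>
XML_DECL = re.compile(r'^\s*(<\?xml[^>]*\?>)')


def restore_xml_declaration(original, mended):
    """Prepend the original XML declaration if the mended text lacks one."""
    found = XML_DECL.match(original)
    if found and not XML_DECL.match(mended):
        return found.group(1) + "\n" + mended.lstrip()
    return mended


def to_numeric_entities(text, chars):
    """Replace each character listed in chars with its decimal numeric entity."""
    table = {ord(c): "&#{};".format(ord(c)) for c in chars}
    return text.translate(table)
```

For example, to_numeric_entities(newsamp, "\u00a0") would turn every no-break space in the mended string back into a numeric entity, leaving all other characters alone.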

Last edited by KevinH; 11-06-2025 at 03:27 PM.