Old 03-08-2026, 04:20 PM   #1
KevinH
Sigil Developer
Posts: 9,421
Karma: 6733960
Join Date: Nov 2009
Device: many
Claude AI to generate Sigil Plugins?

Hi All,

I wanted to pick people's brains about using the Claude AI to create a working Sigil plugin. Obviously I could do that myself easily, but I wanted to see how hard or easy it would be for someone without any Python coding experience to actually create a simple plugin using Claude.

So I had some questions:

1. Do I have to pay for using Claude to make a simple plugin for Sigil? Do they have any free subscriptions for OpenSource developers?

2. Have you tried something similar with Claude or OpenAI Coder?

3. If so, how good or bad was the attempt and how much manual work was needed to actually get things to work.

As an example project, I want to try using Claude to code up a Sigil edit plugin that lets a user select one or more HTML files, reads them in (with all supporting resources via their links), and pre-processes each HTML file by removing all <script> tags that do not have a src attribute.
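
For reference, the core transformation described above can be sketched in a few lines of Python. This is only a minimal sketch, assuming a regex approach on the raw file text; the function name strip_inline_scripts is hypothetical and not part of Sigil's plugin API, and the file selection and resource import steps are left out entirely.

```python
import re

# Matches either a self-closing <script .../> or a full <script ...>...</script>
# element. Assumption: attribute values do not contain a literal '>' character.
SCRIPT_RE = re.compile(
    r'<script\b(?P<attrs>[^>]*?)(?:/>|>.*?</script\s*>)',
    re.IGNORECASE | re.DOTALL,
)

# Detects a real src attribute; the lookbehind avoids matching data-src and similar.
SRC_ATTR_RE = re.compile(r'(?<![\w-])src\s*=', re.IGNORECASE)


def strip_inline_scripts(html):
    """Remove <script> elements that have no src attribute.

    Returns (cleaned_html, number_of_elements_removed).
    """
    removed = 0

    def _replace(match):
        nonlocal removed
        if SRC_ATTR_RE.search(match.group('attrs')):
            return match.group(0)  # external script: keep it untouched
        removed += 1
        return ''  # inline script: drop it

    cleaned = SCRIPT_RE.sub(_replace, html)
    return cleaned, removed


if __name__ == "__main__":
    sample = '<head><script src="a.js"></script><script>alert(1)</script></head>'
    print(strip_inline_scripts(sample))
```

A regex is fine for a quick experiment like this, but it will misfire on script bodies that contain the text </script> inside a string, so a real plugin might prefer a proper parser.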

This is really trivial to do in Sigil with PySide6 and regex, but I want to see for myself how this is done and what the state of the art can accomplish right now. It is also a learning experience for me, to see how quickly I will become obsolete!

So I am happy to hear any thoughts, suggestions, and hints on how to go about doing this myself so I can see how things turn out. I have enough plugins of my own that I can use them to "train" Claude if needed, and I will NOT give it anyone else's code, only my own to learn from.

So if anyone has any tips on how to go about this on Linux, I would really like to hear your thoughts.

Thanks,

KevinH
Old 03-08-2026, 05:05 PM   #2
Quoth
Still reading
Posts: 15,349
Karma: 113121827
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
If it works, it's using code copied from elsewhere.
I know someone ordered by management to use AI and everything is far slower as it all has to be tested and re-written. It's a scam.
Old 03-08-2026, 05:26 PM   #3
KevinH
Sigil Developer
Posts: 9,421
Karma: 6733960
Join Date: Nov 2009
Device: many
There is no exact code that does what I hoped to do in a plugin right now. Perhaps you missed the part about doing this as a learning experience and to see the state of the art myself to judge. If it fails miserably, then I will not be obsolete quickly. If it succeeds with just a little hand holding and guidance then I can point people who request Sigil features that are not a good fit for Sigil to try their hand at a plugin.

As a learning experience, it is a win-win for me no matter how it turns out.

And as for licensing, I give my plugin code away, so if it trains on it, nothing is lost.

Last edited by KevinH; 03-08-2026 at 05:30 PM.
Old 03-08-2026, 05:43 PM   #4
Doitsu
Grand Sorcerer
Posts: 5,800
Karma: 24088595
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by KevinH View Post
1. Do I have to pay for using Claude to make a simple plugin for Sigil? Do they have any free subscriptions for OpenSource developers?
Free subscriptions are available. But you'll need to sign up for a free account.

Quote:
Originally Posted by KevinH View Post
2. Have you tried something similar with Claude or OpenAI Coder?
I tested both Claude and Perplexity. They can both generate simple plugins with minimal corrections. BeckyEbook also experimented with this.

Quote:
Originally Posted by KevinH View Post
3. If so, how good or bad was the attempt and how much manual work was needed to actually get things to work.
It can be helpful for parts of a plugin. (I only used it for a particular function that I had problems with.)

Quote:
Originally Posted by KevinH View Post
As an example project, I want to try using Claude to code up a Sigil edit plugin that lets a user select one or more HTML files, reads them in (with all supporting resources via their links), and pre-processes each HTML file by removing all <script> tags that do not have a src attribute.
This kind of simple plugin should be doable with Claude, Perplexity, and possibly other AIs.

I tested this with Claude. The first version was this:

Spoiler:
Code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
HTMLCleaner - Sigil Plugin
Strips elements that cause Sigil's XML parser to choke:
  - <script> blocks
  - <noscript> blocks
  - <link> tags referencing external stylesheets
  - Bare & in attribute values (escaped to &amp;)
  - External http/https resource references in src/href attributes
  - HTML comments containing conditional IE syntax
"""

import sys
import os
import re


def clean_html(raw):
    """
    Apply a series of regex-based cleaning passes to raw HTML text.
    Returns (cleaned_text, list_of_changes).
    """
    changes = []
    text = raw

    # 1. Remove all <script>...</script> blocks (including multi-line)
    before = len(re.findall(r'<script[\s\S]*?</script>', text, re.IGNORECASE))
    text = re.sub(r'<script[\s\S]*?</script>', '', text, flags=re.IGNORECASE)
    if before:
        changes.append(f"Removed {before} <script> block(s)")

    # 2. Remove <noscript>...</noscript> blocks
    before = len(re.findall(r'<noscript[\s\S]*?</noscript>', text, re.IGNORECASE))
    text = re.sub(r'<noscript[\s\S]*?</noscript>', '', text, flags=re.IGNORECASE)
    if before:
        changes.append(f"Removed {before} <noscript> block(s)")

    # 3. Remove <link> tags that reference external (http/https) resources
    before = len(re.findall(r'<link[^>]+href=["\']https?://[^>]+>', text, re.IGNORECASE))
    text = re.sub(r'<link[^>]+href=["\']https?://[^>]+>', '', text, flags=re.IGNORECASE)
    if before:
        changes.append(f"Removed {before} external <link> tag(s)")

    # 4. Remove <link> tags for stylesheets/icons (css.php etc) by rel type
    before = len(re.findall(r'<link[^>]+rel=["\']stylesheet["\'][^>]*>', text, re.IGNORECASE))
    text = re.sub(r'<link[^>]+rel=["\']stylesheet["\'][^>]*>', '', text, flags=re.IGNORECASE)
    if before:
        changes.append(f"Removed {before} stylesheet <link> tag(s)")

    # 5. Remove <meta> tags with http-equiv (can cause issues)
    before = len(re.findall(r'<meta[^>]+http-equiv[^>]*>', text, re.IGNORECASE))
    text = re.sub(r'<meta[^>]+http-equiv[^>]*>', '', text, flags=re.IGNORECASE)
    if before:
        changes.append(f"Removed {before} http-equiv <meta> tag(s)")

    # 6. Fix bare & in attribute values
    #    Strategy: inside tag attributes (between < and >), replace & not followed by
    #    word chars + ; with &amp;
    #    We do this carefully to avoid double-escaping already-correct &amp; etc.
    def fix_ampersands_in_tag(m):
        tag_text = m.group(0)
        # Replace & that is NOT already part of a named/numeric entity
        fixed = re.sub(r'&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)', '&amp;', tag_text)
        return fixed

    old_text = text
    text = re.sub(r'<[^>]+>', fix_ampersands_in_tag, text)
    amp_count = old_text.count('&') - text.count('&')
    if amp_count > 0:
        changes.append(f"Escaped {amp_count} bare & character(s) in tags to &amp;")

    # 7. Remove IE conditional comments <!-- [if ...]> ... <![endif]--> 
    before = len(re.findall(r'<!--\[if[\s\S]*?<!\[endif\]-->', text, re.IGNORECASE))
    text = re.sub(r'<!--\[if[\s\S]*?<!\[endif\]-->', '', text, flags=re.IGNORECASE)
    if before:
        changes.append(f"Removed {before} IE conditional comment(s)")

    # 8. Replace external src attributes with empty string (images/iframes pointing off-site)
    def blank_external_src(m):
        return re.sub(r'(src=["\'])https?://[^"\']+(["\'])', r'\1\2', m.group(0))

    old_text = text
    text = re.sub(r'<(?:img|iframe|audio|video|source)[^>]+>', blank_external_src, text, flags=re.IGNORECASE)
    if old_text != text:
        changes.append("Blanked external src= URLs in media tags")

    return text, changes


def run(bc):
    """
    Entry point called by Sigil.
    bc = BookContainer object providing access to all book files.
    """
    manifest = []

    for mid, href in bc.text_iter():
        raw = bc.readfile(mid)
        # Sigil may hand us bytes or str depending on version
        if isinstance(raw, bytes):
            raw = raw.decode('utf-8', errors='replace')

        cleaned, changes = clean_html(raw)

        if changes:
            bc.writefile(mid, cleaned)
            manifest.append((href, changes))

    # Print a summary to Sigil's plugin output window
    if manifest:
        print("HTMLCleaner: changes made\n" + "="*40)
        for href, changes in manifest:
            print(f"\n{href}:")
            for c in changes:
                print(f"  • {c}")
    else:
        print("HTMLCleaner: nothing to clean — all files look fine.")

    return 0


def main():
    # When invoked by Sigil, sys.argv[1] is the path to the book container support file.
    if len(sys.argv) < 2:
        print("This script must be run as a Sigil plugin.", file=sys.stderr)
        return 1

    # Sigil injects the BookContainer machinery via its launcher; just call run().
    # The launcher imports this module and calls run(bc) directly —
    # but we keep main() here for completeness.
    from bookcontainer import BookContainer  # provided by Sigil's plugin runtime
    bc = BookContainer(sys.argv[1])
    return run(bc)


if __name__ == "__main__":
    sys.exit(main())


It hallucinated quite a bit. When I pointed out that Sigil uses bk, not bc, it slightly changed the code and came up with:

Spoiler:
Code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
HTMLCleaner - Sigil Plugin
Strips elements that cause Sigil's XML parser to choke:
  - <script> blocks
  - <noscript> blocks
  - <link> tags referencing external stylesheets
  - Bare & in attribute values (escaped to &amp;)
  - External http/https resource references in src/href attributes
  - IE conditional comments
"""

import re


def clean_html(raw):
    """
    Apply regex-based cleaning passes to raw HTML text.
    Returns (cleaned_text, list_of_changes).
    """
    changes = []
    text = raw

    # 1. Remove all <script>...</script> blocks
    before = len(re.findall(r'<script[\s\S]*?</script>', text, re.IGNORECASE))
    text = re.sub(r'<script[\s\S]*?</script>', '', text, flags=re.IGNORECASE)
    if before:
        changes.append(f"Removed {before} <script> block(s)")

    # 2. Remove <noscript>...</noscript> blocks
    before = len(re.findall(r'<noscript[\s\S]*?</noscript>', text, re.IGNORECASE))
    text = re.sub(r'<noscript[\s\S]*?</noscript>', '', text, flags=re.IGNORECASE)
    if before:
        changes.append(f"Removed {before} <noscript> block(s)")

    # 3. Remove <link> tags pointing to external (http/https) resources
    before = len(re.findall(r'<link[^>]+href=["\']https?://[^>]+>', text, re.IGNORECASE))
    text = re.sub(r'<link[^>]+href=["\']https?://[^>]+>', '', text, flags=re.IGNORECASE)
    if before:
        changes.append(f"Removed {before} external <link> tag(s)")

    # 4. Remove <link rel="stylesheet"> tags
    before = len(re.findall(r'<link[^>]+rel=["\']stylesheet["\'][^>]*>', text, re.IGNORECASE))
    text = re.sub(r'<link[^>]+rel=["\']stylesheet["\'][^>]*>', '', text, flags=re.IGNORECASE)
    if before:
        changes.append(f"Removed {before} stylesheet <link> tag(s)")

    # 5. Remove <meta http-equiv> tags
    before = len(re.findall(r'<meta[^>]+http-equiv[^>]*>', text, re.IGNORECASE))
    text = re.sub(r'<meta[^>]+http-equiv[^>]*>', '', text, flags=re.IGNORECASE)
    if before:
        changes.append(f"Removed {before} http-equiv <meta> tag(s)")

    # 6. Fix bare & in tag attributes (avoid double-escaping existing entities)
    def fix_ampersands_in_tag(m):
        tag_text = m.group(0)
        fixed = re.sub(r'&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)', '&amp;', tag_text)
        return fixed

    old_text = text
    text = re.sub(r'<[^>]+>', fix_ampersands_in_tag, text)
    amp_count = old_text.count('&') - text.count('&')
    if amp_count > 0:
        changes.append(f"Escaped {amp_count} bare & character(s) in tags to &amp;")

    # 7. Remove IE conditional comments
    before = len(re.findall(r'<!--\[if[\s\S]*?<!\[endif\]-->', text, re.IGNORECASE))
    text = re.sub(r'<!--\[if[\s\S]*?<!\[endif\]-->', '', text, flags=re.IGNORECASE)
    if before:
        changes.append(f"Removed {before} IE conditional comment(s)")

    # 8. Blank external src= URLs on media tags
    def blank_external_src(m):
        return re.sub(r'(src=["\'])https?://[^"\']+(["\'])', r'\1\2', m.group(0))

    old_text = text
    text = re.sub(r'<(?:img|iframe|audio|video|source)[^>]+>', blank_external_src, text, flags=re.IGNORECASE)
    if old_text != text:
        changes.append("Blanked external src= URLs in media tags")

    return text, changes


def run(bk):
    """
    Sigil calls run(bk) -- bk is the book wrapper object provided by Sigil.
    """
    manifest = []

    for mid, href in bk.text_iter():
        raw = bk.readfile(mid)
        if isinstance(raw, bytes):
            raw = raw.decode('utf-8', errors='replace')

        cleaned, changes = clean_html(raw)

        if changes:
            bk.writefile(mid, cleaned)
            manifest.append((href, changes))

    if manifest:
        print("HTMLCleaner: changes made\n" + "=" * 40)
        for href, changes in manifest:
            print(f"\n{href}:")
            for c in changes:
                print(f"  - {c}")
    else:
        print("HTMLCleaner: nothing to clean - all files look fine.")

    return 0
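
A side note on pass 6 in the generated code: replacing a bare & with &amp; leaves the number of & characters unchanged, so amp_count as computed there is always 0, and a file whose only problem is bare ampersands would never be written back. The following is a minimal sketch of one way to count the replacements directly with re.subn. The function name escape_bare_ampersands_in_tags is hypothetical, and this is not what Claude produced.

```python
import re

# Same entity test as in the generated code: an & that does not start a
# named, decimal, or hexadecimal character reference.
BARE_AMP_RE = re.compile(r'&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)')

# Anything that looks like a tag; assumes no literal '>' inside attribute values.
TAG_RE = re.compile(r'<[^>]+>')


def escape_bare_ampersands_in_tags(html):
    """Escape bare & inside tags and return (fixed_html, replacement_count)."""
    total = 0

    def _fix_tag(match):
        nonlocal total
        # subn returns the new string together with the number of substitutions
        fixed, count = BARE_AMP_RE.subn('&amp;', match.group(0))
        total += count
        return fixed

    return TAG_RE.sub(_fix_tag, html), total


if __name__ == "__main__":
    src = '<a href="x.php?a=1&b=2&amp;c=3">Tom & Jerry</a>'
    fixed, count = escape_bare_ampersands_in_tags(src)
    print(fixed, count)
    # The raw & count is identical before and after, which is why
    # comparing str.count('&') values cannot detect the change.
    print(src.count('&'), fixed.count('&'))
```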


It also generated this incorrect plugin.xml file:

Spoiler:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<plugin>
  <name>HTMLCleaner</name>
  <identifier>net.sigil.plugins.htmlcleaner</identifier>
  <version>0.1.0</version>
  <description>Strips elements from HTML files that Sigil's XML parser dislikes: script tags, external stylesheets, bare ampersands in attributes, and external resource references.</description>
  <type>edit</type>
  <engine>python3</engine>
  <author>custom</author>
  <os>*</os>
</plugin>


After fixing plugin.xml, the plugin kind of works. For example, it'll remove:
Code:
<link rel="manifest" href="https://www.alternatehistory.com/forum/webmanifest.php">
But it'll also remove content that users want to keep. E.g.
Code:
<link href="../Styles/sgc-nav.css" rel="stylesheet" type="text/css"/>
So it is possible to generate a basic plugin with an AI tool.
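
The over-removal above comes from the pass that deletes every <link rel="stylesheet"> regardless of where it points. A minimal sketch of a narrower rule, assuming the intent is to drop only links whose href is an external URL and to keep book-local stylesheets; the function name remove_external_links is hypothetical, and treating protocol-relative //host/... URLs as external is my own assumption:

```python
import re

# Any <link ...> tag; assumes no literal '>' inside attribute values.
LINK_RE = re.compile(r'<link\b[^>]*>', re.IGNORECASE)

# Quoted href attribute value (the lookbehind skips data-href and similar).
HREF_RE = re.compile(r'''(?<![\w-])href\s*=\s*(["'])(.*?)\1''',
                     re.IGNORECASE | re.DOTALL)

# http://, https://, or protocol-relative // at the start of the value.
EXTERNAL_RE = re.compile(r'(?:https?:)?//', re.IGNORECASE)


def remove_external_links(html):
    """Remove <link> tags with an external href; keep local ones.

    Returns (cleaned_html, list_of_removed_tags).
    """
    removed = []

    def _replace(match):
        tag = match.group(0)
        href = HREF_RE.search(tag)
        if href and EXTERNAL_RE.match(href.group(2).strip()):
            removed.append(tag)
            return ''
        return tag  # local or href-less link: leave as is

    return LINK_RE.sub(_replace, html), removed


if __name__ == "__main__":
    external = '<link rel="manifest" href="https://www.alternatehistory.com/forum/webmanifest.php">'
    local = '<link href="../Styles/sgc-nav.css" rel="stylesheet" type="text/css"/>'
    cleaned, removed = remove_external_links(external + local)
    print(cleaned)
    print(removed)
```

With the two tags from this post as input, the external manifest link is dropped and the sgc-nav.css link is kept.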

Claude also has a skills concept. I haven't tested it, though. Maybe you could define a Sigil plugin skill?

Last edited by Doitsu; 03-08-2026 at 06:14 PM.
Old 03-08-2026, 06:29 PM   #5
DiapDealer
Grand Sorcerer
Posts: 29,164
Karma: 211504714
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by Quoth View Post
If it works, it's using code copied from elsewhere.
I know someone ordered by management to use AI and everything is far slower as it all has to be tested and re-written. It's a scam.
Please take your bias elsewhere. No one asked for your opinions on AI-generated code in general. Answer the questions asked or refrain from commenting, please.
Old 03-08-2026, 07:02 PM   #6
KevinH
Sigil Developer
Posts: 9,421
Karma: 6733960
Join Date: Nov 2009
Device: many
That is interesting. I had never heard of Perplexity before. Did you ask Claude to read the Sigil Plugin Framework document and use its API, or just give it a few example Sigil plugins? If not, how else would it know the structure a Sigil plugin requires? As the author of the Sigil Plugin Framework, I would be okay with that, especially if it helps novices write their own Sigil plugins.

And, I just noticed that Anthropic is giving away 10,000 free Claude Max accounts for 6 months to active OpenSource developers with public projects on GitHub with more than 5,000 stars. Sigil has 6,700 stars, so I think DiapDealer or I (or both of us) may qualify. If not, I will look at other free account options.

Thanks for your tips!

KevinH



Old 03-08-2026, 07:10 PM   #7
Doitsu
Grand Sorcerer
Posts: 5,800
Karma: 24088595
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by KevinH View Post
That is interesting. I had never heard of Perplexity before. Did you ask Claude to read the Sigil Plugin Framework document and use its api, or just give it a few example Sigil plugins?
No, I just gave it an incorrectly formatted HTML file and told it to write a Sigil plugin that removes all problematic sections using python.
Old 03-08-2026, 07:17 PM   #8
KevinH
Sigil Developer
Posts: 9,421
Karma: 6733960
Join Date: Nov 2009
Device: many
Quote:
Originally Posted by Doitsu View Post
No, I just gave it an incorrectly formatted HTML file and told it to write a Sigil plugin that removes all problematic sections using python.
Wow, that's it?

I will have to play around with the free version of Claude just for fun if nothing else.
Old Yesterday, 05:24 AM   #9
kovidgoyal
creator of calibre
Posts: 46,066
Karma: 29579868
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@KevinH: You likely already have free access to Claude (and other models) via GitHub Copilot. And yes, this kind of thing will be fairly trivial for Copilot to get almost entirely correct in agentic mode. You don't need to install anything on your computer to use it. Simply go to https://github.com/copilot and use the chat interface to ask Copilot to send a PR or generate code, whichever suits the use case. A few sentences (3-4) of instructions will likely generate largely working code. I'll leave you a couple of examples of doing this via PRs in calibre. Note that although the PRs are marked closed, they were actually merged (I squash the commits and merge as a single commit).

https://github.com/kovidgoyal/calibre/pull/3023
https://github.com/kovidgoyal/calibre/pull/3024
https://github.com/kovidgoyal/calibre/pull/3026
Old Today, 01:14 PM   #10
KevinH
Sigil Developer
Posts: 9,421
Karma: 6733960
Join Date: Nov 2009
Device: many
Wow, I am astonished! I created an empty repo in my personal GitHub account just as a playground (notice I could not spell "generator" correctly!):

https://github.com/kevinhendricks/plugin_generatior

And used the initial following prompt:

Quote:
Create a python plugin for Sigil that uses the api and structure documented in the Sigil_Plugin_Framework_rev15.epub. This plugin should prompt the user using PySide6 to select any number of html files using a qfiledialog and then for each file selected, read in the file, use regular expressions to remove all script tags who do not have a src attribute and then add the edited file to the current epub using the proper api.
The agent took about 16 minutes to generate something I could review. It all looked correct except for the structure of the plugin.xml file, which contained additional tags that are not allowed according to the spec.

https://github.com/kevinhendricks/pl...1-4e23cf279e7a
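
The core transformation asked for in the prompt, removing `<script>` elements that have no `src` attribute, can be sketched with two regular expressions. This is an illustration under stated assumptions and not the code the agent produced; `remove_inline_scripts` is a hypothetical name, and the sketch assumes each script has an explicit `</script>` closing tag.

```python
import re

# One <script ...>...</script> element; group 1 holds the opening tag's attributes.
SCRIPT_ELEMENT = re.compile(
    r"<script\b([^>]*)>.*?</script\s*>", re.IGNORECASE | re.DOTALL
)
# A real src attribute: "src=" at the start or after whitespace,
# so attributes such as data-src do not count.
SRC_ATTR = re.compile(r"(?:^|\s)src\s*=", re.IGNORECASE)


def remove_inline_scripts(html):
    """Drop <script> elements lacking a src attribute; keep the rest as-is."""
    def replace(match):
        has_src = SRC_ATTR.search(match.group(1)) is not None
        return match.group(0) if has_src else ""

    return SCRIPT_ELEMENT.sub(replace, html)
```

Scripts loaded from a file survive and inline ones vanish. XHTML-style self-closing `<script ... />` tags are not handled here, which is one reason a real parser is the safer choice for messy input.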

So after accepting their initial attempt, I added the Sigil_Plugin_Framework_rev15.epub file to the repo and then opened a second agent request:

Quote:
Fix plugin.xml to use only the allowed tags as described in the #Sigil_Plugin_Framework_rev15.epub document.
It came back almost instantly with a fixed plugin.xml file. From looking at the agent thread, it clearly unpacked the framework epub, read it in, found the part of the api describing the plugin.xml tags, and identified the incorrect tags all by itself. The ability to follow an api, even one documented in an open structured epub, simply amazes me.

https://github.com/kevinhendricks/pl...8-4019c89bea1b
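
A check like the one the agent performed can be approximated with the standard library. In this sketch the `ALLOWED_TAGS` set is an assumption recalled from the plugin framework guide, not copied from it, so verify it against your copy of the document; `unexpected_tags` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: tag names recalled from the Sigil plugin framework guide.
# Confirm them against the framework document before relying on this list.
ALLOWED_TAGS = {
    "name", "type", "author", "description", "engine",
    "version", "oslist", "autostart", "autoclose", "iconpath",
}


def unexpected_tags(plugin_xml_text, allowed=ALLOWED_TAGS):
    """Return the sorted child tags of <plugin> that are not in `allowed`."""
    # Parse from bytes so an encoding declaration in the prolog is accepted.
    root = ET.fromstring(plugin_xml_text.encode("utf-8"))
    return sorted(child.tag for child in root if child.tag not in allowed)
```

Passing a different `allowed` set makes the same function usable if the spec revision in use lists other tags.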

I then built and tested the resulting plugin (zipping it by hand) and it actually worked correctly.
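
The hand-zipping step can also be scripted. This is a minimal sketch assuming the usual layout of a single top-level folder named after the plugin, holding plugin.xml and plugin.py; `build_plugin_zip` and its parameters are hypothetical names.

```python
import zipfile
from pathlib import Path


def build_plugin_zip(plugin_dir, out_dir="."):
    """Zip `plugin_dir` so the archive keeps the folder itself as its top level."""
    plugin_dir = Path(plugin_dir).resolve()
    zip_path = (Path(out_dir) / f"{plugin_dir.name}.zip").resolve()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(plugin_dir.rglob("*")):
            # Skip directories, and the archive itself if it lives inside the folder.
            if path.is_file() and path.resolve() != zip_path:
                # Archive names look like "MyPlugin/plugin.xml".
                zf.write(path, path.relative_to(plugin_dir.parent).as_posix())
    return zip_path
```

Calling `build_plugin_zip("MyPlugin")` would write `MyPlugin.zip` into the current directory, ready to be installed through Sigil's plugin manager.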

I never actually touched one line of code and instead just gave commands to fix errors in the plugin.xml file based on the api docs alone.

I am truly astonished.

Copilot does store the exact text of your request, but it was not easy to find (you have to look in a PR's edit history), so I quickly learned to copy each one of them to a separate text editor for safekeeping.

What is more fun is looking at the Agents tab in my repo and seeing the reasoning and logic for what it did in response to each of my two queries.

It is truly amazing.

I can easily see how an experienced programmer continuously refining requests could speed up the creation of a coding project. I am still not sure that a non-programmer could actually succeed with this tool as it stands now.

My plan is to use the Agent to refine the existing plugin to use chardet to handle the cases where the base html encoding is not utf-8 but Windows-1252 or some other ISO 8-bit encoding. Then have it run each page through our sigil_gumbo library to handle any other issues in its structure before adding it to the existing epub.
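
The encoding step of that plan might look roughly like the sketch below. It tries UTF-8 first, asks chardet for a guess only when that fails, and falls back to Windows-1252. `decode_html` is a hypothetical name, the fallback choice is an assumption, and chardet is treated as optional so the sketch still runs without it. The sigil_gumbo pass is not shown.

```python
try:
    import chardet  # third-party package; optional in this sketch
except ImportError:
    chardet = None


def decode_html(raw, fallback="cp1252"):
    """Return (text, encoding_used) for the raw bytes of an HTML file."""
    try:
        # utf-8-sig also strips a leading byte order mark if one is present.
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        pass
    # chardet.detect returns a dict whose "encoding" entry may be None.
    guess = chardet.detect(raw).get("encoding") if chardet else None
    for encoding in (guess, fallback):
        if not encoding:
            continue
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: never fail, mark undecodable bytes instead.
    return raw.decode(fallback, errors="replace"), fallback
```

The returned text can then be re-encoded as UTF-8 before the file is added to the epub. Detection on short inputs is only a guess, so the reported encoding is worth logging for the user.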

This technology is going to make experienced programmers much more effective as the Agent will listen to the programmer's preferred tools and approaches and can absorb and use api documentation.

Thank you to everyone for their guidance, especially Kovid for his examples and for showing exactly where to go to get started, and Doitsu for his specific examples.

Last edited by KevinH; Today at 04:27 PM. Reason: update with info from Comfy.n
Old Today, 03:52 PM   #11
Comfy.n
want to learn what I want
Posts: 2,007
Karma: 7955879
Join Date: Sep 2020
Device: none
Quote:
Originally Posted by KevinH View Post
I am truly astonished.

For some reason Copilot does not store the exact text of your request (so I quickly learned to copy each one of them to a separate text editor for safekeeping).
Yes, it's amazing.

BTW, I can see the original prompt under PR edits:

[Attachment 221646: screenshot, MWSnap 2026-03-10, 16_47_10.png]
Old Today, 04:18 PM   #12
KevinH
Sigil Developer
Posts: 9,421
Karma: 6733960
Join Date: Nov 2009
Device: many
Quote:
Originally Posted by Comfy.n View Post
Yes, it's amazing.

BTW, I can see the original prompt under PR edits:

Attachment 221646
Wonderful. I will edit my post to reflect the actual wording for both.

Thank you!

Last edited by KevinH; Today at 04:28 PM.
All times are GMT -4. The time now is 04:42 PM.

