#1 | Connoisseur | Posts: 55 | Karma: 10 | Join Date: Feb 2012 | Device: none
Alt text bulk search replace function
Hi there, in order to comply with the EAA (European Accessibility Act), every image in an epub needs descriptive, accessible alt text. I haven't always paid as much attention to the alt texts I've included in epubs, so I expect to have to redo some past work as part of updating older epubs to fit the current spec.
I'm currently exploring using a custom GPT to make ChatGPT do the heavy lifting here, but I can foresee some mind-numbing search-and-replace work, since some epubs I've made in the past have upward of 100 images. Is there any way to automate this? For instance, by side-loading a CSV file with two columns: a regex matching the "<img" tag containing {{filename}}, and the altered tag with the proper alt description. I've looked into the automated lists feature, but it seems geared more toward repetitive actions that are done frequently for all epubs. Is there perhaps a plugin for this that you could point me toward? Thanks!
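The two-column idea above (a regex in one column, the replacement tag in the other) can be sketched in plain Python outside Sigil. This is a minimal illustration under stated assumptions, not an existing plugin: `load_pairs`, `apply_pairs`, and the CSV layout with a header row are hypothetical. Each replacement is passed as a function so its text is inserted literally instead of being parsed for `\1`-style group references.

```python
import csv
import io
import re

# Hypothetical CSV: a header row, then "regex pattern","replacement tag" per image.
CSV_TEXT = '''pattern,replacement
"<img[^>]*src=""../Images/cover.jpg""[^>]*/>","<img alt=""Book cover"" src=""../Images/cover.jpg""/>"
'''

def load_pairs(csv_file):
    """Return a list of (compiled regex, replacement) pairs, skipping the header."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip header row
    return [(re.compile(row[0]), row[1]) for row in reader if len(row) >= 2 and row[0]]

def apply_pairs(text, pairs):
    """Apply each pair in order; return (new text, total substitutions)."""
    total = 0
    for pattern, replacement in pairs:
        # A function replacement inserts the text literally, so backslashes
        # in the CSV are not read as group references like \1.
        text, n = pattern.subn(lambda m, r=replacement: r, text)
        total += n
    return text, total

pairs = load_pairs(io.StringIO(CSV_TEXT))
print(apply_pairs('<p><img src="../Images/cover.jpg"/></p>', pairs))
# ('<p><img alt="Book cover" src="../Images/cover.jpg"/></p>', 1)
```

Because each pattern spells out the exact tag shape, an image whose attributes appear in a different order will not match, so a substitution count of 0 for a pair is worth checking.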
#2 | Sigil Developer | Posts: 8,724 | Karma: 5703586 | Join Date: Nov 2009 | Device: many
Access-Aide is a plugin that is meant to help with just that. It fills in empty titles inside head elements, adds xml:lang to the OPF package tag and lang and xml:lang to html tags, and shows you a two-column GUI table with a thumbnail of every image side by side with its current alt text value. If an image has no alt text value, it tries to extract one from the image's own metadata.
If the image's metadata is missing, you still have to type a proper alt value into that table by hand. When you complete the table, the plugin updates any changed alt values in the text. You might want to give it a try: building a separate table first and then applying it takes about as much time as editing the alt text directly in the GUI table and letting the plugin apply it.

Last edited by KevinH; Yesterday at 08:13 AM.
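For checking an exported XHTML file outside Sigil, the image/alt-text table can be approximated with Python's standard `html.parser`. This is a stand-alone sketch, not Access-Aide's code; `ImgAltCollector`, `image_alt_table`, and `missing_alt` are hypothetical helpers. It distinguishes a missing `alt` attribute (`None`) from an empty one (`""`).

```python
from html.parser import HTMLParser

class ImgAltCollector(HTMLParser):
    """Collects (src, alt) for every <img>; alt is None when the attribute is absent."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser's default handle_startendtag also calls this for <img .../>.
        if tag == "img":
            a = dict(attrs)
            self.images.append((a.get("src"), a.get("alt")))

def image_alt_table(xhtml):
    """Return a list of (src, alt) pairs in document order."""
    collector = ImgAltCollector()
    collector.feed(xhtml)
    collector.close()
    return collector.images

def missing_alt(xhtml):
    """List src values whose alt is absent or only whitespace."""
    return [src for src, alt in image_alt_table(xhtml) if alt is None or not alt.strip()]
```

An empty `alt=""` can be intentional for purely decorative images, so the `missing_alt` list is meant for review rather than automatic filling.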
#3 | Connoisseur | Posts: 55 | Karma: 10 | Join Date: Feb 2012 | Device: none
Hi KevinH, yeah I've worked with Access-Aide some, and it has some pretty good features. Thanks for the work you put into it, I can see myself using it a lot more over the coming years.
I suppose I wasn't clear enough about what I need in this particular case. When epubs come back with a lot of different images, all with non-functional alt text, it takes a long time to describe them all by hand. So I wrote a custom GPT that looks at each image, describes it and any text in it, forms the new img tag, and outputs a CSV with two columns: one with a generic img tag holding the path and filename of the image (basically `<img.*?src="../Images/{{filename.ext}}".*?/>`), and another with the new `<img>` tag including the alt text. Now all I need is a batch regex replace plugin. I've been trying to get the LLMs to write one for me, but Sigil keeps saying it's not a valid Sigil plugin. It has a plugin.xml and a plugin.py file, and should be a valid zip archive. I've included the XML below; perhaps someone could help me troubleshoot it? Code:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<plugin>
  <name>Regex Bulk Search Replace CSV</name>
  <type>edit</type>
  <author>Ryn</author>
  <description>Iterates through a user-provided list of search/replace commands using regex from a CSV file.</description>
  <autostart>true</autostart>
  <autoclose>false</autoclose>
  <engine>python3.4</engine>
  <version>0.3.1</version>
  <oslist>unx,win,osx</oslist>
  <sigil_version_min>0.9.10</sigil_version_min>
  <menu_entry>
    <label>Regex Bulk Search Replace CSV</label>
    <group>Tools</group>
    <menu_name>Plugins</menu_name>
    <shortcut></shortcut>
  </menu_entry>
</plugin>
```
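For comparison, the Python side of a Sigil edit plugin can be small. As I understand the plugin framework, plugin.py must define `run(bk)`, where `bk` is Sigil's book container, and return 0 on success; `text_iter`, `readfile`, and `writefile` are the container methods I believe it provides, so please verify them against the plugin documentation. The sketch only reports images without an `alt` attribute; `IMG_NO_ALT` is a hypothetical helper. It is also worth checking the zip layout: I believe the archive must contain a single top-level folder whose name matches `<name>` in plugin.xml.

```python
import re

# Hypothetical check: an <img ...> tag with no alt attribute anywhere inside it.
IMG_NO_ALT = re.compile(r'<img\b(?![^>]*\balt\s*=)[^>]*>')

def run(bk):
    """Entry point Sigil calls for an edit plugin; bk is the book container."""
    total = 0
    for manifest_id, href in bk.text_iter():
        data = bk.readfile(manifest_id)
        missing = len(IMG_NO_ALT.findall(data))
        if missing:
            print("%s: %d image(s) without alt" % (href, missing))
            total += missing
        # An editing version would change `data` here and call
        # bk.writefile(manifest_id, data) only when something changed.
    print("Total images without alt: %d" % total)
    return 0  # 0 reports success to Sigil
```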
#4 | Sigil Developer | Posts: 8,724 | Karma: 5703586 | Join Date: Nov 2009 | Device: many
Your ChatGPT / LLM has hallucinated fields that do not exist in a Sigil plugin. Namely, sigil_version_min and menu_entry are not part of any Sigil plugin. God knows how poor the code it generated is.
Why not have your ChatGPT program properly add the alt text to the image's own metadata? That way Access-Aide can read it and handle the replacement automatically.

Alternatively, use Python Function Replace (new in Sigil 2.5.0): in your Python replace function, when count == 1, read your table and store it in a Python dict, then just look up the captured string in that dict to generate its replacement. That is one of the reasons Python Function Replace was added. Be aware of the need to escape internal quotes and <, >, & characters inside the alt text as well.

And FWIW, I personally would not trust any image description from ChatGPT that has not "read the book", so to speak, and probably would not trust it even if it had. Using Google's AI responses as a gauge, the false or failure rate is over 50%. Worse than worthless. Ask the same question in a slightly different way and you get completely opposite responses. Most AIs do not even understand the concept of version numbers. For example, according to Google's AI, ARIA role attributes are legal in epub2, but of course the concept of a version 2 is lost on the AI.

Last edited by KevinH; Yesterday at 10:04 AM.
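A sketch of the Python Function Replace approach described above. The parameter list `replace(match, number, file_name, metadata, data)` is an assumption loosely modeled on calibre's function replace; check Sigil's documentation for the exact signature it uses. `CSV_PATH`, `load_alt_table`, and `ALT_ATTR` are hypothetical. It assumes a search regex whose first capture group is the image file name, for example `<img[^>]*?src="[^"]*?([^"/]+)"[^>]*?/?>`, and it escapes the alt text with `html.escape(..., quote=True)`.

```python
import csv
import html
import re

# Hypothetical location of the two-column CSV (image file name, alt text).
CSV_PATH = "alt_text.csv"

# Matches an existing alt attribute (double- or single-quoted) so it can be dropped.
ALT_ATTR = re.compile(r"""\salt\s*=\s*("[^"]*"|'[^']*')""")

def load_alt_table(path):
    """Read 'file name, alt text' rows into a dict."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[0].strip(): row[1].strip() for row in csv.reader(f) if len(row) >= 2}

def replace(match, number, file_name, metadata, data):
    # ASSUMED signature; verify against Sigil's Function Replace documentation.
    if number == 1 or "alt" not in data:
        # First match (or nothing cached yet): read the CSV once into `data`.
        data["alt"] = load_alt_table(CSV_PATH)
    tag, name = match.group(0), match.group(1)
    alt = data["alt"].get(name)
    if alt is None:
        return tag  # no table entry: leave the tag unchanged
    # Escape &, <, > and both quote characters before placing text in an attribute.
    safe = html.escape(alt, quote=True)
    tag = ALT_ATTR.sub("", tag, count=1)
    return tag.replace("<img", '<img alt="%s"' % safe, 1)
```

Checking `"alt" not in data` as well as `number == 1` keeps the sketch safe whether the counter restarts per file or runs across the whole book.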
#5 | Guru | Posts: 838 | Karma: 2657572 | Join Date: Jan 2017 | Location: Poland | Device: Various
@KevinH is right, of course, but I will be the last person to discourage people from writing their own plugins.
My advice:
1. First of all, read the plugin documentation, especially the first chapter, titled "The Anatomy of a Plugin". Link: https://github.com/Sigil-Ebook/Sigil...ork_rev15.epub
2. Using regex for this purpose is a bit dangerous. The best would be a CSV file as simple as possible: image file name, alternative text.
3. You will already find the key functions in the Access-Aide plugin.
4. Using Python functions is indeed tempting, although I think I would prefer a plugin.
5. I would be happy to test a working version.
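Point 2 can be made concrete: with a plain two-column CSV (image file name, alt text), Python's `csv` module already handles commas and quotes inside a description. `read_alt_csv` is a hypothetical helper that also reports duplicate and malformed rows instead of silently dropping them. Empty alt text is accepted, since `alt=""` is the usual marking for purely decorative images.

```python
import csv
import io

def read_alt_csv(csv_file):
    """Parse 'image file name, alt text' rows into a dict.

    Returns (table, problems); problems holds (row number, message) for rows
    that were skipped, so they can be fixed in the spreadsheet.
    """
    table, problems = {}, []
    for row_no, row in enumerate(csv.reader(csv_file), start=1):
        if not any(cell.strip() for cell in row):
            continue  # blank line
        if len(row) != 2:
            problems.append((row_no, "expected 2 columns, got %d" % len(row)))
            continue
        name, alt = row[0].strip(), row[1].strip()
        if name in table:
            problems.append((row_no, "duplicate entry for %s" % name))
            continue
        table[name] = alt  # an empty alt is kept: alt="" marks decorative images
    return table, problems

table, problems = read_alt_csv(io.StringIO('cover.jpg,"A red door, half open"\nmap.png,Map of the coast\n'))
```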
#6 | Sigil Developer | Posts: 8,724 | Karma: 5703586 | Join Date: Nov 2009 | Device: many
FWIW,
You can use a regex to capture all img tags, with a capture group to extract the file name. A full book path would be better in case of duplicate file names for different images in different folders (e.g. chapter1/figure1.png vs chapter2/figure1.png), but that is probably overkill. Then looking up the extracted file name in your Python dict (read in from the CSV once at the beginning) and adding in the alt to create the replacement in Python Function Replace should be easier than a full plugin, and quite robust. But to each their own approach. A replacement table could be a useful generic Python Function Replace tool to have in a user's bag of tools.

Last edited by KevinH; Yesterday at 12:05 PM.
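If duplicate file names are a concern, the table can be keyed by full book path instead of bare file name. A minimal sketch, assuming you have the containing file's book path (e.g. `OEBPS/Text/ch1.xhtml`) and the raw `src` value; `image_book_path` is a hypothetical helper. It drops any `#fragment`, percent-decodes the reference, and resolves `../` segments with `posixpath`.

```python
import posixpath
from urllib.parse import unquote

def image_book_path(xhtml_book_path, src):
    """Resolve an <img src> against the book path of the XHTML file containing it."""
    ref = unquote(src.split("#", 1)[0])  # drop fragment, decode %20 etc.
    base_dir = posixpath.dirname(xhtml_book_path)
    # normpath collapses "Text/../Images" into "Images"
    return posixpath.normpath(posixpath.join(base_dir, ref))
```

With this key, `chapter1/figure1.png` and `chapter2/figure1.png` stay distinct entries in the CSV.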
#7 | Connoisseur | Posts: 55 | Karma: 10 | Join Date: Feb 2012 | Device: none
Thanks, adding the generated image descriptions to the metadata seems like a nice workaround. I'm not too crazy about the quality of the descriptions so far, but I suppose with a little tweaking they can be something of a starting point.
I'll have to read up on how to do the plugin metadata properly. I started out by copying and modifying the XML structure of existing, working plugins, but that didn't work either. Perhaps I'm simply missing an icon file, seeing as most plugins seem to have one.
#8 | Sigil Developer | Posts: 8,724 | Karma: 5703586 | Join Date: Nov 2009 | Device: many
Please note that a Sigil plugin is a very different beast from a calibre plugin, as are other programs' Python plugins. Unless ChatGPT has read, unzipped, and grokked all of the plugins in our Sigil plugin index, it would just be guessing at the format of plugin.xml based on other plugin formats not specific to Sigil.
#9 | Connoisseur | Posts: 55 | Karma: 10 | Join Date: Feb 2012 | Device: none
I used Gemini for this, as I've found it to be better with Python and coding in general. I gave it https://www.mobileread.com/forums/sh...d.php?t=251452 to ingest, so it should be an expert 8)
Thread | Thread Starter | Forum | Replies | Last Post |
--- | --- | --- | --- | --- |
how to replace text with Search and Replace with regex on Calibre | darrnih | ePub | 2 | 04-02-2024 02:10 AM |
Help with Search & Replace function and fi/fl ligatures | raghiid | Conversion | 0 | 03-21-2021 07:36 AM |
Bulk metadata Search/Replace: template function question | meghane_e | Library Management | 3 | 01-24-2019 09:32 PM |
Edit metadata in bulk vs search and replace | inl1ner | Library Management | 6 | 07-14-2014 06:58 PM |
Bulk search and replace operations - question | SFD1968 | Calibre | 1 | 03-01-2013 09:23 AM |