03-29-2020, 03:49 AM   #1
Doitsu
Searching within tags

Quote:
Originally Posted by carmenchu
Well: so far, in 'non greedy' mode,
(?<=\>)\b([^<]+)(?=\</) selects between tags, not nested
(?<=\>)\b([^<]+)(?=\<) skips tags.
Useful when the mouse gets temperamental, and one wishes to manually extract/move some text.
Thanks for the Sigil User Guide and the links to regex references.

If you have basic programming skills, you could also write an ad-hoc Sigil plugin that manipulates tags with the BeautifulSoup library, which is bundled with Sigil. (See the Sigil API documentation.)
This saves you the hassle of coming up with complex regular expressions.
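
To see why the regex route gets fiddly, here is a rough sketch that runs the two quoted expressions through Python's re module. The sample line and the variable names are made up for illustration. Sigil's Find & Replace uses a different regex engine (PCRE), so treat the result as an approximation of what Sigil would select.

```python
import re

# Sample markup invented for this illustration.
sample = '<p>Hello <b>world</b>!</p>'

# The two patterns quoted above, unchanged.
before_closing_tag = re.compile(r'(?<=\>)\b([^<]+)(?=\</)')
before_any_tag = re.compile(r'(?<=\>)\b([^<]+)(?=\<)')

# Text that is directly followed by a closing tag.
text_before_closing = before_closing_tag.findall(sample)

# Text that is directly followed by any tag.
text_before_any = before_any_tag.findall(sample)

print(text_before_closing)  # ['world']
print(text_before_any)      # ['Hello ', 'world']
```

Note that neither pattern picks up the "!" after the closing </b> tag, because \b requires the text to start with a word character. Edge cases like this are what a parser-based approach avoids.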

For example, the following minimal plugin code:

Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
from sigil_bs4 import BeautifulSoup

def run(bk):

    # get all html files
    for (html_id, href) in bk.text_iter():
        file_name = os.path.basename(href)
        html = bk.readfile(html_id)
        
        # convert html to soup
        soup = BeautifulSoup(html, 'html.parser')
        orig_html = str(soup)
        
        # get all span tags
        for span in soup.find_all('span'):
            if 'Calibre13' in span.get('class', []):
                # remove class attribute
                del span['class']
                # change <span> to <b>
                span.name = 'b'
            else:
                # remove <span> tags with other classes or w/o classes,
                # keeping their contents
                span.unwrap()

        # update file with changes
        if str(soup) != orig_html:
            bk.writefile(html_id, str(soup))
            print(file_name, 'updated')

    print('Done')
    return 0


will look for <span> tags with a Calibre13 class and replace them with <b> tags. (All other <span> tags will be removed, but their contents are kept.)

Before:

Code:
<p>This should be <span class="Calibre6 Calibre13 Calibre2">bolded</span>. <span class="Calibre2">This span is redundant</span> <span>and this span should also be deleted.</span></p>
After:

Code:
<p>This should be <b>bolded</b>. This span is redundant and this span should also be deleted.</p>
If you want to test the plugin code:
  • Create a MyPlugin folder in the Sigil plugins folder
  • Save the plugin code as plugin.py in that folder.
  • Create a plugin.xml file with the following contents:
    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <plugin>
        <name>MyPlugin</name>
        <type>edit</type>
        <autostart>true</autostart>
        <author>carmenchu</author>
        <description>bs4 test</description>
        <engine>python3.4</engine>
        <version>0.0.1</version>
        <oslist>unx,win,osx</oslist>
    </plugin>

    and also save it in the MyPlugin folder.
(To run the plugin, select Plugins > Edit > MyPlugin.)
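
Before copying the files over, it can help to check that plugin.xml is at least well-formed. The following is only a rough sketch using Python's standard library. The helper name missing_elements is made up, and the list of expected elements is simply the set used in the example above, not an official list of what Sigil requires.

```python
import xml.etree.ElementTree as ET

# Same contents as the plugin.xml shown above.
PLUGIN_XML = """<?xml version="1.0" encoding="UTF-8"?>
<plugin>
    <name>MyPlugin</name>
    <type>edit</type>
    <autostart>true</autostart>
    <author>carmenchu</author>
    <description>bs4 test</description>
    <engine>python3.4</engine>
    <version>0.0.1</version>
    <oslist>unx,win,osx</oslist>
</plugin>
"""

# Element names taken from the example above. Treating all of them as
# required is an assumption of this sketch, not a statement about Sigil.
EXPECTED = ("name", "type", "autostart", "author",
            "description", "engine", "version", "oslist")


def missing_elements(xml_text):
    """Return the expected element names that are absent or empty."""
    # Parsing raises xml.etree.ElementTree.ParseError if the XML is malformed.
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag != "plugin":
        return list(EXPECTED)
    missing = []
    for name in EXPECTED:
        value = root.findtext(name)
        if value is None or not value.strip():
            missing.append(name)
    return missing


problems = missing_elements(PLUGIN_XML)
print(problems or "plugin.xml looks complete")
```

To check a real file, read plugin.xml from the MyPlugin folder and pass its text to missing_elements() instead of the embedded string.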

Last edited by Doitsu; 03-29-2020 at 03:55 AM.