MobileRead Forums - View Single Post

Doitsu · 01-01-2018, 05:57 PM

Quote:

Originally Posted by KevinH

1. Use gumbo with bs4 to do your parsing and searching with the sigil-gumbo adapter as each tag provides its starting offset in the file.

I don't think that this method is documented in the Plugin Framework Guide. With the usual HTML parsers, I'd use the following code to find the first paragraph tag in a blank epub2 file:

Code:

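from sigil_bs4 import BeautifulSoup

def run(bk):
    # readfile() takes the manifest id of the file
    html = bk.readfile('Section0001.xhtml')
    soup = BeautifulSoup(html, 'html5lib')
    # First <p> tag in the document
    first_para = soup.find('p')

    # 0 signals success to Sigil
    return 0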

How would I need to change the code to get the offset value for the first paragraph with gumbo?
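
To make clear what I mean by "offset": the character position of the tag's opening `<` in the source text. The sketch below only illustrates that idea with the standard library's `html.parser`. It is not gumbo and not the sigil-gumbo adapter, and the names `FirstTagLocator` and `first_tag_offset` are made up for this example.

```python
from html.parser import HTMLParser


class FirstTagLocator(HTMLParser):
    """Remember where the first start tag with a given name begins."""

    def __init__(self, wanted, line_starts):
        super().__init__()
        self.wanted = wanted
        self.line_starts = line_starts
        self.found_at = None  # character offset of the tag's '<', or None

    def handle_starttag(self, tag, attrs):
        if tag == self.wanted and self.found_at is None:
            # getpos() returns (1-based line, 0-based column) of the tag start
            line, col = self.getpos()
            self.found_at = self.line_starts[line - 1] + col


def first_tag_offset(source, tag):
    """Return the character offset of the first <tag> in source, or None."""
    # Offset at which each line begins; only '\n' counts as a line break,
    # which matches how HTMLParser tracks line numbers.
    line_starts = [0]
    for i, ch in enumerate(source):
        if ch == '\n':
            line_starts.append(i + 1)
    locator = FirstTagLocator(tag, line_starts)
    locator.feed(source)
    locator.close()
    return locator.found_at


sample = '<html>\n<body>\n  <p>Hello</p>\n</body>\n</html>\n'
offset = first_tag_offset(sample, 'p')
print(offset, sample[offset:offset + 3])
```

What I am hoping for is the gumbo equivalent of `found_at` attached to the tag that `soup.find('p')` returns.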