Quote:
Originally Posted by KevinH
1. Use gumbo with bs4 to do your parsing and searching with the sigil-gumbo adapter as each tag provides its starting offset in the file.
I don't think that this method is documented in the Plugin Framework Guide. With the usual HTML parsers, I'd use the following code to find the first paragraph tag in a blank epub2 file:
Code:
from sigil_bs4 import BeautifulSoup

def run(bk):
    html = bk.readfile('Section0001.xhtml')
    soup = BeautifulSoup(html, 'html5lib')
    first_para = soup.find('p')
    return 0
How would I need to change the code to get the offset value for the first paragraph with gumbo?