Old 01-01-2018, 04:57 PM   #251
Doitsu
Grand Sorcerer
Quote:
Originally Posted by KevinH View Post
1. Use gumbo with bs4 to do your parsing and searching with the sigil-gumbo adapter as each tag provides its starting offset in the file.
I don't think that this method is documented in the Plugin Framework Guide. With the usual HTML parsers, I'd use the following code to find the first paragraph tag in a blank epub2 file:


Code:
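from sigil_bs4 import BeautifulSoup

def run(bk):
    # Read the only content file of a blank epub2 by its manifest id
    html = bk.readfile('Section0001.xhtml')
    # Parse with a conventional (non-gumbo) parser
    soup = BeautifulSoup(html, 'html5lib')
    # First <p> element, or None if the file has no paragraphs
    first_para = soup.find('p')

    return 0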
How would I need to change the code to get the offset value for the first paragraph with gumbo?
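To make clear what I mean by "offset", here is a minimal sketch that uses only the standard library's html.parser, not gumbo or the sigil-gumbo adapter (whose attribute names I don't know). It returns the character offset of the first matching start tag in the source string. The names first_tag_offset and _FirstTagFinder are made up for this illustration.

```python
from html.parser import HTMLParser


class _FirstTagFinder(HTMLParser):
    """Remember the (line, column) position of the first matching start tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.pos = None

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and self.pos is None:
            # getpos() returns (line, column): line is 1-based, column is 0-based
            self.pos = self.getpos()


def first_tag_offset(source, tag):
    """Return the 0-based character offset of the first <tag ...>, or None."""
    finder = _FirstTagFinder(tag)
    finder.feed(source)
    finder.close()
    if finder.pos is None:
        return None
    line, col = finder.pos
    # Offset of the start of each line; the parser counts only '\n' as a line break
    line_starts = [0]
    for i, ch in enumerate(source):
        if ch == '\n':
            line_starts.append(i + 1)
    return line_starts[line - 1] + col


if __name__ == '__main__':
    sample = '<html>\n<body>\n  <p>Hello</p>\n</body>\n</html>'
    offset = first_tag_offset(sample, 'p')
    print(offset, repr(sample[offset:offset + 3]))
```

This is the kind of value I would like to read directly from the tag object returned by the gumbo parser, instead of computing it separately.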