01-02-2018, 09:01 AM   #254
Doitsu
I tested the Gumbo offset method, but it looks like the parser doesn't take the header into account when returning offsets. I also had to add 1 to the line number. The plugin works with a blank epub2 book but not with "real books."

BTW, I used the following plugin code:

Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import sys
from xml.sax.saxutils import escape
import sigil_gumbo_bs4_adapter as gumbo_bs4

def run(bk):
    for id_type, id in bk.selected_iter():
        filename = os.path.basename(bk.id_to_href(id))
        # Normalize Windows line endings before parsing
        html = bk.readfile(id).replace('\r\n', '\n')
        soup = gumbo_bs4.parse(html)

        for para in soup.find_all('p'):
            # The reported line number had to be bumped by 1 to match
            linenumber = para.line + 1
            colnumber = para.col
            # Hard-coded correction for the header that the parser's offsets ignore
            offset = para.offset + 39
            # Escape markup and double quotes so the tag can be shown in the results
            message = escape(str(para)).replace('"', '&quot;')
            bk.add_extended_result('info', filename, linenumber, offset, 'Line: ' + str(linenumber) + ' Col: ' + str(colnumber) + ' Gumbo method: ' + message)

    return 0

def main():
    print('I reached main when I should not have\n')
    return -1

if __name__ == "__main__":
    sys.exit(main())
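
A note on the fixed correction of 39: the line `<?xml version="1.0" encoding="utf-8"?>` plus its newline is exactly 39 characters, so the constant may simply be the length of the XML declaration in the blank template. If that is what the parser skips (an assumption, I haven't confirmed it), the correction could be measured per file instead of hard-coded. A minimal sketch, where `xml_declaration_length` is a hypothetical helper and not part of the Sigil or Gumbo API:

```python
import re

# Matches an XML declaration at the very start of the text,
# plus the single line break that follows it (if any).
_XML_DECL = re.compile(r'\A<\?xml[^>]*\?>\r?\n?')

def xml_declaration_length(text):
    """Return the number of characters taken up by a leading XML
    declaration (including its line break), or 0 if there is none."""
    match = _XML_DECL.match(text)
    return match.end() if match else 0
```

In the plugin this would replace the constant, e.g. `offset = para.offset + xml_declaration_length(html)`. Whether that is enough for "real books" is untested; other differences (a DOCTYPE, or byte vs. character offsets) could still throw the numbers off.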


For your convenience, I've also attached the actual plugin.
Attached Files
File Type: zip GumboOffset.zip (1.1 KB, 547 views)