Plugin Development - Page 17

KevinH · 12-25-2017, 12:03 PM

The launcher.py which is run by Sigil in a separate process does the following:

Code:

from opf_parser import Opf_Parser
from wrapper import Wrapper
from bookcontainer import BookContainer
from inputcontainer import InputContainer
from outputcontainer import OutputContainer
from validationcontainer import ValidationContainer

So all of the bk functions for your plugin.py should already be imported, and launcher.py then does the following:

Code:

def launch(self):
        script_module = self.script_module
        script_type = self.script_type
        container = self.container
        sys.stdout = SavedStream(sys.stdout, 'stdout', self)
        sys.stderr = SavedStream(sys.stderr, 'stderr', self)
        try:
            target_script = __import__(script_module)
            self.exitcode = target_script.run(container)
            sys.stdout = sys.stdout.stream
            sys.stderr = sys.stderr.stream
        except Exception as e:
            sys.stderr.write(traceback.format_exc())
            sys.stderr.write("Error: %s\n" % e)
            sys.stdout = sys.stdout.stream
            sys.stderr = sys.stderr.stream
            self.exitcode = -1
            pass

So anything you import in your plugin.py should be imported as well before its run method is invoked.

Some others can also be imported, since they exist as separate modules, such as compatibility_utils.py, epub_utils.py, quickparser.py, preferences.py, etc.

KevinH · 12-25-2017, 12:10 PM

I think I see what you are asking. And no, the correct container object for your container type is passed in to run() so it will not be available until then unless you can see/find the container object in the environment someplace. So you would need to have any import that needs bk done inside the run() method.
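As a rough illustration of the point above, here is a minimal plugin.py sketch in which nothing touches the container at import time. The names _container and get_container() are hypothetical and not part of the Sigil plugin framework; only run(bk) and its integer return value come from the launcher code shown earlier.

```python
# plugin.py (sketch only; _container and get_container are hypothetical names)

_container = None  # no container exists yet when the launcher imports this module


def get_container():
    """Return the container passed to run(), or fail loudly if called too early."""
    if _container is None:
        raise RuntimeError("bk is not available before run() is called")
    return _container


def run(bk):
    global _container
    _container = bk  # the launcher hands the container in here
    # Anything that needs bk, including imports of helper modules that
    # depend on it, belongs from this point on rather than at module level.
    return 0
```

With this layout, helper code can call get_container() once run() has started, and a call made at import time raises an error instead of silently failing.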

Doitsu · 12-25-2017, 12:56 PM

Quote:

Originally Posted by KevinH

I think I see what you are asking. And no, the correct container object for your container type is passed in to run() so it will not be available until then unless you can see/find the container object in the environment someplace. So you would need to have any import that needs bk done inside the run() method.

Thanks for your detailed response.

BeckyEbook · 01-01-2018, 04:19 PM

I need a precise offset for the found error in the validation plugin.
Is it possible?

plugin.py line 59:

Code:

bk.add_extended_result("error",escape(filename), linenumber, None, 'Becky Error #01: ' + msg)

DiapDealer · 01-01-2018, 05:06 PM

Quote:

Originally Posted by BeckyEbook

I need a precise offset for the found error in the validation plugin.
Is it possible?

plugin.py line 59:

Code:

bk.add_result(escape(filename), linenumber, None, 'Becky Error #01: ' + message)

Which validation plugin? FlightCrew or EpubCheck?

EDIT: Never mind. I see you're asking for a validation plugin of your own.

Doitsu · 01-01-2018, 05:07 PM

Quote:

Originally Posted by BeckyEbook

I need a precise offset for the found error in the validation plugin.
Is it possible?

plugin.py line 59:

Code:

bk.add_result(escape(filename), linenumber, None, 'Becky Error #01: ' + message)

When KevinH added validation support, he kindly provided the required code, which I slightly updated.

Code:

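import sys

iswindows = sys.platform.startswith('win')

# code provided by KevinH
def generate_line_offsets(s):
    # offlst[0] is 0; offlst[n] is the index of the n-th '\n' in s
    offlst = [0]
    i = s.find('\n', 0)
    while i >= 0:
        offlst.append(i)
        i = s.find('\n', i + 1)
    return offlst

# code provided by KevinH
def charoffset(line, col, offlst):
    # line and col are 1-based; the result is a 0-based character offset
    coffset = None
    if iswindows:
        # Windows branch: additionally subtracts (line - 1),
        # i.e. one position per preceding line
        coffset = offlst[line-1]  + 2 + (col - 1) - line
    else:
        coffset = offlst[line-1]  + 1 + (col - 1)
    if line == 1:
        # the first line has no preceding '\n', so drop the extra 1
        coffset -= 1
    return coffset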

Basically, you run generate_line_offsets(s) once with the text of the HTML file as the input and then use this value together with line and column numbers as the input for charoffset(line, col, offlst).
To see it in action, have a look at the epubcheck plugin code and the Regex tester plugin code, which is somewhat similar to your plugin.
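A small self-contained sketch of that workflow, using only the non-Windows branch of the code above so that the result is the same on every platform. The name charoffset_lf and the sample text are made up for illustration; the text is assumed to use plain '\n' line endings.

```python
def generate_line_offsets(s):
    # offlst[0] is 0; offlst[n] is the index of the n-th '\n' in s
    offlst = [0]
    i = s.find('\n', 0)
    while i >= 0:
        offlst.append(i)
        i = s.find('\n', i + 1)
    return offlst


def charoffset_lf(line, col, offlst):
    # Non-Windows branch only (hypothetical name): 1-based line/col in,
    # 0-based character offset out.
    coffset = offlst[line - 1] + 1 + (col - 1)
    if line == 1:
        coffset -= 1
    return coffset


text = "abc\ndef\nghi"
offlst = generate_line_offsets(text)   # computed once per file

# line 2, column 1 is the 'd'; line 3, column 3 is the 'i'
first_of_line_2 = text[charoffset_lf(2, 1, offlst)]
last_of_line_3 = text[charoffset_lf(3, 3, offlst)]
```

The offset list only has to be built once per file; each error position is then a constant-time lookup.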

BeckyEbook · 01-01-2018, 05:20 PM

Thank you very much.
I really do not know how I could have missed the RegEx tester plugin.

KevinH · 01-01-2018, 05:24 PM

Alternatively, you could do either of the following:

1. Use gumbo with bs4 to do your parsing and searching with the sigil-gumbo adapter as each tag provides its starting offset in the file.

OR

2. extract the exact string that contains the error and then use regular expressions to search the original html file to get the line and column or offset of the offending string in the original file
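Option 2 could look roughly like the sketch below. It assumes the offending string appears verbatim in the original file and that only the first occurrence matters; the function name locate and the sample markup are illustrative only.

```python
import re


def locate(text, snippet):
    """Return (line, col, offset) of the first occurrence of snippet, or None.

    line and col are 1-based, offset is a 0-based character offset.
    """
    m = re.search(re.escape(snippet), text)  # escape so the snippet is matched literally
    if m is None:
        return None
    offset = m.start()
    line = text.count('\n', 0, offset) + 1
    line_start = text.rfind('\n', 0, offset) + 1  # 0 when the match is on line 1
    col = offset - line_start + 1
    return line, col, offset


html = "<p>ok</p>\n<p>bad <b>tag</p>\n"
position = locate(html, "<b>tag")
```

If the same string can occur several times, re.finditer() with the same pattern would give every candidate position, and the plugin would then need some other hint to pick the right one.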

Doitsu · 01-01-2018, 05:24 PM

Quote:

Originally Posted by BeckyEbook

Thank you very much.
I really do not know how I could have missed the RegEx tester plugin.

It was never officially released, because it's a proof of concept plugin and of little use to most Sigil users.

BeckyEbook · 01-01-2018, 05:27 PM

I see the potential in this example, so I'm excited.

Quote:

Originally Posted by KevinH

2. extract the exact string that contains the error and then use regular expressions to search the original html file to get the line and column or offset of the offending string in the original file

I thought about it, but my attempts were unsuccessful.

Doitsu · 01-01-2018, 05:57 PM

Quote:

Originally Posted by KevinH

1. Use gumbo with bs4 to do your parsing and searching with the sigil-gumbo adapter as each tag provides its starting offset in the file.

I don't think that this method is documented in the Plugin Framework Guide. With the usual HTML parsers, I'd use the following code to find the first paragraph tag in a blank epub2 file:

Code:

from sigil_bs4 import BeautifulSoup
 
def run(bk):
    html = bk.readfile('Section0001.xhtml')
    soup = BeautifulSoup(html, 'html5lib')
    first_para = soup.find('p')
        
    return 0

How would I need to change the code to get the offset value for the first paragraph with gumbo?

KevinH · 01-01-2018, 06:15 PM

You would have to parse the file using the gumbo bs4 adapter; then each node of the parse tree is given extra information fields:

Code:

def _add_source_info(obj, original_text, start_pos, end_pos):
    obj.original = _fromutf8(bytes(original_text))
    obj.line = start_pos.line
    obj.col = start_pos.column
    obj.offset = start_pos.offset
    if end_pos:
        obj.end_line = end_pos.line
        obj.end_col = end_pos.column
        obj.end_offset = end_pos.offset

See:

https://github.com/Sigil-Ebook/Sigil...bs4_adapter.py

The testme3 plugin posted at the start of this thread shows how to use the gumbo parser:

Code:

# examples for using the bs4/gumbo parser to process xhtml
    print("\nExercising: the gumbo bs4 adapter")
    import sigil_gumbo_bs4_adapter as gumbo_bs4
    samp = """
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" 
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en-US">
<head><title>testing & entities</title></head>
<body>
  <p class="first second">this&nbsp;is*the*<i><b>copyright</i></b> symbol "&copy;"</p>
  <p xmlns:xlink="http://www.w3.org/xlink" class="second" xlink:href="http://www.ggogle.com">this used to test atribute namespaces</p>
</body>
</html>
"""
    soup = gumbo_bs4.parse(samp)
    for node in soup.find_all(attrs={'class':'second'}):
        print(node)

So you should be able to access them via node.line, node.col, and node.offset but I can not prove that now as all I have access to is my old iPad.

Please give that a try.

KevinH · 01-01-2018, 07:00 PM

BTW, the position information is for the utf-8 encoded source file. The offsets will not match the utf-16 QChars offsets inside Qt/Sigil but conversion is simple enough.
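One possible way to do that conversion in Python is sketched below. It assumes the gumbo offset is a byte offset into the UTF-8 encoded source and that it falls on a character boundary; the function name is hypothetical.

```python
def utf8_offset_to_utf16(data, byte_offset):
    """Convert a byte offset into UTF-8 data to an offset in UTF-16 code units.

    data is the UTF-8 encoded source (bytes). byte_offset is assumed to sit
    on a character boundary, otherwise decode() raises UnicodeDecodeError.
    """
    prefix = data[:byte_offset].decode('utf-8')
    # Every UTF-16 code unit is two bytes in the 'utf-16-le' encoding (no BOM).
    return len(prefix.encode('utf-16-le')) // 2


# 'a' (1 byte), copyright sign (2 bytes), musical G clef (4 bytes), 'b'
sample = "a\u00a9\U0001D11Eb"
data = sample.encode('utf-8')
byte_offset_of_b = data.index(b"b")
utf16_offset_of_b = utf8_offset_to_utf16(data, byte_offset_of_b)
```

Characters outside the Basic Multilingual Plane take two UTF-16 code units, so the result can also differ from the index into a Python str.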

Doitsu · 01-02-2018, 10:01 AM

I tested the Gumbo offset method, but it looks like the parser doesn't take the header into account when returning offsets. I also had to add 1 to the line number. The plugin works with a blank epub2 book but not with "real books."

BTW, I used the following plugin code:

Spoiler:

For your convenience, I've also attached the actual plugin.

KevinH · 01-02-2018, 10:27 AM

Yes, gumbo does not like the xml header at all. The easiest way to deal with it is to remove the xml header line if one exists before using gumbo. We do this inside of Sigil itself and then adjust the line numbers and offsets accordingly if interested in error positions. It is easy to do with regular expressions.

I will play around with your sample plugin. Thanks for coding it up!
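A minimal sketch of the header-stripping step with a regular expression. This is not the code Sigil uses internally; the pattern and the function name are assumptions, and it only handles an XML declaration at the very start of the file.

```python
import re

# XML declaration at the start of the text, plus the line break that follows it
XML_DECL = re.compile(r'^\s*<\?xml\b.*?\?>[ \t]*\r?\n?', re.DOTALL)


def strip_xml_header(text):
    """Return (stripped_text, removed_bytes, removed_lines).

    removed_bytes is counted in UTF-8 bytes, to match positions reported
    for the UTF-8 encoded source.
    """
    m = XML_DECL.match(text)
    if m is None:
        return text, 0, 0
    removed = m.group(0)
    return text[m.end():], len(removed.encode('utf-8')), removed.count('\n')


source = '<?xml version="1.0" encoding="utf-8"?>\n<html><body><p>x</p></body></html>'
stripped, removed_bytes, removed_lines = strip_xml_header(source)

# A position reported for the stripped text maps back to the original file
# by adding removed_bytes to the offset and removed_lines to the line number.
```

The stripped text is what would be handed to the parser; the two counts are kept so that reported positions can be shifted back afterwards.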
