12-25-2017, 11:03 AM   #241
KevinH
Sigil Developer
Posts: 8,802
Karma: 6000000
Join Date: Nov 2009
Device: many
The launcher.py which is run by Sigil in a separate process does the following:
Code:
from opf_parser import Opf_Parser
from wrapper import Wrapper
from bookcontainer import BookContainer
from inputcontainer import InputContainer
from outputcontainer import OutputContainer
from validationcontainer import ValidationContainer
So all of the bk functions for your plugin.py should already be imported, and launcher.py then does the following:

Code:
def launch(self):
    script_module = self.script_module
    script_type = self.script_type
    container = self.container
    sys.stdout = SavedStream(sys.stdout, 'stdout', self)
    sys.stderr = SavedStream(sys.stderr, 'stderr', self)
    try:
        target_script = __import__(script_module)
        self.exitcode = target_script.run(container)
        sys.stdout = sys.stdout.stream
        sys.stderr = sys.stderr.stream
    except Exception as e:
        sys.stderr.write(traceback.format_exc())
        sys.stderr.write("Error: %s\n" % e)
        sys.stdout = sys.stdout.stream
        sys.stderr = sys.stderr.stream
        self.exitcode = -1
So anything you import at module level in your plugin.py is imported as well (by the __import__ call) before its run method is invoked.

Other helper modules, such as compatibility_utils.py, epub_utils.py, quickparser.py, and preferences.py, exist as separate modules and can also be imported.
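To make the order of events concrete, here is a minimal, self-contained sketch that imitates the launch() flow above; it is not Sigil's actual launcher (there is no stdout/stderr capture). It writes a throwaway plugin module to a temporary directory, imports it with __import__, and only then calls run() with a stand-in container. The module name demo_plugin, the FakeContainer class, and this simplified launch() are all hypothetical.

```python
import os
import sys
import tempfile
import traceback

# Hypothetical plugin source: module-level code runs at import time,
# while run(bk) only runs when the launcher calls it.
PLUGIN_SOURCE = '''
events = []
import json  # module-level import: executed by __import__, before run()
events.append("module imported")

def run(bk):
    events.append("run called with " + bk.name)
    return 0
'''


class FakeContainer:
    """Hypothetical stand-in for Sigil's BookContainer."""
    name = "bk"


def launch(script_module, container):
    """Simplified imitation of launcher.py's launch() (no stream capture)."""
    try:
        target_script = __import__(script_module)
        return target_script, target_script.run(container)
    except Exception as e:
        sys.stderr.write(traceback.format_exc())
        sys.stderr.write("Error: %s\n" % e)
        return None, -1


with tempfile.TemporaryDirectory() as plugin_dir:
    with open(os.path.join(plugin_dir, "demo_plugin.py"), "w") as f:
        f.write(PLUGIN_SOURCE)
    sys.path.insert(0, plugin_dir)
    try:
        module, exitcode = launch("demo_plugin", FakeContainer())
    finally:
        sys.path.remove(plugin_dir)

print(module.events)  # ['module imported', 'run called with bk']
print(exitcode)       # 0
```

The module-level import and the first event are recorded before run() ever sees the container, which is why nothing at module level can rely on bk.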

12-25-2017, 11:10 AM   #242
KevinH
Sigil Developer
Posts: 8,802
Karma: 6000000
Join Date: Nov 2009
Device: many
I think I see what you are asking. And no, the correct container object for your container type is passed in to run() so it will not be available until then unless you can see/find the container object in the environment someplace. So you would need to have any import that needs bk done inside the run() method.
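One way to follow that advice is sketched below, assuming a helper that needs bk to do its work: rather than creating it at module level (where no container exists yet), create it inside run() from the container Sigil passes in. HtmlReport and StubBk are hypothetical names; StubBk only imitates bk.readfile() so the example runs outside Sigil.

```python
from xml.sax.saxutils import escape  # ordinary imports are fine at module level


class HtmlReport:
    """Hypothetical helper that needs the container (bk) to work."""

    def __init__(self, bk):
        self.bk = bk

    def summary(self, manifest_id):
        # bk.readfile() is the container call used elsewhere in this thread
        return escape(self.bk.readfile(manifest_id))


# Do NOT create HtmlReport here: at module level there is no bk yet.

def run(bk):
    # bk exists only once the launcher calls run(), so everything that
    # depends on it is set up here.
    report = HtmlReport(bk)
    print(report.summary("Section0001.xhtml"))
    return 0


class StubBk:
    """Hypothetical stand-in for Sigil's container, for testing outside Sigil."""

    def readfile(self, manifest_id):
        return "<p>Hello & welcome</p>"


exitcode = run(StubBk())  # prints &lt;p&gt;Hello &amp; welcome&lt;/p&gt;
```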

Last edited by KevinH; 12-25-2017 at 11:20 AM.

12-25-2017, 11:56 AM   #243
Doitsu
Grand Sorcerer
Posts: 5,731
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by KevinH View Post
I think I see what you are asking. And no, the correct container object for your container type is passed in to run() so it will not be available until then unless you can see/find the container object in the environment someplace. So you would need to have any import that needs bk done inside the run() method.
Thanks for your detailed response.

01-01-2018, 03:19 PM   #244
BeckyEbook
Guru
Posts: 843
Karma: 3335974
Join Date: Jan 2017
Location: Poland
Device: Various
Precise offset

I need a precise offset for the found error in the validation plugin.
Is it possible?

plugin.py line 59:
Code:
bk.add_extended_result("error",escape(filename), linenumber, None, 'Becky Error #01: ' + msg)
Attached Files
File Type: epub SampleFile.epub (1.8 KB, 756 views)
File Type: zip BeckySample.zip (1.5 KB, 569 views)

Last edited by BeckyEbook; 01-01-2018 at 04:19 PM. Reason: Working sample

01-01-2018, 04:06 PM   #245
DiapDealer
Grand Sorcerer
Posts: 28,599
Karma: 204624552
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by BeckyEbook View Post
I need a precise offset for the found error in the validation plugin.
Is it possible?

plugin.py line 59:
Code:
bk.add_result(escape(filename), linenumber, None, 'Becky Error #01: ' + message)
Which validation plugin? FlightCrew or EpubCheck?

EDIT: Never mind. I see you're asking about a validation plugin of your own.

Last edited by DiapDealer; 01-01-2018 at 04:09 PM.

01-01-2018, 04:07 PM   #246
Doitsu
Grand Sorcerer
Posts: 5,731
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by BeckyEbook View Post
I need a precise offset for the found error in the validation plugin.
Is it possible?

plugin.py line 59:
Code:
bk.add_result(escape(filename), linenumber, None, 'Becky Error #01: ' + message)
When KevinH added validation support, he kindly provided the required code, which I slightly updated.

Code:
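import sys

iswindows = sys.platform.startswith('win')

# code provided by KevinH
# returns [0] followed by the index of every '\n' in s
def generate_line_offsets(s):
    offlst = [0]
    i = s.find('\n', 0)
    while i >= 0:
        offlst.append(i)
        i = s.find('\n', i + 1)
    return offlst

# code provided by KevinH
# converts a 1-based line and column into a character offset;
# on Windows it subtracts one character per preceding line, i.e. the
# '\r' of each '\r\n' line ending (assumes s still contains '\r\n')
def charoffset(line, col, offlst):
    coffset = None
    if iswindows:
        coffset = offlst[line-1] + 2 + (col - 1) - line
    else:
        coffset = offlst[line-1] + 1 + (col - 1)
    # line 1 starts at offset 0, not one past a newline
    if line == 1:
        coffset -= 1
    return coffset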
Basically, you run generate_line_offsets(s) once with the text of the HTML file as the input and then use this value together with line and column numbers as the input for charoffset(line, col, offlst).
To see it in action, have a look at the epubcheck plugin code and the Regex tester plugin code, which is somewhat similar to your plugin.
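For reference, here is a minimal sketch of the same idea for text that has already been normalized to '\n' line endings (as the Gumbo plugin later in this thread does with .replace('\r\n', '\n')), so no Windows correction is applied. Storing each line's start (newline index + 1) instead of the newline index removes the special case for line 1, and bisect gives the reverse mapping from offset to line/column. The helper names are hypothetical and not part of the Sigil plugin API; lines and columns are 1-based, offsets 0-based.

```python
import bisect


def line_start_offsets(text):
    """Return the 0-based offset at which each line starts (line 1 -> index 0)."""
    starts = [0]
    i = text.find("\n")
    while i >= 0:
        starts.append(i + 1)
        i = text.find("\n", i + 1)
    return starts


def to_offset(line, col, starts):
    """1-based line/column -> 0-based character offset."""
    return starts[line - 1] + (col - 1)


def to_line_col(offset, starts):
    """0-based character offset -> 1-based (line, column)."""
    line = bisect.bisect_right(starts, offset)
    return line, offset - starts[line - 1] + 1


text = "<html>\n<body>\n<p>Hi</p>\n</body>\n</html>"
starts = line_start_offsets(text)
off = to_offset(3, 1, starts)
print(off, repr(text[off:off + 3]))  # 14 '<p>'
print(to_line_col(off, starts))      # (3, 1)
```

As with KevinH's version, compute the line starts once per file and reuse them for every reported error.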

01-01-2018, 04:20 PM   #247
BeckyEbook
Guru
Posts: 843
Karma: 3335974
Join Date: Jan 2017
Location: Poland
Device: Various
Thank you very much.
I really do not know how I could have missed the RegEx tester plugin.

01-01-2018, 04:24 PM   #248
KevinH
Sigil Developer
Posts: 8,802
Karma: 6000000
Join Date: Nov 2009
Device: many
Alternatively, you could do either of the following:

1. Use gumbo with bs4 to do your parsing and searching with the sigil-gumbo adapter as each tag provides its starting offset in the file.

OR

2. extract the exact string that contains the error and then use regular expressions to search the original html file to get the line and column or offset of the offending string in the original file
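Option 2 might look like the following minimal sketch: search the original file text for the offending string (escaped with re.escape so characters like '.' or '(' are matched literally), then derive the 1-based line and column from each match start. The function name find_positions is hypothetical, and it assumes the text uses '\n' line endings.

```python
import re


def find_positions(html, snippet):
    """Yield (line, col, offset) for every occurrence of snippet in html.

    Lines and columns are 1-based, offset is 0-based; assumes '\\n' newlines.
    """
    for m in re.finditer(re.escape(snippet), html):
        start = m.start()
        line = html.count("\n", 0, start) + 1
        col = start - (html.rfind("\n", 0, start) + 1) + 1
        yield line, col, start


html = "<body>\n  <p>ok</p>\n  <p>bad<b></p>\n</body>"
print(list(find_positions(html, "<b></p>")))  # [(3, 9, 27)]
```

Note that these are Python character offsets; see KevinH's later remark about UTF-16 offsets inside Sigil if the text contains characters outside the Basic Multilingual Plane.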

01-01-2018, 04:24 PM   #249
Doitsu
Grand Sorcerer
Posts: 5,731
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by BeckyEbook View Post
Thank you very much.
I really do not know how I could have missed the RegEx tester plugin.
It was never officially released, because it's a proof of concept plugin and of little use to most Sigil users.

01-01-2018, 04:27 PM   #250
BeckyEbook
Guru
Posts: 843
Karma: 3335974
Join Date: Jan 2017
Location: Poland
Device: Various
I see the potential in this example, so I'm excited.

Quote:
Originally Posted by KevinH View Post
2. extract the exact string that contains the error and then use regular expressions to search the original html file to get the line and column or offset of the offending string in the original file
I thought about it, but my attempts were unsuccessful.

Last edited by BeckyEbook; 01-01-2018 at 04:29 PM.

01-01-2018, 04:57 PM   #251
Doitsu
Grand Sorcerer
Posts: 5,731
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by KevinH View Post
1. Use gumbo with bs4 to do your parsing and searching with the sigil-gumbo adapter as each tag provides its starting offset in the file.
I don't think that this method is documented in the Plugin Framework Guide. With the usual HTML parsers, I'd use the following code to find the first paragraph tag in a blank epub2 file:


Code:
from sigil_bs4 import BeautifulSoup
 
def run(bk):
    html = bk.readfile('Section0001.xhtml')
    soup = BeautifulSoup(html, 'html5lib')
    first_para = soup.find('p')
        
    return 0
How would I need to change the code to get the offset value for the first paragraph with gumbo?

01-01-2018, 05:15 PM   #252
KevinH
Sigil Developer
Posts: 8,802
Karma: 6000000
Join Date: Nov 2009
Device: many
You would have to parse the file using the gumbo bs4 adapter; each node of the parse tree is then given extra information fields:

Code:
def _add_source_info(obj, original_text, start_pos, end_pos):
    obj.original = _fromutf8(bytes(original_text))
    obj.line = start_pos.line
    obj.col = start_pos.column
    obj.offset = start_pos.offset
    if end_pos:
        obj.end_line = end_pos.line
        obj.end_col = end_pos.column
        obj.end_offset = end_pos.offset
See:

https://github.com/Sigil-Ebook/Sigil...bs4_adapter.py

And here is how the testme3 plugin, posted at the start of this thread, uses the gumbo parser:

Code:
# examples for using the bs4/gumbo parser to process xhtml
print("\nExercising: the gumbo bs4 adapter")
import sigil_gumbo_bs4_adapter as gumbo_bs4
samp = """
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" 
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en-US">
<head><title>testing & entities</title></head>
<body>
  <p class="first second">this&nbsp;is*the*<i><b>copyright</i></b> symbol "&copy;"</p>
  <p xmlns:xlink="http://www.w3.org/xlink" class="second" xlink:href="http://www.ggogle.com">this used to test atribute namespaces</p>
</body>
</html>
"""
soup = gumbo_bs4.parse(samp)
for node in soup.find_all(attrs={'class':'second'}):
    print(node)
So you should be able to access them via node.line, node.col, and node.offset, but I cannot prove that right now, as all I have access to is my old iPad.

Please give that a try.

01-01-2018, 06:00 PM   #253
KevinH
Sigil Developer
Posts: 8,802
Karma: 6000000
Join Date: Nov 2009
Device: many
BTW, the position information refers to the UTF-8 encoded source file. The offsets will not match the UTF-16 QChar offsets inside Qt/Sigil, but the conversion is simple enough.
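A minimal sketch of that conversion, assuming the Gumbo offset is a byte offset into the UTF-8 encoding of the same text and falls on a character boundary: decode the bytes before the offset, then count how many UTF-16 code units they occupy (characters outside the Basic Multilingual Plane take two). The function name is hypothetical.

```python
def utf8_to_utf16_offset(text, byte_offset):
    """Convert a byte offset into text.encode('utf-8') to a UTF-16 code-unit offset."""
    prefix = text.encode("utf-8")[:byte_offset].decode("utf-8")
    # UTF-16-LE writes no BOM, so every code unit is exactly 2 bytes.
    return len(prefix.encode("utf-16-le")) // 2


text = "a\u00e9\U0001F600<p>"  # 'a', 'é', an emoji, then '<p>'
byte_off = text.encode("utf-8").index(b"<p>")
print(byte_off)                              # 7
print(utf8_to_utf16_offset(text, byte_off))  # 4
```

Here the same '<p>' sits at UTF-8 byte 7, UTF-16 unit 4, and Python string index 3, which shows why the three kinds of offset should not be mixed.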

01-02-2018, 09:01 AM   #254
Doitsu
Grand Sorcerer
Posts: 5,731
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
I tested the Gumbo offset method, but it looks like the parser doesn't take the XML header into account when returning offsets. I also had to add 1 to the line number. The plugin works with a blank epub2 book but not with "real books."

BTW, I used the following plugin code:

Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import sys
from xml.sax.saxutils import escape
import sigil_gumbo_bs4_adapter as gumbo_bs4
 
def run(bk):
    for id_type, id in bk.selected_iter():
        filename =  os.path.basename(bk.id_to_href(id))
        # normalize Windows line endings before parsing
        html = bk.readfile(id).replace('\r\n', '\n') 
        soup = gumbo_bs4.parse(html)
        
        for para in soup.find_all('p'):
            # empirical corrections: +1 line and +39 characters presumably
            # compensate for a one-line XML declaration
            # ('<?xml version="1.0" encoding="utf-8"?>' is 38 chars + newline)
            linenumber = para.line + 1
            colnumber = para.col
            offset = para.offset + 39
            message = escape(str(para)).replace('"', "&quot;")
            bk.add_extended_result('info', filename, linenumber, offset, 'Line: ' + str(linenumber) + ' Col: ' + str(colnumber) + ' Gumbo method: ' + message)
        
    return 0
        
def main():
    print('I reached main when I should not have\n')
    return -1

if __name__ == "__main__":
    sys.exit(main())


For your convenience, I've also attached the actual plugin.
Attached Files
File Type: zip GumboOffset.zip (1.1 KB, 544 views)

01-02-2018, 09:27 AM   #255
KevinH
Sigil Developer
Posts: 8,802
Karma: 6000000
Join Date: Nov 2009
Device: many
Yes, gumbo does not like the xml header at all. The easiest way to deal with it is to remove the xml header line if one exists before using gumbo. We do this inside of Sigil itself and then adjust the line numbers and offsets accordingly if interested in error positions. It is easy to do with regular expressions.

I will play around with your sample plugin. Thanks for coding it up!
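A minimal sketch of the header-stripping approach described above, assuming the XML declaration sits at the very start of the file (after normalizing '\r\n' to '\n'): strip it with a regular expression, remember how many characters and newlines were removed, and add those back to the line and offset reported for the stripped text. The regex and helper names are hypothetical, not Sigil's internal code. For a typical '<?xml version="1.0" encoding="utf-8"?>' line, the removed prefix is 38 characters plus the newline, which matches the +39 and +1 that Doitsu's plugin hard-codes.

```python
import re

# An XML declaration at the very start of the text, plus its trailing newline.
XML_DECL = re.compile(r'\A\s*<\?xml[^>]*\?>[ \t]*\n?')


def strip_xml_declaration(text):
    """Return (stripped_text, removed_chars, removed_lines)."""
    m = XML_DECL.match(text)
    if not m:
        return text, 0, 0
    removed = m.group(0)
    return text[m.end():], len(removed), removed.count("\n")


def to_original(line, offset, removed_chars, removed_lines):
    """Map a line/offset in the stripped text back to the original text."""
    return line + removed_lines, offset + removed_chars


original = '<?xml version="1.0" encoding="utf-8"?>\n<html>\n<p>x</p>\n</html>'
stripped, nchars, nlines = strip_xml_declaration(original)
print(nchars, nlines)  # 39 1

# Stand-in for a position a parser would report on the stripped text.
p_off = stripped.index("<p>")
line_in_stripped = stripped.count("\n", 0, p_off) + 1
print(to_original(line_in_stripped, p_off, nchars, nlines))  # (3, 46)
```

Computing the correction from the actual match, rather than hard-coding 39, also covers declarations written with single quotes, a standalone attribute, or no encoding at all.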

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.