|  12-25-2017, 11:03 AM | #241 | 
| Sigil Developer            Posts: 9,071 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			The launcher.py which is run by Sigil in a separate process does the following:
Code:
from opf_parser import Opf_Parser
from wrapper import Wrapper
from bookcontainer import BookContainer
from inputcontainer import InputContainer
from outputcontainer import OutputContainer
from validationcontainer import ValidationContainer
Code:
    def launch(self):
        script_module = self.script_module
        script_type = self.script_type
        container = self.container
        sys.stdout = SavedStream(sys.stdout, 'stdout', self)
        sys.stderr = SavedStream(sys.stderr, 'stderr', self)
        try:
            target_script = __import__(script_module)
            self.exitcode = target_script.run(container)
            sys.stdout = sys.stdout.stream
            sys.stderr = sys.stderr.stream
        except Exception as e:
            sys.stderr.write(traceback.format_exc())
            sys.stderr.write("Error: %s\n" % e)
            sys.stdout = sys.stdout.stream
            sys.stderr = sys.stderr.stream
            self.exitcode = -1
            pass
Some others can be imported since they are separate modules, such as compatibility_utils.py, epub_utils.py, quickparser.py, preferences.py, etc. | 
|   |   | 
|  12-25-2017, 11:10 AM | #242 | 
| Sigil Developer            Posts: 9,071 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			I think I see what you are asking.  And no, the correct container object for your container type is passed in to run() so it will not be available until then unless you can see/find the container object in the environment someplace.  So you would need to have any import that needs bk done inside the run() method.
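A minimal sketch of that pattern, assuming the plugin only needs the container to list text files. The helper name list_text_ids is hypothetical, and text_iter() is assumed to yield (manifest id, href) pairs:

```python
def list_text_ids(bk):
    """Helper that needs the container, so it receives bk as an argument."""
    # Assumption: bk.text_iter() yields (manifest_id, href) pairs.
    return [manifest_id for manifest_id, href in bk.text_iter()]


def run(bk):
    # bk only exists from this point on, so anything that depends on it
    # (imports included) has to be done inside run(), not at module level.
    ids = list_text_ids(bk)
    print("Found %d text files" % len(ids))
    return 0
```

Nothing at module level touches bk here, so the module can be imported by launcher.py before the container is handed over.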
		 Last edited by KevinH; 12-25-2017 at 11:20 AM. | 
|   |   | 
|  12-25-2017, 11:56 AM | #243 | |
| Grand Sorcerer            Posts: 5,763 Karma: 24088559 Join Date: Dec 2010 Device: Kindle PW2 | Quote: 
 | |
|   |   | 
|  01-01-2018, 03:19 PM | #244 | 
| Guru            Posts: 899 Karma: 3501166 Join Date: Jan 2017 Location: Poland Device: Various | 
				
				Precise offset
			 
			
			I need a precise offset for the found error in the validation plugin. Is it possible? plugin.py line 59:
Code:
bk.add_extended_result("error", escape(filename), linenumber, None, 'Becky Error #01: ' + msg)
Last edited by BeckyEbook; 01-01-2018 at 04:19 PM. Reason: Working sample | 
|   |   | 
|  01-01-2018, 04:06 PM | #245 | |
| Grand Sorcerer            Posts: 28,882 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | Quote: 
 EDIT: Never mind. I see you're asking for a validation plugin of your own. Last edited by DiapDealer; 01-01-2018 at 04:09 PM. | |
|   |   | 
|  01-01-2018, 04:07 PM | #246 | |
| Grand Sorcerer            Posts: 5,763 Karma: 24088559 Join Date: Dec 2010 Device: Kindle PW2 | Quote: 
 Code:
import sys

iswindows = sys.platform.startswith('win')
# code provided by KevinH
def generate_line_offsets(s):
    offlst = [0]
    i = s.find('\n', 0)
    while i >= 0:
        offlst.append(i)
        i = s.find('\n', i + 1)
    return offlst
# code provided by KevinH
def charoffset(line, col, offlst):
    coffset = None
    if iswindows:
        coffset = offlst[line-1]  + 2 + (col - 1) - line
    else:
        coffset = offlst[line-1]  + 1 + (col - 1)
    if line == 1:
        coffset -= 1
    return coffset
To see it in action, have a look at the epubcheck plugin code and the Regex tester plugin code, which is somewhat similar to your plugin. | |
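For illustration only, here is a simplified variant of the same idea without the Windows adjustment. It assumes the text uses plain "\n" line endings and that line and column numbers are 1-based; the function names are hypothetical and not taken from either plugin:

```python
def line_start_offsets(text):
    """Return the offset at which each line starts (index 0 is line 1)."""
    starts = [0]
    pos = text.find('\n')
    while pos >= 0:
        starts.append(pos + 1)  # the next line starts right after the newline
        pos = text.find('\n', pos + 1)
    return starts


def line_col_to_offset(line, col, starts):
    """Convert a 1-based (line, col) pair to a 0-based character offset."""
    return starts[line - 1] + (col - 1)


sample = "<p>one</p>\n<p>two</p>\n<p>three</p>\n"
starts = line_start_offsets(sample)
offset = line_col_to_offset(3, 4, starts)
print(offset, repr(sample[offset]))  # line 3, column 4 is the 't' of "three"
```

Storing line starts rather than newline positions avoids the special case for line 1.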
|   |   | 
|  01-01-2018, 04:20 PM | #247 | 
| Guru            Posts: 899 Karma: 3501166 Join Date: Jan 2017 Location: Poland Device: Various | 
			
			Thank you very much. I really do not know how I could have missed the RegEx tester plugin. | 
|   |   | 
|  01-01-2018, 04:24 PM | #248 | 
| Sigil Developer            Posts: 9,071 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			Alternatively, you could do either of the following:
1. Use gumbo with bs4 to do your parsing and searching with the sigil-gumbo adapter, as each tag provides its starting offset in the file.
OR
2. Extract the exact string that contains the error and then use regular expressions to search the original html file to get the line and column or offset of the offending string in the original file. | 
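A rough sketch of the second option, assuming the file contents are already in a Python string with "\n" line endings. The function name locate and the sample pattern are only examples:

```python
import re


def locate(pattern, text):
    """Return a (line, col, offset) tuple for every match of pattern in text.

    line and col are 1-based; offset is the 0-based character offset.
    """
    results = []
    for match in re.finditer(pattern, text):
        offset = match.start()
        # Newlines before the match tell us which line it is on.
        line = text.count('\n', 0, offset) + 1
        # rfind returns -1 on the first line, which makes line_start 0.
        line_start = text.rfind('\n', 0, offset) + 1
        col = offset - line_start + 1
        results.append((line, col, offset))
    return results


html = "<p>ok</p>\n<p>bad  text</p>\n"
print(locate(r' {2,}', html))  # runs of two or more spaces
```

Each tuple could then be used to fill in the line number and offset arguments of the validation result instead of None.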
|   |   | 
|  01-01-2018, 04:24 PM | #249 | 
| Grand Sorcerer            Posts: 5,763 Karma: 24088559 Join Date: Dec 2010 Device: Kindle PW2 | |
|   |   | 
|  01-01-2018, 04:27 PM | #250 | 
| Guru            Posts: 899 Karma: 3501166 Join Date: Jan 2017 Location: Poland Device: Various | 
			
			I see the potential in this example, so I'm excited. I thought about it, but my attempts were unsuccessful. Last edited by BeckyEbook; 01-01-2018 at 04:29 PM. | 
|   |   | 
|  01-01-2018, 04:57 PM | #251 | |
| Grand Sorcerer            Posts: 5,763 Karma: 24088559 Join Date: Dec 2010 Device: Kindle PW2 | Quote: 
 Code:
from sigil_bs4 import BeautifulSoup
 
def run(bk):
    html = bk.readfile('Section0001.xhtml')
    soup = BeautifulSoup(html, 'html5lib')
    first_para = soup.find('p')
        
    return 0 | |
|   |   | 
|  01-01-2018, 05:15 PM | #252 | 
| Sigil Developer            Posts: 9,071 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			You would have to parse the file using the gumbo bs4 adapter; then each node of the parse tree is given extra information fields:
Code:
def _add_source_info(obj, original_text, start_pos, end_pos):
    obj.original = _fromutf8(bytes(original_text))
    obj.line = start_pos.line
    obj.col = start_pos.column
    obj.offset = start_pos.offset
    if end_pos:
        obj.end_line = end_pos.line
        obj.end_col = end_pos.column
        obj.end_offset = end_pos.offset
https://github.com/Sigil-Ebook/Sigil...bs4_adapter.py
And the testme3 plugin posted at the start of this thread shows how to use the gumbo parser:
Code:
    # examples for using the bs4/gumbo parser to process xhtml
    print("\nExercising: the gumbo bs4 adapter")
    import sigil_gumbo_bs4_adapter as gumbo_bs4
    samp = """
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" 
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en-US">
<head><title>testing & entities</title></head>
<body>
  <p class="first second">this is*the*<i><b>copyright</i></b> symbol "©"</p>
  <p xmlns:xlink="http://www.w3.org/xlink" class="second" xlink:href="http://www.ggogle.com">this used to test atribute namespaces</p>
</body>
</html>
"""
    soup = gumbo_bs4.parse(samp)
    for node in soup.find_all(attrs={'class':'second'}):
        print(node)
Please give that a try. | 
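As a sketch of how those extra fields might be read back, the hypothetical helper below pulls line, col and offset off a node and returns None when they are missing (for example, if the soup came from a different parser). A stand-in object is used here so the snippet runs outside Sigil; inside a plugin the node would come from gumbo_bs4.parse():

```python
from types import SimpleNamespace


def node_position(node):
    """Return (line, col, offset) for a parsed node, or None if not present."""
    try:
        return node.line, node.col, node.offset
    except AttributeError:
        # The node carries no source information.
        return None


# Stand-in for a node produced by the gumbo bs4 adapter (values are made up).
fake_node = SimpleNamespace(line=7, col=3, offset=212)
print(node_position(fake_node))
```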
|   |   | 
|  01-01-2018, 06:00 PM | #253 | 
| Sigil Developer            Posts: 9,071 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			BTW, the position information is for the UTF-8 encoded source file.  The offsets will not match the UTF-16 QChar offsets inside Qt/Sigil, but conversion is simple enough.
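One possible way to do that conversion in Python, assuming you have the file as UTF-8 bytes and the offset falls on a character boundary (decoding fails otherwise). The function name is hypothetical:

```python
def utf8_offset_to_utf16(data, byte_offset):
    """Convert a byte offset into UTF-8 data to a UTF-16 code unit offset."""
    # Decode everything before the offset, then count UTF-16 code units.
    prefix = data[:byte_offset].decode('utf-8')
    return len(prefix.encode('utf-16-le')) // 2


# 'a', copyright sign (2 bytes), 'b', an emoji (4 bytes, 2 code units), 'c'
source = 'a\u00a9b\U0001F600c'.encode('utf-8')
print(utf8_offset_to_utf16(source, 8))  # position of 'c'
```

Characters outside the Basic Multilingual Plane take two UTF-16 code units, which is why counting Python characters alone is not enough.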
		 | 
|   |   | 
|  01-02-2018, 09:01 AM | #254 | 
| Grand Sorcerer            Posts: 5,763 Karma: 24088559 Join Date: Dec 2010 Device: Kindle PW2 | 
			
			I tested the Gumbo offset method, but it looks like the parser doesn't take the header into account when returning offsets. I also had to add 1 to the line number. The plugin works with a blank epub2 book but not with "real books." BTW, I used the following plugin code: Spoiler: 
 For your convenience, I've also attached the actual plugin. | 
|   |   | 
|  01-02-2018, 09:27 AM | #255 | 
| Sigil Developer            Posts: 9,071 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			Yes, gumbo does not like the xml header at all.  The easiest way to deal with it is to remove the xml header line if one exists before using gumbo.  We do this inside of Sigil itself and then adjust the line numbers and offsets accordingly if interested in error positions.  It is easy to do with regular expressions. I will play around with your sample plugin. Thanks for coding it up! | 
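A rough sketch of the idea, not the code used inside Sigil: strip a leading XML declaration with a regular expression and report how many UTF-8 bytes and how many lines were removed, so positions reported by the parser can be shifted back. The pattern and function name are assumptions:

```python
import re

# Leading XML declaration plus the line break that follows it, if any.
XML_DECL = re.compile(r'\A\s*<\?xml[^>]*\?>[ \t]*\r?\n?')


def strip_xml_header(text):
    """Return (stripped_text, removed_bytes, removed_lines)."""
    match = XML_DECL.match(text)
    if not match:
        return text, 0, 0
    removed = match.group(0)
    # Byte count because the parser offsets refer to the UTF-8 source.
    return text[match.end():], len(removed.encode('utf-8')), removed.count('\n')


xhtml = '<?xml version="1.0" encoding="utf-8"?>\n<!DOCTYPE html>\n<html/>'
body, removed_bytes, removed_lines = strip_xml_header(xhtml)
print(removed_bytes, removed_lines)
```

Adding removed_bytes to each offset and removed_lines to each line number should give positions relative to the original file.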
|   |   | 