#241 |
Sigil Developer
Posts: 8,802
Karma: 6000000
Join Date: Nov 2009
Device: many
The launcher.py that Sigil runs in a separate process does the following:
Code:
from opf_parser import Opf_Parser
from wrapper import Wrapper
from bookcontainer import BookContainer
from inputcontainer import InputContainer
from outputcontainer import OutputContainer
from validationcontainer import ValidationContainer
Code:
def launch(self):
    script_module = self.script_module
    script_type = self.script_type
    container = self.container
    sys.stdout = SavedStream(sys.stdout, 'stdout', self)
    sys.stderr = SavedStream(sys.stderr, 'stderr', self)
    try:
        target_script = __import__(script_module)
        self.exitcode = target_script.run(container)
        sys.stdout = sys.stdout.stream
        sys.stderr = sys.stderr.stream
    except Exception as e:
        sys.stderr.write(traceback.format_exc())
        sys.stderr.write("Error: %s\n" % e)
        sys.stdout = sys.stdout.stream
        sys.stderr = sys.stderr.stream
        self.exitcode = -1
        pass
Some other modules can also be imported, since they exist as separate modules: compatibility_utils.py, epub_utils.py, quickparser.py, preferences.py, etc.
#242 |
Sigil Developer
I think I see what you are asking. And no, the correct container object for your container type is passed in to run() so it will not be available until then unless you can see/find the container object in the environment someplace. So you would need to have any import that needs bk done inside the run() method.
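A minimal sketch of that advice, assuming a plugin whose helpers need the container: everything that depends on bk is created inside run(), and an import statement that needs bk could be placed there in the same way. StubContainer and build_helpers are hypothetical names used only for illustration; only readfile() and the run(bk) entry point come from this thread.

```python
class StubContainer:
    """Stand-in for the container Sigil passes to run(); illustration only."""

    def __init__(self, files):
        self._files = files

    def readfile(self, manifest_id):
        return self._files[manifest_id]


def build_helpers(bk):
    """Hypothetical factory: everything that needs bk is created here."""
    return {'read': bk.readfile}


def run(bk):
    # bk only exists from this point on, so container-dependent setup
    # (including imports of modules that need bk) happens inside run().
    helpers = build_helpers(bk)
    html = helpers['read']('Section0001.xhtml')
    print('read %d characters' % len(html))
    return 0


# Outside Sigil, the stub lets the plugin logic be exercised directly.
exit_code = run(StubContainer({'Section0001.xhtml': '<p>Hello</p>'}))
```

Inside Sigil the last line is not needed, because the launcher calls run() with the real container.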
Last edited by KevinH; 12-25-2017 at 11:20 AM. |
#243 | |
Grand Sorcerer
Posts: 5,731
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Quote:
#244 |
Guru
Posts: 843
Karma: 3335974
Join Date: Jan 2017
Location: Poland
Device: Various
Precise offset
I need a precise offset for an error found by my validation plugin.
Is it possible?
plugin.py line 59:
Code:
bk.add_extended_result("error",escape(filename), linenumber, None, 'Becky Error #01: ' + msg)
Last edited by BeckyEbook; 01-01-2018 at 04:19 PM. Reason: Working sample |
#245 | |
Grand Sorcerer
Posts: 28,599
Karma: 204624552
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
EDIT: Never mind. I see you're asking for a validation plugin of your own.
Last edited by DiapDealer; 01-01-2018 at 04:09 PM.
#246 | |
Grand Sorcerer
Quote:
Code:
import sys

iswindows = sys.platform.startswith('win')

# code provided by KevinH
def generate_line_offsets(s):
    offlst = [0]
    i = s.find('\n', 0)
    while i >= 0:
        offlst.append(i)
        i = s.find('\n', i + 1)
    return offlst

# code provided by KevinH
def charoffset(line, col, offlst):
    coffset = None
    if iswindows:
        coffset = offlst[line-1] + 2 + (col - 1) - line
    else:
        coffset = offlst[line-1] + 1 + (col - 1)
    if line == 1:
        coffset -= 1
    return coffset
To see it in action, have a look at the epubcheck plugin code and the Regex tester plugin code, which is somewhat similar to your plugin.
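To make the line/column arithmetic easier to follow, here is a simplified variant rather than the code quoted above: it stores the offset at which each line starts and assumes plain '\n' line endings, so it needs no platform-specific correction. line_starts and to_offset are hypothetical names chosen for this sketch.

```python
def line_starts(text):
    """Return the 0-based offset at which each line begins."""
    starts = [0]
    pos = text.find('\n')
    while pos >= 0:
        starts.append(pos + 1)      # the next line starts after the newline
        pos = text.find('\n', pos + 1)
    return starts


def to_offset(line, col, starts):
    """Convert a 1-based (line, col) pair to a 0-based character offset."""
    return starts[line - 1] + (col - 1)


sample = "<p>one</p>\n<p>two</p>\n"
starts = line_starts(sample)
offset = to_offset(2, 4, starts)    # line 2, column 4
print(offset, sample[offset])
```

The resulting offset is the kind of value that would be passed in place of None in the add_extended_result() call shown earlier, assuming that argument is the character offset.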
#247 |
Guru
Thank you very much.
I really do not know how I could have missed the RegEx tester plugin. |
#248 |
Sigil Developer
Alternatively, you could do either of the following:
1. Use gumbo with bs4 (via the sigil-gumbo adapter) to do your parsing and searching, since each tag provides its starting offset in the file.
OR
2. Extract the exact string that contains the error, then use regular expressions to search the original HTML file for the line and column (or offset) of the offending string.
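Option 2 could be sketched roughly as follows, assuming the offending string appears verbatim in the original file and the text uses '\n' line endings. locate is a hypothetical helper name, not part of the plugin framework.

```python
import re


def locate(source, snippet):
    """Yield (line, col, offset) for each literal occurrence of snippet.

    line and col are 1-based; offset is a 0-based character offset.
    """
    for match in re.finditer(re.escape(snippet), source):
        offset = match.start()
        line = source.count('\n', 0, offset) + 1
        line_start = source.rfind('\n', 0, offset) + 1   # 0 when on line 1
        col = offset - line_start + 1
        yield line, col, offset


source = "<p>ok</p>\n<p><b>bad</i></p>\n"
hits = list(locate(source, "<b>bad</i>"))
print(hits)
```

re.escape() is used so that characters in the extracted string are matched literally instead of being treated as regex syntax.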
#249 |
Grand Sorcerer
#250 |
Guru
I see the potential in this example, so I'm excited.
I thought about it, but my attempts were unsuccessful.
Last edited by BeckyEbook; 01-01-2018 at 04:29 PM.
#251 | |
Grand Sorcerer
Quote:
Code:
from sigil_bs4 import BeautifulSoup

def run(bk):
    html = bk.readfile('Section0001.xhtml')
    soup = BeautifulSoup(html, 'html5lib')
    first_para = soup.find('p')
    return 0
#252 |
Sigil Developer
You would have to parse the file using the gumbo bs4 adapter; each node of the parse tree is then given extra information fields:
Code:
def _add_source_info(obj, original_text, start_pos, end_pos):
    obj.original = _fromutf8(bytes(original_text))
    obj.line = start_pos.line
    obj.col = start_pos.column
    obj.offset = start_pos.offset
    if end_pos:
        obj.end_line = end_pos.line
        obj.end_col = end_pos.column
        obj.end_offset = end_pos.offset
https://github.com/Sigil-Ebook/Sigil...bs4_adapter.py
The testme3 plugin posted at the start of this thread shows how to use the gumbo parser:
Code:
# examples for using the bs4/gumbo parser to process xhtml
print("\nExercising: the gumbo bs4 adapter")
import sigil_gumbo_bs4_adapter as gumbo_bs4
samp = """
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en-US">
<head><title>testing & entities</title></head>
<body>
<p class="first second">this is*the*<i><b>copyright</i></b> symbol "©"</p>
<p xmlns:xlink="http://www.w3.org/xlink" class="second" xlink:href="http://www.ggogle.com">this used to test atribute namespaces</p>
</body>
</html>
"""
soup = gumbo_bs4.parse(samp)
for node in soup.find_all(attrs={'class':'second'}):
    print(node)
Please give that a try.
#253 |
Sigil Developer
BTW, the position information is for the utf-8 encoded source file. The offsets will not match the utf-16 QChars offsets inside Qt/Sigil but conversion is simple enough.
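One possible way to do that conversion is sketched below. It assumes the reported offset is a byte offset into the utf-8 encoded source and that it falls on a character boundary; utf8_to_utf16_offset is a hypothetical helper name.

```python
def utf8_to_utf16_offset(data, byte_offset):
    """Map a byte offset in utf-8 data to an offset in utf-16 code units.

    Assumes byte_offset lies on a character boundary; otherwise decoding
    the prefix raises UnicodeDecodeError.
    """
    prefix = data[:byte_offset].decode('utf-8')
    # Each utf-16 code unit is two bytes; characters outside the Basic
    # Multilingual Plane take two code units (a surrogate pair).
    return len(prefix.encode('utf-16-le')) // 2


text = "<p>\u00a9 \U0001F600 x</p>"
data = text.encode('utf-8')
byte_offset = data.index(b'x')
qchar_offset = utf8_to_utf16_offset(data, byte_offset)
print(byte_offset, qchar_offset)
```

For pure ASCII content the two offsets are identical, so the conversion only matters once the file contains non-ASCII characters.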
#254 |
Grand Sorcerer
I tested the Gumbo offset method, but it looks like the parser doesn't take the header into account when returning offsets. I also had to add 1 to the line number. The plugin works with a blank epub2 book but not with "real books."
BTW, I used the following plugin code: Spoiler:
For your convenience, I've also attached the actual plugin. |
#255 |
Sigil Developer
Yes, gumbo does not like the xml header at all. The easiest way to deal with it is to remove the xml header line if one exists before using gumbo. We do this inside of Sigil itself and then adjust the line numbers and offsets accordingly if interested in error positions. It is easy to do with regular expressions.
I will play around with your sample plugin. Thanks for coding it up! |
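Removing the XML header before parsing and then shifting positions back could look roughly like the sketch below. It is not the code Sigil itself uses; strip_xml_header and XML_DECL are hypothetical names, and the deltas are counted in characters, which matches bytes only while the header is pure ASCII.

```python
import re

# Matches an XML declaration at the very start of the text, plus the
# line break that follows it (if any).
XML_DECL = re.compile(r'\A<\?xml[^>]*\?>[ \t]*\r?\n?')


def strip_xml_header(text):
    """Return (text_without_header, line_delta, offset_delta)."""
    match = XML_DECL.match(text)
    if not match:
        return text, 0, 0
    removed = match.group(0)
    return text[match.end():], removed.count('\n'), len(removed)


doc = '<?xml version="1.0" encoding="utf-8"?>\n<html><body><p>x</p></body></html>'
body, line_delta, offset_delta = strip_xml_header(doc)

# A position found in the stripped text maps back to the original file
# by adding the deltas.
pos_in_body = body.index('<p>')
pos_in_doc = pos_in_body + offset_delta
print(line_delta, offset_delta, doc[pos_in_doc:pos_in_doc + 3])
```

Line numbers reported for the stripped text are adjusted the same way, by adding line_delta.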