#256
Sigil Developer
Posts: 8,808
Karma: 6000000
Join Date: Nov 2009
Device: many
Hi Doitsu,
I have slightly modified your GumboOffset example to do what we do inside Sigil, to make it more general.
Code:
    import os
    import sys
    # assumed import: the post used escape() without showing where it came from
    from xml.sax.saxutils import escape

    import sigil_gumbo_bs4_adapter as gumbo_bs4

    wspace = (" ", "\n", "\r", "\t", "\v", "\f")

    def preprocess(src):
        newsrc = src
        line_offset = 0
        pos_offset = 0
        n = len(src)
        if src.startswith("<?xml"):
            # remove any xml header line and trailing whitespace
            end = src.find('>', 5)
            if end != -1:
                end = end + 1
                while end < n and src[end:end+1] in wspace:
                    # count the newlines that are dropped with the header
                    if src[end:end+1] == "\n":
                        line_offset += 1
                    end += 1
                if end < n:
                    pos_offset = end
                    newsrc = src[end:]
        return (newsrc, line_offset, pos_offset)

    def run(bk):
        for id_type, id in bk.selected_iter():
            filename = os.path.basename(bk.id_to_href(id))
            html = bk.readfile(id).replace('\r\n', '\n')
            (html, line_offset, pos_offset) = preprocess(html)
            soup = gumbo_bs4.parse(html)
            for para in soup.find_all('p'):
                # shift gumbo's positions back by what preprocess() removed
                linenumber = para.line + line_offset
                colnumber = para.col
                offset = para.offset + pos_offset
                message = escape(str(para)).replace('"', '&quot;')
                bk.add_extended_result('info', filename, linenumber, offset,
                                       'Line: ' + str(linenumber) + ' Col: ' + str(colnumber) +
                                       ' Gumbo method: ' + message)
        return 0

    def main():
        print('I reached main when I should not have\n')
        return -1

    if __name__ == "__main__":
        sys.exit(main())
Last edited by KevinH; 01-02-2018 at 10:26 AM.

#257
Grand Sorcerer
Posts: 5,733
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Thanks for the updated code, I'll check it out tomorrow.

#258
Guru
Posts: 846
Karma: 3341026
Join Date: Jan 2017
Location: Poland
Device: Various
From the fourth paragraph, the offset is shifted.
The problem is related to double-byte diacritics.
Code:
    <div>
      <p>test</p>
      <p>test</p>
      <p>test</p>
      <p xml:lang="es">información</p>
      <p>here</p>
      <p>here</p>
      <p>here</p>
      <p xml:lang="pl">żółtko</p>
      <p>end</p>
      <p>end</p>
      <p>end</p>
    </div>

#259
Sigil Developer
Posts: 8,808
Karma: 6000000
Join Date: Nov 2009
Device: many
Yes, the offset gumbo records is a byte offset from the start of a UTF-8 encoded file or string. The column number is "proper" in that it is measured in Unicode code points, not in bytes. The solution is to use the routine previously posted by Doitsu to convert line and column numbers inside Python to an offset in Unicode code points, if that is what you want. Offsets are hard to work with because they are encoding dependent, whereas line and column given in code points should be easier to work with and to convert to any encoding you like.
KevinH
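To illustrate the mismatch described above, here is a minimal sketch that converts a UTF-8 byte offset into a code point offset. The helper name `byte_to_codepoint_offset` and the sample string are made up for illustration; this is not Doitsu's routine, and it assumes the byte offset falls on a character boundary (as the start of an element does).

```python
def byte_to_codepoint_offset(text, byte_offset):
    """Convert a UTF-8 byte offset into `text` to an offset in code points.

    Hypothetical helper: assumes byte_offset lies on a character boundary.
    """
    data = text.encode("utf-8")
    # Decode only the leading bytes; the number of decoded characters
    # is the offset measured in code points.
    return len(data[:byte_offset].decode("utf-8"))


sample = '<p xml:lang="pl">żółtko</p>\n<p>end</p>'

# Position of the second paragraph measured in bytes and in code points.
byte_pos = sample.encode("utf-8").find(b"<p>end")
char_pos = sample.find("<p>end")

# The byte position is larger because ż, ó and ł each take two bytes in UTF-8.
print(byte_pos, char_pos, byte_to_codepoint_offset(sample, byte_pos))
```

Every non-ASCII character before the position of interest widens the gap, which matches the observation that the offsets drift only after the paragraphs containing diacritics.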

#260
Sigil Developer
Posts: 8,808
Karma: 6000000
Join Date: Nov 2009
Device: many
To make things even harder, a Qt QChar holds a single UTF-16 code unit, which makes offsets hard to work with unless you define exactly which basis you are using!
The validation plugin should be passed the offset in Unicode code points if you need exact positioning in the validation result window inside Sigil. The gumbo line and col info can be used to accurately determine the offset in code points. If, on the other hand, you want to use offsets into Python UTF-8 bytestrings to extract things, the gumbo offsets can be used directly for that.
Last edited by KevinH; 01-02-2018 at 12:17 PM.
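As a rough sketch of deriving a code point offset from line and column information, the following assumes that both line and column are 1-based, that columns count code points, and that line endings have already been normalized to "\n". The function names are hypothetical, and this is a stand-in rather than the routine Doitsu posted. The second helper shows how the same position differs when counted in UTF-16 code units.

```python
def linecol_to_offset(text, line, col):
    """Return the code point offset of (line, col).

    Assumption: line and col are both 1-based and col counts code points.
    """
    lines = text.split("\n")
    # Length of every preceding line, plus 1 for each line's "\n".
    before = sum(len(ln) + 1 for ln in lines[:line - 1])
    return before + (col - 1)


def utf16_units(text, cp_offset):
    """Number of UTF-16 code units that precede a code point offset."""
    return len(text[:cp_offset].encode("utf-16-le")) // 2


doc = "<p>żółtko</p>\n<p>end</p>"
offset = linecol_to_offset(doc, 2, 1)
print(offset, doc[offset:offset + 3])
```

For text inside the Basic Multilingual Plane the code point offset and the UTF-16 offset coincide; they only diverge after characters that need a surrogate pair, such as most emoji.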

#261
Connoisseur
Posts: 57
Karma: 600000
Join Date: Jan 2018
Device: Galaxy Tab S2
tk.withdraw
Hi,
I hope this is now my last issue transferring my stuff from Win to Mac. I have one self-written plugin which isn't working correctly under Mac OS X, but works fine under Win. I've already installed the ActiveState Tcl and now only have a problem with my own script. I've narrowed it down to this:
Code:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # target script
    import sys  # needed for sys.exit() below
    from tkinter import *
    from tkinter import messagebox

    def run(bk):
        root = Tk()
        # root.withdraw()
        print('Start\n')
        if messagebox.askyesno("Testquestion", "Ok to go on?"):
            print('Middle\n')
        print('End\n')
        return 0

    def main():
        print('I reached main when I should not have\n')
        return -1

    if __name__ == "__main__":
        sys.exit(main())
What am I doing wrong?
Greets
Maui

#262
Grand Sorcerer
Posts: 5,733
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
I don't know why your code doesn't work, but if all you need is a message box, you might as well use PyQt5, which is bundled with Sigil 0.9.8 and higher:
Code:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import sys
    from PyQt5.QtWidgets import QApplication, QMessageBox

    def run(bk):
        print('Start\n')
        # a QApplication must exist before any widget is created
        app = QApplication(sys.argv)
        msg = QMessageBox()
        msg.setWindowTitle("QMessageBox demo")
        msg.setText("This is a QMessageBox.")
        msg.setStandardButtons(QMessageBox.Ok | QMessageBox.Cancel)
        # exec_() blocks until a button is clicked and returns that button
        buttonClicked = msg.exec_()
        if buttonClicked == QMessageBox.Ok:
            print('\nYou clicked OK.')
        else:
            print('\nYou clicked Cancel.')
        print('\nEnd')
        return 0

    def main():
        print('I reached main when I should not have\n')
        return -1

    if __name__ == "__main__":
        sys.exit(main())

#263
Sigil Developer
Posts: 8,808
Karma: 6000000
Join Date: Nov 2009
Device: many
On a Mac, Python Tk main windows do not automatically grab focus or come to the front. They are often hidden under other windows. You need to click on the Python launcher icon that gets added to the end of the Dock to force that window to the front and make it take focus.
There is a workaround in Python to force the main window to the front and grab focus. See my FolderIn, FolderOut, or ePub3-itizer plugin code, which uses this workaround so that Tk graphics work as expected on a Mac. The needed code is here: https://github.com/kevinhendricks/eP.../src/plugin.py
The bulk of the Tk stuff starts near line 262. The added code is under the darwin if.
Last edited by KevinH; 01-07-2018 at 11:33 AM.
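For readers who cannot open the linked file, here is a rough sketch of one commonly used way to raise a Tk window. It is not necessarily the code used in the plugins mentioned above, and it has not been verified on macOS here, so treat it as a starting point only. The helper names `bring_to_front` and `ask_yes_no` are made up for this example.

```python
import sys


def bring_to_front(root):
    """Raise a Tk root window and give it keyboard focus (hypothetical helper)."""
    root.lift()
    # Temporarily mark the window as topmost so it surfaces above other apps,
    root.attributes("-topmost", True)
    # then clear the flag once Tk is idle so it behaves normally afterwards.
    root.after_idle(root.attributes, "-topmost", False)
    root.focus_force()


def ask_yes_no(title, question):
    """Show a yes/no dialog, raising the root window first on macOS."""
    # tkinter is imported lazily so that loading this module needs no display.
    import tkinter
    from tkinter import messagebox

    root = tkinter.Tk()
    if sys.platform.startswith("darwin"):
        bring_to_front(root)
    answer = messagebox.askyesno(title, question, parent=root)
    root.destroy()
    return answer
```

Inside a plugin's `run(bk)` this would be called as `ask_yes_no("Testquestion", "Ok to go on?")`. Note that the empty root window stays visible behind the dialog in this sketch.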

#264
Connoisseur
Posts: 57
Karma: 600000
Join Date: Jan 2018
Device: Galaxy Tab S2
Hi,
thanks for your help. I have now successfully transferred my Sigil setup completely from Win 7 to macOS High Sierra.
Maui

#265
Guru
Posts: 846
Karma: 3341026
Join Date: Jan 2017
Location: Poland
Device: Various
It’s me again.
Based on the Regex tester, the line number and offset work perfectly for XHTML files. But how do I calculate the line number for the OPF (e.g. in metadata or guide)? For example, I want to check whether the language of the file is set to Polish, and whether the guide section is present. For a missing element I need the first line with the opening tag <metadata> or <guide>. If an error occurs, I need its exact location in the file. I've attached a sample plugin and a test file.

#266
Grand Sorcerer
Posts: 5,733
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Code:
    coffset = charoffset(linenumber, colnumber, offlst)
    if filename == 'content.opf':
        coffset += linenumber - 1
Last edited by Doitsu; 01-19-2018 at 07:49 AM.

#267
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
@Becky: Regarding just getting the line number for the meta language identifier, you could perhaps put Kevin's code into a simple function and get the line number with something like this:
Code:
    def getOPFLineNumber(opf, search_text):
        opf_data = opf.splitlines()  # split the opf string into separate lines of code
        # assign a line num to the search text when found
        # (if several lines match, the last one wins; '' means not found)
        linenum = ''
        for index, line in enumerate(opf_data):
            if search_text in line:
                linum = index + 1
                linenum = str(linum)
        return linenum
Code:
    dc_language = None
    meta_language = tree.find('.//{http://purl.org/dc/elements/1.1/}language')
    if hasattr(meta_language, 'text'):
        dc_language = meta_language.text
    if dc_language:
        if dc_language != "pl":
            opf_data = bk.get_opf()  # get all the opf data as a string
            linenumber = getOPFLineNumber(opf_data, "<dc:language>")
            message = "Language specified in metadata is other than 'pl' (Polish)"
            bk.add_extended_result("error", "content.opf", linenumber, 0, 'Becky Metadata #002' + ' -- ' + message)
    else:
        message = "Language not specified in metadata"
        bk.add_extended_result("error", "content.opf", 0, 0, 'Becky Metadata #001' + ' -- ' + message)
Also, I know that using bk.get_opf() will certainly work for an Edit plugin, but I'm not at all sure whether this bk method is available to a Validation plugin like yours. If it is not, you could probably use xml.etree to get the OPF file contents as a string instead of using bk.get_opf().
Last edited by slowsmile; 01-18-2018 at 09:58 PM.
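One limitation of searching for the literal text "<dc:language>" is that it misses an opening tag that carries attributes, such as an id. A possible variant, sketched below under the assumption that the OPF is available as a single string, uses a regular expression and counts the newlines before the first match. The function name `find_tag_line` and the sample OPF are invented for this example; returning 0 for "not found" is likewise just a convention chosen here.

```python
import re


def find_tag_line(opf_text, tag_name):
    """Return the 1-based line number of the first <tag_name ...> opening tag.

    Hypothetical helper: returns 0 when the tag does not occur.
    """
    # The lookahead requires whitespace, ">" or "/" right after the name,
    # so "<dc:language" does not match a longer name that starts the same way.
    pattern = re.compile(r"<" + re.escape(tag_name) + r"(?=[\s>/])")
    match = pattern.search(opf_text)
    if match is None:
        return 0
    # Counting the newlines before the match gives its line number.
    return opf_text.count("\n", 0, match.start()) + 1


sample_opf = """<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:language id="lang">en</dc:language>
  </metadata>
</package>"""

print(find_tag_line(sample_opf, "dc:language"))
print(find_tag_line(sample_opf, "metadata"))
print(find_tag_line(sample_opf, "guide"))
```

The same helper covers the missing-element case from the question: when `guide` is absent, the line of the `metadata` or `package` opening tag can be reported instead.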

#268
Guru
Posts: 846
Karma: 3341026
Join Date: Jan 2017
Location: Poland
Device: Various
Thank you both: @Doitsu for the hint, @slowsmile for the solution.
That should be enough for me. It is what I wanted but could not do myself. I need one more thing: how can a plugin detect whether the file has been modified but not saved (an asterisk appears after the filename)?

#269
Grand Sorcerer
Posts: 28,620
Karma: 204624552
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Plugins work on copies of files, meaning changes aren't incorporated into the existing epub until after the plugin completes. There are some dictionaries in wrapper.py that can be accessed to get that info, but doing so is considered (somewhat) bad form, since unforeseen things can happen if the contents of those dictionaries are accidentally modified.

#270
Grand Sorcerer
Posts: 5,733
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
I've got a related question: is there a built-in property that I can use to make Sigil display the dirty-flag asterisk even when no file was changed? (I know that I can simply add and remove a dummy file, but I'm wondering if there's a more elegant method using a built-in property.)