#256 | |
Enthusiast
Posts: 43
Karma: 10
Join Date: Oct 2008
Device: sony
|
Quote:
Last edited by dearleuk; 12-11-2024 at 12:07 PM. |
|
#257 |
Sigil Developer
Posts: 8,883
Karma: 6120478
Join Date: Nov 2009
Device: many
|
Hi All,
Okay, I tracked down the hang to the HTMLProcessor processHTML method that tries to "fix apostrophes in the wrong direction". In this case it does this by running the following command:
Code:
CorrectText("Corrected apostrophes in wrong direction", r'[ ]?‘(?i)(\d\d|ad[n]{0,1}|app[yines]{0,5}|appen[eds]{0,2}|ard[er]{0,2}|arf|alf|ang|as|at|av[ein]{0,3}|bout|bye|cause|cept[ing]{0,3}|copter[s]{0,1}|cos|cross|cuz|couse|e[emr]{0,1}|ell|elp[edling]{0,5}|ere[abouts]{0,5}|eard|f|fraid|fore|id|igh[er]{0,2}|ighness|im|is|isself|gainst|kay|less|mongst|n|nd|neath|nough|nother|nuff|o[o]{0,1}|ood|ome|ow|op[eding]{0,3}|oney|orse[flesh]{0,5}|ouse[ds]{0,1}|pon|puter[edrs]{0,2}|round|scuse[ds]{0,1}|spect[sed]{0,2}|scaped|sides|tween|special[ly]{0,2}|stead|t|taint|til|tis|twas|twere|twould|twill|ud|un|urt|vise)(\W?)', \
r' ’\1\2')
I think we really need to get the author of this plugin involved to track this one down.
In case anyone is interested, CorrectText is basically doing the following:
newHtml, replacements = re.subn(pattern, replacement, html)
but this is actually hanging, so my guess is unbalanced single quotes of some sort, but that is a whopper of a regular expression.
Last edited by KevinH; 12-11-2024 at 11:48 AM. |
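To make the re.subn call above concrete, here is a minimal sketch of what a helper like CorrectText might look like. The function name correct_text, its extra html argument, and the two-word pattern are assumptions for illustration only; the plugin's real helper and word list are not reproduced here.

```python
import re


def correct_text(description, pattern, replacement, html):
    """Hypothetical stand-in for the plugin's CorrectText helper."""
    # re.subn returns a (new_string, number_of_substitutions) tuple.
    new_html, replacements = re.subn(pattern, replacement, html)
    if replacements:
        print(f"{description}: {replacements}")
    return new_html, replacements


# Tiny illustrative pattern (not the plugin's): the (?i) flag is placed
# at the very start so that it compiles on any Python 3 version.
sample = "<p>It was ‘twas and ‘Tis here.</p>"
fixed, count = correct_text(
    "Corrected apostrophes in wrong direction",
    r"(?i)[ ]?‘(tis|twas)(\W?)",
    r" ’\1\2",
    sample,
)
print(fixed)
```

The second element of the tuple is what lets a caller report how many apostrophes were changed.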
#258 |
Sigil Developer
Posts: 8,883
Karma: 6120478
Join Date: Nov 2009
Device: many
|
Okay, I have reduced the hang down to the following simpler test case:
Code:
import sys
import re

global html

pattern = r'[ ]?‘(?i)(\d\d|ad[n]{0,1}|app[yines]{0,5}|appen[eds]{0,2}|ard[er]{0,2}|arf|alf|ang|as|at|av[ein]{0,3}|bout|bye|cause|cept[ing]{0,3}|copter[s]{0,1}|cos|cross|cuz|couse|e[emr]{0,1}|ell|elp[edling]{0,5}|ere[abouts]{0,5}|eard|f|fraid|fore|id|igh[er]{0,2}|ighness|im|is|isself|gainst|kay|less|mongst|n|nd|neath|nough|nother|nuff|o[o]{0,1}|ood|ome|ow|op[eding]{0,3}|oney|orse[flesh]{0,5}|ouse[ds]{0,1}|pon|puter[edrs]{0,2}|round|scuse[ds]{0,1}|spect[sed]{0,2}|scaped|sides|tween|special[ly]{0,2}|stead|t|taint|til|tis|twas|twere|twould|twill|ud|un|urt|vise)(\W?)'
replacement = r' ’\1\2'

# pattern = r" (’|')(re|ve|t|m|d|s|ll) "
# replacement=r"\1\2 "

html='''
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta name="charset" content="UTF-8"/>
<meta name="viewport" content="width=-3, height=-4"/>
<title></title>
<link href="../Styles/main.css" type="text/css" rel="stylesheet"/>
</head>
<body class="fullpage">
<div class="cover">
<img id="coverimage" src="../Images/mycoverimagehere.jpg" alt="cover image"/>
</div>
</body>
</html>
'''

def doit(pat, repl):
    count = 0
    flags = 0

    newHTML, replacements = re.subn(pattern, repl, html)
    print(newHTML)

doit(pattern, replacement)
If I run it with a recent Python, I get the following error, which is what causes the ePubTidy program to hang:
Code:
kbhend@KevinsiMac Desktop % python3 test.py
Traceback (most recent call last):
  File "/Users/kbhend/Desktop/test.py", line 38, in <module>
    doit(pattern, replacement)
  File "/Users/kbhend/Desktop/test.py", line 35, in doit
    newHTML, replacements = re.subn(pattern, repl, html)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/__init__.py", line 196, in subn
    return _compile(pattern, flags).subn(repl, string, count)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/__init__.py", line 294, in _compile
    p = _compiler.compile(pattern, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/_compiler.py", line 743, in compile
    p = _parser.parse(p, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/_parser.py", line 980, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/_parser.py", line 455, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/_parser.py", line 841, in _parse
    raise source.error('global flags not at the start '
re.error: global flags not at the start of the expression at position 5
So the problem is that something in the pattern is confusing the hell out of the re code. This could all be down to which recent version of python3 is being used. I am using
Python 3.11.3 (v3.11.3:f3909b8bc8, Apr 4 2023, 20:12:10) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
If someone knows regular expressions really well, can you see what, if anything, is wrong or needs to be escaped in the troublesome pattern?
Last edited by KevinH; 12-11-2024 at 01:10 PM. Reason: Add test.py as attachment |
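The same error can be reproduced with a much shorter pattern, which may make it easier to see that the position of (?i) is what matters. This is only a sketch: try_compile and the two-word pattern are made up for illustration, and it assumes (as far as I know) that Python 3.6 to 3.10 only warn about a misplaced flag while 3.11 and later reject it.

```python
import re
import sys
import warnings


def try_compile(pattern):
    """Return None if the pattern compiles, else the re.error message."""
    try:
        with warnings.catch_warnings():
            # Older Pythons only emit a DeprecationWarning for this case.
            warnings.simplefilter("ignore", DeprecationWarning)
            re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


mid_flag = r"[ ]?‘(?i)(tis|twas)(\W?)"    # (?i) appears after the start
lead_flag = r"(?i)[ ]?‘(tis|twas)(\W?)"   # (?i) is the very first thing

print(sys.version_info[:2])
print("mid flag :", try_compile(mid_flag))
print("lead flag:", try_compile(lead_flag))
```

On a Python that rejects the first pattern, the reported position is 5 because the (?i) group follows the five characters of `[ ]?‘`.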
#259 |
Sigil Developer
Posts: 8,883
Karma: 6120478
Join Date: Nov 2009
Device: many
|
Okay, based on a Google search for that re error:
https://stackoverflow.com/questions/...ession-at-posi
It says the global flag has to be placed at the very beginning of the pattern and not later (i.e. make it truly global). So I would guess the following will fix this plugin to run properly with newer Python versions:
Change:
Code:
CorrectText("Corrected apostrophes in wrong direction", r'[ ]?‘(?i)(\d\d|ad[n]{0,1}|app[yines]{0,5}|appen[eds]{0,2}|ard[er]{0,2}|arf|alf|ang|as|at|av[ein]{0,3}|bout|bye|cause|cept[ing]{0,3}|copter[s]{0,1}|cos|cross|cuz|couse|e[emr]{0,1}|ell|elp[edling]{0,5}|ere[abouts]{0,5}|eard|f|fraid|fore|id|igh[er]{0,2}|ighness|im|is|isself|gainst|kay|less|mongst|n|nd|neath|nough|nother|nuff|o[o]{0,1}|ood|ome|ow|op[eding]{0,3}|oney|orse[flesh]{0,5}|ouse[ds]{0,1}|pon|puter[edrs]{0,2}|round|scuse[ds]{0,1}|spect[sed]{0,2}|scaped|sides|tween|special[ly]{0,2}|stead|t|taint|til|tis|twas|twere|twould|twill|ud|un|urt|vise)(\W?)', \
r' ’\1\2')
To:
Code:
CorrectText("Corrected apostrophes in wrong direction", r'(?i)[ ]?‘(\d\d|ad[n]{0,1}|app[yines]{0,5}|appen[eds]{0,2}|ard[er]{0,2}|arf|alf|ang|as|at|av[ein]{0,3}|bout|bye|cause|cept[ing]{0,3}|copter[s]{0,1}|cos|cross|cuz|couse|e[emr]{0,1}|ell|elp[edling]{0,5}|ere[abouts]{0,5}|eard|f|fraid|fore|id|igh[er]{0,2}|ighness|im|is|isself|gainst|kay|less|mongst|n|nd|neath|nough|nother|nuff|o[o]{0,1}|ood|ome|ow|op[eding]{0,3}|oney|orse[flesh]{0,5}|ouse[ds]{0,1}|pon|puter[edrs]{0,2}|round|scuse[ds]{0,1}|spect[sed]{0,2}|scaped|sides|tween|special[ly]{0,2}|stead|t|taint|til|tis|twas|twere|twould|twill|ud|un|urt|vise)(\W?)', \
r' ’\1\2')
Note there is only one place in HTMLProcessor.py that will need this change.
Would someone better at regular expressions than me please confirm that this change will not break this regular expression's functionality?
Last edited by KevinH; 12-11-2024 at 12:54 PM. |
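One way to check that moving the flag does not change behaviour is to compare it against the same pattern with no inline flag at all, passing re.IGNORECASE through the flags argument of re.subn instead. The sketch below uses a three-word subset of the word list and a made-up sentence, so it is a sanity check under those assumptions rather than a test of the full plugin pattern.

```python
import re

# Hypothetical subset of the plugin's word list, for illustration only.
WORDS = r"(tis|twas|cause)"
REPL = r" ’\1\2"

inline = r"(?i)[ ]?‘" + WORDS + r"(\W?)"   # flag moved to the front
plain = r"[ ]?‘" + WORDS + r"(\W?)"        # no inline flag at all

text = "He said ‘Twas late, ‘cause of rain."

a = re.subn(inline, REPL, text)
b = re.subn(plain, REPL, text, flags=re.IGNORECASE)

print(a)
print(a == b)
```

Since nothing before the original (?i) is case-sensitive (a space and a quote character), applying the flag to the whole pattern should give the same matches.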
#260 |
Sigil Developer
Posts: 8,883
Karma: 6120478
Join Date: Nov 2009
Device: many
|
FWIW, with this change in HTMLProcessor.py in place, your test case now runs to completion.
Note this particular pattern is only run if certain conditions are met, so that explains why it works for some epubs and not for others under the latest python versions. |
#261 |
Grand Sorcerer
Posts: 28,682
Karma: 205039118
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Seeing as how (?i) IS a global flag, I'm fairly certain it should be the very first thing. I've not done any real testing, but I'm guessing the plugin has simply been lucky in that it ran OK with earlier versions of Sigil's bundled Python (when this particular scenario was in play for the epub).
EDIT: It looks like this can be fixed as simply as by using the bundled regex module instead of the built-in re. Changing:
import re
to:
import regex as re
makes your test case from above work regardless of where the flags are placed.
EDIT 2: It's my opinion that the Barnett regex module continues to be more robust than the built-in re module. That's why I included it in the bundled Sigil Python from the beginning and tried to encourage its use in plugins.
Last edited by DiapDealer; 12-11-2024 at 01:17 PM. |
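For anyone trying the import swap outside Sigil, where the third-party regex module may not be installed, a guarded import is one option. This is a sketch only: re_engine, HAVE_REGEX, fix_apostrophes and the two-word pattern are illustrative names and data, not anything taken from the plugin.

```python
try:
    import regex as re_engine   # third-party module bundled with Sigil's Python
    HAVE_REGEX = True
except ImportError:
    import re as re_engine      # fall back to the built-in module
    HAVE_REGEX = False


def fix_apostrophes(text):
    """Swap a wrong-direction opening quote for an apostrophe."""
    if HAVE_REGEX:
        # The flag is left mid-pattern here, as in the plugin's original.
        pattern = r"[ ]?‘(?i)(tis|twas)(\W?)"
    else:
        # The built-in re in newer Pythons needs the flag at the very start.
        pattern = r"(?i)[ ]?‘(tis|twas)(\W?)"
    return re_engine.subn(pattern, r" ’\1\2", text)


print(HAVE_REGEX, fix_apostrophes("It ‘Tis so."))
```

Inside Sigil the bundled regex module should always be present, so the one-line change described above is all that is needed there.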
#262 |
Sigil Developer
Posts: 8,883
Karma: 6120478
Join Date: Nov 2009
Device: many
|
Sounds good. Either way it is now up to the author of this plugin to make the change.
Or users of this plugin can make the change themselves to their own local version until an official fix comes out. |
#263 |
Grand Sorcerer
Posts: 28,682
Karma: 205039118
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I would think the change to move the global flags to the front of the expression is something that should probably happen anyway. That way it will work with whatever flavor of Python regex is used. I'll leave it up to the dev to make whatever changes they deem necessary, but in the meantime, changing "import re" to "import regex as re" at the top of HTMLProcessor.py might be the quickest/easiest way for users to get back up and running.
I found one of my own epubs where the plugin was hanging, and changing that one line allowed the plugin to run to completion. It even corrected 7 instances of apostrophes being in the "wrong" direction.
Last edited by DiapDealer; 12-11-2024 at 01:41 PM. |
#264 |
Sigil Developer
Posts: 8,883
Karma: 6120478
Join Date: Nov 2009
Device: many
|
But according to the Python regex module, (?i) is a scoped flag, not a global flag, which is why the regex module works with no pattern change required.
It is only the built-in Python module re that seems to treat (?i) as a global flag. I am not sure which of the two is even correct. But the more I think about it ... you should be able to turn on ignore case, and switch back to case-sensitive matching, inside any pattern, so I think regex is correct here and the built-in re module is incorrect. Ignore case should be a scoped flag. So this is either a bug in the built-in re module or a limitation that is not documented.
Last edited by KevinH; 12-11-2024 at 01:55 PM. |
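For what it's worth, the built-in re does accept a group-scoped form of the flag, written (?i:...), in Python 3.6 and later; it is only the bare (?i) that has to come first. The short sketch below uses made-up strings and a two-word pattern to show the scoping, and is not taken from the plugin.

```python
import re

# Case-insensitivity applies only inside the (?i:...) group.
scoped = re.compile(r"(?i:hello) World")
print(bool(scoped.fullmatch("HELLO World")))   # group matches in any case
print(bool(scoped.fullmatch("HELLO world")))   # "World" stays case-sensitive

# The same idea applied to a tiny apostrophe pattern: (?i:...) does not
# capture, so \1 and \2 still refer to the word and the trailing character.
result = re.subn(r"[ ]?‘(?i:(tis|twas))(\W?)", r" ’\1\2", "It ‘Tis so.")
print(result)
```

So a pattern author who wants the flag to apply only to the word list can keep it in the middle of the pattern by using the scoped spelling.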
#265 |
Grand Sorcerer
Posts: 28,682
Karma: 205039118
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
If it's supposed to be scoped, I would think the Barnett regex module is technically correct in allowing it to be used in a non-global manner. But I'm no expert there. I tend to use what works rather than what's "correct."
#266 |
Enthusiast
Posts: 43
Karma: 10
Join Date: Oct 2008
Device: sony
|
I'm not going to even pretend to understand any of the last ten or eleven posts, but do they explain why the plugin doesn't work under Ubuntu?
I get the following error:
Status: failed
Traceback (most recent call last):
  File "/app/share/sigil/plugin_launchers/python/launcher.py", line 141, in launch
    target_script = __import__(script_module)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dearleuk/.var/app/com.sigil_ebook.Sigil/data/sigil-ebook/sigil/plugins/ePubTidyTool/plugin.py", line 12, in <module>
    import tkinter.ttk as ttk #Essential for ttk. commands
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/tkinter/__init__.py", line 36, in <module>
    import _tkinter # If this fails your Python may not be configured for Tk
    ^^^^^^^^^^^^^^^
ImportError: /app/lib/python3.11/site-packages/_tkinter.cpython-311-x86_64-linux-gnu.so: undefined symbol: _PyObject_CallNoArg
Error: /app/lib/python3.11/site-packages/_tkinter.cpython-311-x86_64-linux-gnu.so: undefined symbol: _PyObject_CallNoArg |
#267 |
Grand Sorcerer
Posts: 28,682
Karma: 205039118
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
No. There's something wrong with your tkinter module installation. To be perfectly honest, I've never seen paths like that for Sigil or for Python on Ubuntu before. I'm assuming Flatpak or Snap is involved somehow? Many plugins tend to break when Snap or Flatpak are involved.
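A quick way to confirm whether tkinter is usable in a given Python, independent of any plugin, is simply to try importing it. The helper below is a hypothetical sketch (can_import is not part of Sigil or the plugin) and it has to be run with the same interpreter that Sigil uses for the result to be meaningful.

```python
import importlib


def can_import(module_name):
    """Return (True, "") if the module imports, else (False, reason)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # Missing modules and broken binary extensions both land here.
        return False, str(exc)
    return True, ""


ok, detail = can_import("tkinter")
print("tkinter usable:", ok, detail)
```

If this reports a failure with the same undefined symbol message, the problem lies in the Python/Tk build rather than in the plugin itself.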
|
#268 |
Enthusiast
Posts: 43
Karma: 10
Join Date: Oct 2008
Device: sony
|
Thank you. Yes it was Flatpak. I'll try the distro version
|
#269 |
Grand Sorcerer
Posts: 28,682
Karma: 205039118
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Some people have had successful workarounds for plugins running in a Flatpak Sigil, but I'm afraid Flatpak isn't my cup of tea. Others are free to chime in.
|
#270 | |
Sigil Developer
Posts: 8,883
Karma: 6120478
Join Date: Nov 2009
Device: many
|
Remember as DiapDealer wrote ...
Inside the plugin in the file HTMLProcessor.py you need to make this change: Quote:
|
|