|  12-11-2024, 11:15 AM | #256 | |
| Enthusiast  Posts: 43 Karma: 10 Join Date: Oct 2008 Device: sony | Quote: 
 Last edited by dearleuk; 12-11-2024 at 12:07 PM. | |
|   |   | 
|  12-11-2024, 11:40 AM | #257 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			Hi All, Okay, I tracked down the hang to the HTMLProcessor processHTML method that tries to "fix apostrophes in the wrong direction". In this case it does this by running the following command: Code: 
		CorrectText("Corrected apostrophes in wrong direction", r'[ ]?‘(?i)(\d\d|ad[n]{0,1}|app[yines]{0,5}|appen[eds]{\
0,2}|ard[er]{0,2}|arf|alf|ang|as|at|av[ein]{0,3}|bout|bye|cause|cept[ing]{0,3}|copter[s]{0,1}|cos|cross|cuz|couse|e[emr]{0,1}|e\
ll|elp[edling]{0,5}|ere[abouts]{0,5}|eard|f|fraid|fore|id|igh[er]{0,2}|ighness|im|is|isself|gainst|kay|less|mongst| n|nd|neath|\
nough|nother|nuff|o[o]{0,1}|ood|ome|ow|op[eding]{0,3}|oney|orse[flesh]{0,5}|ouse[ds]{0,1}|pon|puter[edrs]{0,2}|round|scuse[ds]{\
0,1}|spect[sed]{0,2}|scaped|sides|tween|special[ly]{0,2}|stead|t|taint|til|tis|twas|twere|twould|twil l|ud|un|urt|vise)(\W?)', \
r' ’\1\2')
I think we really need to get the author of this plugin involved to track this one down. In case anyone is interested, CorrectText is basically doing the following:
newHtml, replacements = re.subn(pattern, replacement, html)
and that call is what actually hangs. My guess is unbalanced single quotes of some sort, but that is a whopper of a regular expression. Last edited by KevinH; 12-11-2024 at 11:48 AM. | 
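For anyone who wants to poke at this outside the plugin, here is a minimal sketch of a helper shaped like that call. The name `correct_text`, its signature, and the tiny pattern are assumptions made up for illustration; this is not the plugin's actual CorrectText code.

```python
import re


def correct_text(description, pattern, replacement, html):
    """Hypothetical stand-in for the plugin's CorrectText helper."""
    # re.subn returns the new string plus the number of substitutions made.
    new_html, replacements = re.subn(pattern, replacement, html)
    if replacements:
        print(f"{description}: {replacements}")
    return new_html, replacements


# A deliberately tiny pattern, not the plugin's full word list.
fixed, n = correct_text(
    "Corrected apostrophes in wrong direction",
    r"‘(tis|twas)",
    r"’\1",
    "<p>‘tis and ‘twas</p>",
)
print(fixed)
```

Swapping the tiny pattern for the real one is then a quick way to see whether the failure is in the pattern itself or in the surrounding plugin code.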
|   |   | 
|  12-11-2024, 12:36 PM | #258 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			Okay, I have reduced the hang down to the following simpler test case: Code: 
import sys
import re
global html
pattern = r'[ ]?‘(?i)(\d\d|ad[n]{0,1}|app[yines]{0,5}|appen[eds]{0,2}|ard[er]{0,2}|arf|alf|ang|as|at|av[ein]{0,3}|bout|bye|cause|cept[ing]{0,3}|copter[s]{0,1}|cos|cross|cuz|couse|e[emr]{0,1}|ell|elp[edling]{0,5}|ere[abouts]{0,5}|eard|f|fraid|fore|id|igh[er]{0,2}|ighness|im|is|isself|gainst|kay|less|mongst|n|nd|neath|nough|nother|nuff|o[o]{0,1}|ood|ome|ow|op[eding]{0,3}|oney|orse[flesh]{0,5}|ouse[ds]{0,1}|pon|puter[edrs]{0,2}|round|scuse[ds]{0,1}|spect[sed]{0,2}|scaped|sides|tween|special[ly]{0,2}|stead|t|taint|til|tis|twas|twere|twould|twil l|ud|un|urt|vise)(\W?)'
replacement =  r' ’\1\2'
# pattern = r" (’|')(re|ve|t|m|d|s|ll) "
# replacement=r"\1\2 "
html='''
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta name="charset" content="UTF-8"/>
<meta name="viewport" content="width=-3, height=-4"/>
<title></title>
  <link href="../Styles/main.css" type="text/css" rel="stylesheet"/>
</head>
<body class="fullpage">
   <div class="cover">
      <img id="coverimage" src="../Images/mycoverimagehere.jpg" alt="cover image"/>
   </div>
</body>
</html>
'''
def doit(pat, repl):
    count = 0
    flags = 0
    newHTML, replacements = re.subn(pattern, repl, html)
    print(newHTML)
doit(pattern, replacement)
If I run it in a recent Python I get the following error message, which is what causes the ePubTidy program to hang: Code: 
kbhend@KevinsiMac Desktop % python3 test.py
Traceback (most recent call last):
  File "/Users/kbhend/Desktop/test.py", line 38, in <module>
    doit(pattern, replacement)
  File "/Users/kbhend/Desktop/test.py", line 35, in doit
    newHTML, replacements = re.subn(pattern, repl, html)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/__init__.py", line 196, in subn
    return _compile(pattern, flags).subn(repl, string, count)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/__init__.py", line 294, in _compile
    p = _compiler.compile(pattern, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/_compiler.py", line 743, in compile
    p = _parser.parse(p, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/_parser.py", line 980, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/_parser.py", line 455, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/_parser.py", line 841, in _parse
    raise source.error('global flags not at the start '
re.error: global flags not at the start of the expression at position 5
So the problem is that something in the pattern is confusing the hell out of the re code. This could all be due to which recent version of Python 3 is being used. I am using Python 3.11.3 (v3.11.3:f3909b8bc8, Apr 4 2023, 20:12:10) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin. If someone knows regular expressions really well, can you see what, if anything, is wrong or needs to be escaped in the troublesome pattern? Last edited by KevinH; 12-11-2024 at 01:10 PM. Reason: Add test.py as attachment | 
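The error can be reproduced with a much shorter pattern, which makes it easier to see what re is objecting to. The sketch below is an illustration only: `try_compile` and the shortened word list are made up here. The `(?i)` sits at index 5 in the first pattern, the same position the traceback reports. On Python versions before 3.11 I would expect the first pattern to produce only a DeprecationWarning, which the sketch turns into an exception so both cases can be reported the same way.

```python
import re
import sys
import warnings


def try_compile(pattern):
    """Return None if the pattern compiles cleanly, else the error or warning text."""
    with warnings.catch_warnings():
        # Older Pythons only warn about misplaced flags; treat that as a failure too.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            re.compile(pattern)
        except (re.error, DeprecationWarning) as exc:
            return str(exc)
    return None


print(sys.version_info[:2])
# Inline flag in the middle of the pattern, as in the plugin.
print(try_compile(r"[ ]?‘(?i)(tis|twas)"))
# Same pattern with the flag moved to the very start.
print(try_compile(r"(?i)[ ]?‘(tis|twas)"))
```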
|   |   | 
|  12-11-2024, 12:50 PM | #259 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			Okay, based on a Google search for that re error: https://stackoverflow.com/questions/...ession-at-posi It says a global flag such as (?i) has to be at the very beginning of the pattern and not later, since it applies to the whole expression. So I would guess the following will fix this plugin to run properly with newer Python versions: Change: Code: 
		CorrectText("Corrected apostrophes in wrong direction", r'[ ]?‘(?i)(\d\d|ad[n]{0,1}|app[yines]{0,5}|appen[eds]{\
0,2}|ard[er]{0,2}|arf|alf|ang|as|at|av[ein]{0,3}|bout|bye|cause|cept[ing]{0,3}|copter[s]{0,1}|cos|cross|cuz|couse|e[emr]{0,1}|e\
ll|elp[edling]{0,5}|ere[abouts]{0,5}|eard|f|fraid|fore|id|igh[er]{0,2}|ighness|im|is|isself|gainst|kay|less|mongst| n|nd|neath|\
nough|nother|nuff|o[o]{0,1}|ood|ome|ow|op[eding]{0,3}|oney|orse[flesh]{0,5}|ouse[ds]{0,1}|pon|puter[edrs]{0,2}|round|scuse[ds]{\
0,1}|spect[sed]{0,2}|scaped|sides|tween|special[ly]{0,2}|stead|t|taint|til|tis|twas|twere|twould|twil l|ud|un|urt|vise)(\W?)', \
r' ’\1\2')
To: Code: 
		CorrectText("Corrected apostrophes in wrong direction", r'(?i)[ ]?‘(\d\d|ad[n]{0,1}|app[yines]{0,5}|appen[eds]{\
0,2}|ard[er]{0,2}|arf|alf|ang|as|at|av[ein]{0,3}|bout|bye|cause|cept[ing]{0,3}|copter[s]{0,1}|cos|cross|cuz|couse|e[emr]{0,1}|e\
ll|elp[edling]{0,5}|ere[abouts]{0,5}|eard|f|fraid|fore|id|igh[er]{0,2}|ighness|im|is|isself|gainst|kay|less|mongst| n|nd|neath|\
nough|nother|nuff|o[o]{0,1}|ood|ome|ow|op[eding]{0,3}|oney|orse[flesh]{0,5}|ouse[ds]{0,1}|pon|puter[edrs]{0,2}|round|scuse[ds]{\
0,1}|spect[sed]{0,2}|scaped|sides|tween|special[ly]{0,2}|stead|t|taint|til|tis|twas|twere|twould|twil l|ud|un|urt|vise)(\W?)', \
r' ’\1\2')
Note there is only one place in HTMLProcessor.py that will need this change. Would someone better at regular expressions than me please confirm that this change will not break the regular expression's functionality? Last edited by KevinH; 12-11-2024 at 12:54 PM. | 
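As a rough check of the idea, here is a sketch with a heavily shortened word list (an assumption for readability, not the plugin's full pattern). It keeps the optional leading space `[ ]?` from the original expression and compares two ways of getting case-insensitive matching that the built-in re accepts: the inline flag at the very start, or no inline flag and `re.IGNORECASE` passed to the call.

```python
import re

text = "He said ‘Tis late and ‘twas cold."
replacement = r" ’\1\2"

# Option 1: inline flag moved to the very start of the pattern.
flag_first = r"(?i)[ ]?‘(tis|twas|cause)(\W?)"
# Option 2: no inline flag; case-insensitivity is requested on the call instead.
no_inline = r"[ ]?‘(tis|twas|cause)(\W?)"

result_a, count_a = re.subn(flag_first, replacement, text)
result_b, count_b = re.subn(no_inline, replacement, text, flags=re.IGNORECASE)

print(result_a, count_a)
print(result_a == result_b and count_a == count_b)
```

Either form sidesteps the "global flags not at the start" error. Which one suits the plugin depends on how CorrectText passes its arguments along, which is not shown in this thread.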
|   |   | 
|  12-11-2024, 12:59 PM | #260 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			FWIW, with this change in HTMLProcessor.py in place, your test case now runs to completion. Note this particular pattern is only run if certain conditions are met, so that explains why it works for some epubs and not for others under the latest python versions. | 
|   |   | 
|  12-11-2024, 01:06 PM | #261 | 
| Grand Sorcerer            Posts: 28,855 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			Seeing as how (?i) IS a global flag, I'm fairly certain it should be the very first thing. I've not done any real testing, but I'm guessing the plugin has simply been lucky in that it ran OK with earlier versions of Sigil's bundled Python (when this particular scenario was in play for the epub).
EDIT: It looks like this can be fixed as simply as by using the bundled regex module instead of the built-in re. Changing:
import re
to:
import regex as re
makes your test case from above work regardless of where the flags are placed.
EDIT 2: It's my opinion that the Barnett regex module continues to be more robust than the built-in re module. That's why I included it in the bundled Sigil python from the beginning and tried to encourage its use in plugins. Last edited by DiapDealer; 12-11-2024 at 01:17 PM. | 
|   |   | 
|  12-11-2024, 01:13 PM | #262 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			Sounds good.  Either way it is now up to the author of this plugin to make the change. Or users of this plugin can make the change themselves to their own local version until an official fix comes out. | 
|   |   | 
|  12-11-2024, 01:33 PM | #263 | 
| Grand Sorcerer            Posts: 28,855 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			I would think the change to move the global flags to the front of the expression is something that should probably happen anyway. That way it will work with whatever flavor of Python regex is used. I'll leave it up to the dev to make whatever changes they deem necessary, but in the meantime, changing "import re" to "import regex as re" at the top of HTMLProcessor.py might be the quickest/easiest way for users to get back up and running. I found one of my own epubs where the plugin was hanging, and changing that one line allowed the plugin to run to completion. It even corrected 7 instances of apostrophes being in the "wrong" direction. Last edited by DiapDealer; 12-11-2024 at 01:41 PM. | 
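The two suggestions can be combined. The sketch below assumes you want the file to keep working even where the third-party regex module is not installed. It tries regex first and falls back to the standard library, and it uses a flag-first pattern so that either module accepts it. The shortened pattern and the `PATTERN` name are illustrative only.

```python
try:
    import regex as re  # third-party module, bundled with Sigil's Python
except ImportError:
    import re  # standard library fallback

# Flag at the very start, so the pattern is valid for both modules.
PATTERN = r"(?i)[ ]?‘(tis|twas)(\W?)"

new_text, count = re.subn(PATTERN, r" ’\1\2", "‘Twas the night")
print(repr(new_text), count)
```

Note that the replacement always emits a leading space, as the plugin's own replacement string does, so a match at the very start of the text gains one.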
|   |   | 
|  12-11-2024, 01:47 PM | #264 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			But according to the Python regex module, (?i) is a scoped flag, not a global flag, which is why the regex module works with no pattern change required. It is only the built-in Python re module that seems to treat (?i) as a global flag. I am not sure which of the two is even correct. But the more I think about it, you should be able to turn ignore-case on and back off again inside any pattern, so I think regex is correct here and the re module is incorrect: ignore-case should be a scoped flag. So this is either a bug in the built-in re module or a limitation that is not documented. Last edited by KevinH; 12-11-2024 at 01:55 PM. | 
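For what it's worth, the built-in re module does offer a scoped form of the flag, written `(?i:...)`, available since Python 3.6. Only the bare `(?i)` is treated as global. A small sketch with made-up patterns:

```python
import re

# Only the part inside (?i:...) ignores case; A and C must match exactly.
scoped = re.compile(r"A(?i:b)C")
print(bool(scoped.fullmatch("AbC")))
print(bool(scoped.fullmatch("ABC")))
print(bool(scoped.fullmatch("abc")))

# (?i:...) does not capture, so wrap it in a capturing group
# if the replacement needs \1.
apos = re.compile(r"[ ]?‘((?i:tis|twas))(\W?)")
fixed = apos.sub(r" ’\1\2", "‘TIS true")
print(repr(fixed))
```

So a pattern that really needs case-insensitivity for only part of the expression can say so explicitly, without switching modules.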
|   |   | 
|  12-11-2024, 01:51 PM | #265 | 
| Grand Sorcerer            Posts: 28,855 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			If it's supposed to be scoped, I would think the Barnett regex module is technically correct in allowing it to be used in a non-global manner. But I'm no expert there. I tend to use what works rather than what's "correct."    | 
|   |   | 
|  12-12-2024, 03:18 AM | #266 | 
| Enthusiast  Posts: 43 Karma: 10 Join Date: Oct 2008 Device: sony | 
			
			I'm not going to even pretend to understand any of the last ten or eleven posts, but do they explain why the plugin doesn't work under Ubuntu? I get the following error:
Status: failed
Traceback (most recent call last):
  File "/app/share/sigil/plugin_launchers/python/launcher.py", line 141, in launch
    target_script = __import__(script_module)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dearleuk/.var/app/com.sigil_ebook.Sigil/data/sigil-ebook/sigil/plugins/ePubTidyTool/plugin.py", line 12, in <module>
    import tkinter.ttk as ttk #Essential for ttk. commands
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/tkinter/__init__.py", line 36, in <module>
    import _tkinter # If this fails your Python may not be configured for Tk
    ^^^^^^^^^^^^^^^
ImportError: /app/lib/python3.11/site-packages/_tkinter.cpython-311-x86_64-linux-gnu.so: undefined symbol: _PyObject_CallNoArg
Error: /app/lib/python3.11/site-packages/_tkinter.cpython-311-x86_64-linux-gnu.so: undefined symbol: _PyObject_CallNoArg | 
|   |   | 
|  12-12-2024, 06:13 AM | #267 | 
| Grand Sorcerer            Posts: 28,855 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			No. There's something wrong with your tkinter module installation. To be perfectly honest, I've never seen paths like that for Sigil or for Python on Ubuntu before. I'm assuming Flatpak or Snap is involved somehow? Many plugins tend to break when Snap or Flatpak are involved.
		 | 
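A quick way to tell a broken tkinter apart from a plugin problem is to try the same imports outside the plugin. This is only a sketch. `check_tk` is a made-up helper name, and it is only meaningful when run with the same Python interpreter that Sigil uses for plugins, which inside a Flatpak sandbox may not be the system one.

```python
import importlib
import sys


def check_tk():
    """Try to import the Tk bindings and report what happened."""
    try:
        importlib.import_module("_tkinter")     # the compiled extension module
        importlib.import_module("tkinter.ttk")  # what plugin.py imports
    except ImportError as exc:
        return False, str(exc)
    return True, "tkinter imported fine"


ok, detail = check_tk()
print(sys.executable)
print(ok, detail)
```

If this fails with the same "undefined symbol" message, the problem lies with that Python's tkinter build rather than with the plugin.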
|   |   | 
|  12-12-2024, 06:34 AM | #268 | 
| Enthusiast  Posts: 43 Karma: 10 Join Date: Oct 2008 Device: sony | 
			
			Thank you. Yes it was Flatpak. I'll try the distro version
		 | 
|   |   | 
|  12-12-2024, 08:31 AM | #269 | 
| Grand Sorcerer            Posts: 28,855 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			Some people have had successful workarounds for plugins running in a Flatpak Sigil, but I'm afraid Flatpak isn't my cup of tea. Others are free to chime in.
		 | 
|   |   | 
|  12-12-2024, 08:32 AM | #270 | |
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			Remember, as DiapDealer wrote: inside the plugin, in the file HTMLProcessor.py, you need to change "import re" to "import regex as re".
 | |
|   |   | 