Okay, I have reduced the hang down to the following simpler test case:
Code:
import sys
import re
global html
pattern = r'[ ]?‘(?i)(\d\d|ad[n]{0,1}|app[yines]{0,5}|appen[eds]{0,2}|ard[er]{0,2}|arf|alf|ang|as|at|av[ein]{0,3}|bout|bye|cause|cept[ing]{0,3}|copter[s]{0,1}|cos|cross|cuz|couse|e[emr]{0,1}|ell|elp[edling]{0,5}|ere[abouts]{0,5}|eard|f|fraid|fore|id|igh[er]{0,2}|ighness|im|is|isself|gainst|kay|less|mongst|n|nd|neath|nough|nother|nuff|o[o]{0,1}|ood|ome|ow|op[eding]{0,3}|oney|orse[flesh]{0,5}|ouse[ds]{0,1}|pon|puter[edrs]{0,2}|round|scuse[ds]{0,1}|spect[sed]{0,2}|scaped|sides|tween|special[ly]{0,2}|stead|t|taint|til|tis|twas|twere|twould|twill|ud|un|urt|vise)(\W?)'
replacement = r' ’\1\2'
# pattern = r" (’|')(re|ve|t|m|d|s|ll) "
# replacement=r"\1\2 "
html='''
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta name="charset" content="UTF-8"/>
<meta name="viewport" content="width=-3, height=-4"/>
<title></title>
<link href="../Styles/main.css" type="text/css" rel="stylesheet"/>
</head>
<body class="fullpage">
<div class="cover">
<img id="coverimage" src="../Images/mycoverimagehere.jpg" alt="cover image"/>
</div>
</body>
</html>
'''
def doit(pat, repl):
    count = 0
    flags = 0
    newHTML, replacements = re.subn(pattern, repl, html)
    print(newHTML)
doit(pattern, replacement)
If I run it with a recent Python (3.11), I get the following error, which is what makes the ePubTidy program hang:
Code:
kbhend@KevinsiMac Desktop % python3 test.py
Traceback (most recent call last):
  File "/Users/kbhend/Desktop/test.py", line 38, in <module>
    doit(pattern, replacement)
  File "/Users/kbhend/Desktop/test.py", line 35, in doit
    newHTML, replacements = re.subn(pattern, repl, html)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/__init__.py", line 196, in subn
    return _compile(pattern, flags).subn(repl, string, count)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/__init__.py", line 294, in _compile
    p = _compiler.compile(pattern, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/_compiler.py", line 743, in compile
    p = _parser.parse(p, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/_parser.py", line 980, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/_parser.py", line 455, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/_parser.py", line 841, in _parse
    raise source.error('global flags not at the start '
re.error: global flags not at the start of the expression at position 5
If I instead edit test.py and replace the pattern and replacement with something simpler, it all works.
So something in the pattern is confusing the re module. Position 5 in the pattern is where the inline (?i) flag sits, right after [ ]?‘.
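The re.error exception carries a pos attribute, so one quick way to see exactly which part of the pattern the parser rejects is to catch the error and slice the pattern at that offset. This is a minimal sketch using a shortened stand-in for the real pattern (the short alternation is an assumption for illustration; it keeps the same [ ]?‘(?i) prefix), and locate_error is a hypothetical helper name. On Python 3.11 and later it should point straight at (?i); on older versions the pattern still compiles.

```python
import re
import sys

# Shortened, hypothetical stand-in for the full pattern:
# same "[ ]?‘" prefix and the same misplaced (?i) flag.
pattern = r'[ ]?‘(?i)(ave|em|im)(\W?)'

def locate_error(pat):
    """Return (offset, snippet) for a compile error, or None if pat compiles."""
    try:
        re.compile(pat)
    except re.error as exc:
        # exc.pos is the index into the pattern string where parsing failed.
        return exc.pos, pat[exc.pos:exc.pos + 4]
    return None

result = locate_error(pattern)
if result is None:
    print("Compiles on Python", sys.version_info[:2], "(older versions only warn)")
else:
    print("re.error at offset", result[0], "->", result[1])
```

With the flag moved to the front, for example r'(?i)[ ]?‘(ave|em|im)(\W?)', locate_error returns None on every version.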
This could be related to which recent version of Python 3 is being used.
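The behaviour does differ by version: Python 3.6 to 3.10 only emit a DeprecationWarning for a global inline flag that is not at the start of the pattern, while 3.11 turns it into re.error. A minimal sketch of a check that reports the problem on any of those versions, by promoting the warning to an error while compiling; check_pattern is a hypothetical helper, not part of the re module.

```python
import re
import warnings

def check_pattern(pat):
    """Return None if pat compiles cleanly, else a short description of the problem.

    Python 3.11+ raises re.error for a global inline flag such as (?i) that is
    not at the start of the pattern; 3.6-3.10 only emit a DeprecationWarning.
    Treating that warning as an error makes both cases visible.
    """
    re.purge()  # clear re's pattern cache so an earlier compile can't hide the warning
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            re.compile(pat)
        except (re.error, DeprecationWarning) as exc:
            return f"{type(exc).__name__}: {exc}"
    return None

print(check_pattern(r"[ ]?‘(?i)(ave|em)"))  # reports a problem on Python 3.6+
print(check_pattern(r"(?i)[ ]?‘(ave|em)"))  # None: flag is at the start
```

Running a check like this over all of a program's stored patterns would flag every one that 3.11 rejects, even when testing on an older interpreter.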
I am using Python 3.11.3 (v3.11.3:f3909b8bc8, Apr 4 2023, 20:12:10) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
If someone knows regular expressions really well, can you see what, if anything, is wrong or needs to be escaped in the troublesome pattern?