12-11-2024, 12:36 PM   #258
KevinH
Sigil Developer
Okay, I have reduced the hang down to the following simpler test case:

Code:
import sys
import re

# '(?i)' in the pattern below is a global inline flag, but it comes after '[ ]?‘' instead of at the very start

pattern = r'[ ]?‘(?i)(\d\d|ad[n]{0,1}|app[yines]{0,5}|appen[eds]{0,2}|ard[er]{0,2}|arf|alf|ang|as|at|av[ein]{0,3}|bout|bye|cause|cept[ing]{0,3}|copter[s]{0,1}|cos|cross|cuz|couse|e[emr]{0,1}|ell|elp[edling]{0,5}|ere[abouts]{0,5}|eard|f|fraid|fore|id|igh[er]{0,2}|ighness|im|is|isself|gainst|kay|less|mongst|n|nd|neath|nough|nother|nuff|o[o]{0,1}|ood|ome|ow|op[eding]{0,3}|oney|orse[flesh]{0,5}|ouse[ds]{0,1}|pon|puter[edrs]{0,2}|round|scuse[ds]{0,1}|spect[sed]{0,2}|scaped|sides|tween|special[ly]{0,2}|stead|t|taint|til|tis|twas|twere|twould|twill|ud|un|urt|vise)(\W?)'

replacement =  r' ’\1\2'

# pattern = r" (’|')(re|ve|t|m|d|s|ll) "
# replacement=r"\1\2 "

html='''
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta name="charset" content="UTF-8"/>
<meta name="viewport" content="width=-3, height=-4"/>
<title></title>
  <link href="../Styles/main.css" type="text/css" rel="stylesheet"/>
</head>
<body class="fullpage">
   <div class="cover">
      <img id="coverimage" src="../Images/mycoverimagehere.jpg" alt="cover image"/>
   </div>
</body>
</html>
'''

def doit(pat, repl):
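    # re.subn() compiles the pattern before substituting; the re.error
    # in the traceback is raised during that compile step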
    newHTML, replacements = re.subn(pattern, repl, html)
    print(newHTML)

doit(pattern, replacement)

If I run it with a recent Python 3, I get the following error, which is what causes the ePubTidy program to hang:

Code:
kbhend@KevinsiMac Desktop % python3 test.py
Traceback (most recent call last):
  File "/Users/kbhend/Desktop/test.py", line 38, in <module>
    doit(pattern, replacement)
  File "/Users/kbhend/Desktop/test.py", line 35, in doit
    newHTML, replacements = re.subn(pattern, repl, html)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/__init__.py", line 196, in subn
    return _compile(pattern, flags).subn(repl, string, count)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/__init__.py", line 294, in _compile
    p = _compiler.compile(pattern, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/_compiler.py", line 743, in compile
    p = _parser.parse(p, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/_parser.py", line 980, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/_parser.py", line 455, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/_parser.py", line 841, in _parse
    raise source.error('global flags not at the start '
re.error: global flags not at the start of the expression at position 5

If I instead edit test.py and replace the pattern and replacement with something simpler, it all works.

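So something in the pattern is confusing the hell out of the re code. Note that the traceback ends in re's parser (_parser.parse), so it fails while compiling the pattern, before any substitution is attempted.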
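The error position 5 is the "(" of "(?i)": a global inline flag that follows "[ ]?‘" instead of opening the pattern. A minimal sketch that reproduces just that, assuming Python 3.11 or newer; the two short patterns and the compile_error helper are made up for illustration:

```python
import re
import sys


def compile_error(pattern: str):
    """Return the re.error raised while compiling `pattern`, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as err:
        return err
    return None


# Same shape as the failing pattern: optional space, left quote, THEN "(?i)".
bad = r"[ ]?‘(?i)(em|til)(\W?)"
# Identical except that the global flag now opens the pattern.
good = r"(?i)[ ]?‘(em|til)(\W?)"

err = compile_error(bad)
if sys.version_info >= (3, 11):
    # 3.11+ refuses a global inline flag that is not at the very start.
    print(err.msg, "at position", err.pos)
    # -> global flags not at the start of the expression at position 5
print(compile_error(good))  # None: this variant compiles
```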

This may well depend on which recent version of Python 3 is being used.

I am using Python 3.11.3 (v3.11.3:f3909b8bc8, Apr 4 2023, 20:12:10) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
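The version does matter: from Python 3.6 on, a global flag in the middle of a pattern only produced a DeprecationWarning, and Python 3.11 turned it into the re.error shown above. Assuming the hang comes from that uncaught exception, a hypothetical pre-flight check like this one (check_pattern is an invented name, not part of ePubTidy or Sigil) could report a bad pattern before re.subn runs. It escalates the warning to an error so that 3.6 through 3.10 flag the same problem:

```python
import re
import warnings


def check_pattern(pattern: str):
    """Return None if `pattern` compiles cleanly, else a short error string.

    On Python 3.6-3.10 a misplaced global flag only triggers a
    DeprecationWarning; turning that warning into an exception here makes
    those versions report the same problem that 3.11+ raises as re.error.
    """
    re.purge()  # clear re's compiled-pattern cache so the warning is re-emitted
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            re.compile(pattern)
        except (re.error, DeprecationWarning) as exc:
            return f"{type(exc).__name__}: {exc}"
    return None


print(check_pattern(r"[ ]?‘(?i)(em|til)(\W?)"))  # an error string
print(check_pattern(r"(?i)[ ]?‘(em|til)(\W?)"))  # None
```

A caller could show the returned string to the user and skip that substitution instead of letting the exception escape.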

If someone knows regular expressions really well, can you see what, if anything, is wrong with the troublesome pattern or needs to be escaped in it?
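As far as I can tell nothing needs escaping; only the placement of "(?i)" is the issue. A sketch of three equivalent fixes, using a hypothetical four-word WORDS list in place of the full alternation and the same replacement string as test.py:

```python
import re

# Hypothetical, shortened stand-in for the real alternation list.
WORDS = r"em|til|cause|twas"

# Option 1: move the global flag to the very start of the pattern.
p1 = re.compile(r"(?i)[ ]?‘(" + WORDS + r")(\W?)")
# Option 2: drop the inline flag and pass re.IGNORECASE instead.
p2 = re.compile(r"[ ]?‘(" + WORDS + r")(\W?)", re.IGNORECASE)
# Option 3: a scoped inline flag (Python 3.6+) that applies only inside
# the non-capturing (?i:...) group, so group numbering is unchanged.
p3 = re.compile(r"[ ]?‘(?i:(" + WORDS + r"))(\W?)")

replacement = r" ’\1\2"
text = "Tell ‘Em it was ‘Til dawn."
for p in (p1, p2, p3):
    print(p.sub(replacement, text))  # each prints: Tell ’Em it was ’Til dawn.
```

Option 2 is probably the least fragile if patterns get assembled from pieces, since the flag cannot end up in the wrong place; option 3 keeps the flag in the pattern string but limits it to the word group.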
Attached file: test.py (1.4 KB)

Last edited by KevinH; 12-11-2024 at 01:10 PM. Reason: Add test.py as attachment