Quote:
Originally Posted by Shark69
As an alternative, and talking about the code before the refactoring (because I know it better), I'd suggest processing the text with four regexes:
For aliases inside the paragraph:
word_pat = re.compile(r'(?=([^a-zA-Z0-9_]' + r'[^a-zA-Z0-9_]|[^a-zA-Z0-9_]'.join(escaped_word_list) + r'[^a-zA-Z0-9_]))', re.I)
For aliases at the beginning of the paragraph:
word_pat = re.compile(r'(?=(^' + r'[^a-zA-Z0-9_]|^'.join(escaped_word_list) + r'[^a-zA-Z0-9_]))', re.I)
For aliases at the end of the paragraph:
word_pat = re.compile(r'(?=([^a-zA-Z0-9_]' + r'$|[^a-zA-Z0-9_]'.join(escaped_word_list) + r'$))', re.I)
And then for aliases that make up the whole paragraph:
word_pat = re.compile(r'(?=(^' + r'$|^'.join(escaped_word_list) + r'$))', re.I)
I've tested it and it works.
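A minimal sketch of how the four patterns above might be applied together to one paragraph. It assumes `escaped_word_list` is built with `re.escape` from a hypothetical alias list; `find_aliases`, `TRIM` and the sample aliases are illustrative names, not part of the original code. Because each pattern is a zero-width lookahead, the captured group also contains the delimiter characters, so the sketch trims those off to recover the alias and its position.

```python
import re

# Hypothetical alias list; the post assumes escaped_word_list already exists.
aliases = ["Shark69", "regex", "encode/decode"]
escaped_word_list = [re.escape(a) for a in aliases]

# The four patterns from the post, unchanged.
PATTERNS = {
    "inside": re.compile(r'(?=([^a-zA-Z0-9_]' + r'[^a-zA-Z0-9_]|[^a-zA-Z0-9_]'.join(escaped_word_list) + r'[^a-zA-Z0-9_]))', re.I),
    "start": re.compile(r'(?=(^' + r'[^a-zA-Z0-9_]|^'.join(escaped_word_list) + r'[^a-zA-Z0-9_]))', re.I),
    "end": re.compile(r'(?=([^a-zA-Z0-9_]' + r'$|[^a-zA-Z0-9_]'.join(escaped_word_list) + r'$))', re.I),
    "whole": re.compile(r'(?=(^' + r'$|^'.join(escaped_word_list) + r'$))', re.I),
}

# How many delimiter characters each pattern captures (leading, trailing).
TRIM = {"inside": (1, 1), "start": (0, 1), "end": (1, 0), "whole": (0, 0)}


def find_aliases(paragraph):
    """Return sorted (index, alias_text) pairs for every alias hit.

    Assumes the paragraph has no trailing newline: without re.M, '$' also
    matches just before a final '\n', which would double-count hits.
    """
    hits = []
    for name, pat in PATTERNS.items():
        lead, tail = TRIM[name]
        # finditer advances one character after each zero-width match,
        # so hits that share a delimiter character are all reported.
        for m in pat.finditer(paragraph):
            text = m.group(1)
            hits.append((m.start() + lead, text[lead:len(text) - tail]))
    return sorted(hits)


print(find_aliases("Regex beats encode/decode, says Shark69"))
```

Matching is case-insensitive (`re.I`), so the returned text keeps the paragraph's own casing, e.g. `"Regex"` rather than `"regex"`.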
I don't really like the code as it was before the refactoring. It was too clunky and made too many mistakes finding the beginning and end of each word because of all the encoding/decoding. Regex is doing a much better job.
Your multiple-regex idea has given me another idea, though, so let me try a few things before going back to the old code.
Thanks!