As an alternative and talking about the code before the refactoring (because I know it better), I'd like to suggest you processing the text with four regex:
For aliases inside the paragraph:
word_pat = re.compile(r'(?=([^a-zA-Z0-9_]' + r'[^a-zA-Z0-9_]|[^a-zA-Z0-9_]'.join(escaped_word_list) + r'[^a-zA-Z0-9_]))', re.I)
For aliases at the beginning of paragraph:
word_pat = re.compile(r'(?=(^' + r'[^a-zA-Z0-9_]|^'.join(escaped_word_list) + r'[^a-zA-Z0-9_]))', re.I)
For aliases at the end of paragraph:
word_pat = re.compile(r'(?=([^a-zA-Z0-9_]' + r'$|[^a-zA-Z0-9_]'.join(escaped_word_list) + r'$))', re.I)
and then for aliases found just as a paragraph:
word_pat = re.compile(r'(?=(^' + r'$|^'.join(escaped_word_list) + r'$))', re.I)
I've checked it with success.
Last edited by Shark69; 09-07-2018 at 02:13 PM.
|