09-07-2018, 06:59 PM   #226
szarroug3
Zealot
Posts: 104
Karma: 10000
Join Date: Apr 2016
Device: Kindle PW2
Quote:
Originally Posted by Shark69
As an alternative, and referring to the code before the refactoring (because I know it better), I'd suggest processing the text with four regexes:

For aliases inside the paragraph:
word_pat = re.compile(r'(?=([^a-zA-Z0-9_]' + r'[^a-zA-Z0-9_]|[^a-zA-Z0-9_]'.join(escaped_word_list) + r'[^a-zA-Z0-9_]))', re.I)

For aliases at the beginning of paragraph:
word_pat = re.compile(r'(?=(^' + r'[^a-zA-Z0-9_]|^'.join(escaped_word_list) + r'[^a-zA-Z0-9_]))', re.I)

For aliases at the end of paragraph:
word_pat = re.compile(r'(?=([^a-zA-Z0-9_]' + r'$|[^a-zA-Z0-9_]'.join(escaped_word_list) + r'$))', re.I)

And for aliases that make up the entire paragraph:
word_pat = re.compile(r'(?=(^' + r'$|^'.join(escaped_word_list) + r'$))', re.I)

I've tested it successfully.
I don't really like the code as it was before the refactoring. It was too clunky and often found the wrong beginning or end of a word because of all the encoding/decoding it had to do. Regex is doing a much better job.
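For illustration, here is a minimal sketch (an assumption, not the code from this thread) of how the four quoted patterns could be folded into a single regex. Instead of requiring a delimiter character or ^/$ on each side, it uses a negative lookbehind (?<![a-zA-Z0-9_]) and a negative lookahead (?![a-zA-Z0-9_]), both of which also succeed at the very start and end of the paragraph. The alias list and the find_aliases helper are hypothetical. Unlike the quoted (?=(...)) patterns, this version consumes each match, so overlapping aliases are not reported and the surrounding delimiter characters are not part of the match.

```python
import re

# Hypothetical alias list; in the real code this would come from the book's data.
aliases = ["Harry", "Mr. Potter", "Ron"]

# Escape each alias so characters like "." match literally, and sort
# longest-first so a longer alias wins over a shorter one at the same position.
escaped_word_list = [re.escape(a) for a in sorted(aliases, key=len, reverse=True)]

# (?<![a-zA-Z0-9_]) : the previous character is not a word character
#                     (also true at the start of the paragraph)
# (?![a-zA-Z0-9_])  : the next character is not a word character
#                     (also true at the end of the paragraph)
word_pat = re.compile(
    r'(?<![a-zA-Z0-9_])(?:' + '|'.join(escaped_word_list) + r')(?![a-zA-Z0-9_])',
    re.I,
)

def find_aliases(paragraph):
    """Return (start, end, matched_text) for each alias occurrence in paragraph."""
    return [(m.start(), m.end(), m.group(0)) for m in word_pat.finditer(paragraph)]
```

For example, find_aliases("Harry met Ron.") returns [(0, 5, 'Harry'), (10, 13, 'Ron')], while "Baron" does not match Ron because the preceding "a" is a word character. A paragraph consisting of nothing but an alias matches too, which covers the fourth quoted case.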

Your multiple-regex suggestion has given me an idea, though, so let me try a few things before going back to the old code.

Thanks!