Okay, so I've figured out what's wrong, but I can't figure out how to fix it. In the regex pattern I wrote, I use \b around the word I'm looking for. It turns out that this doesn't work when the first or last character of the word is non-ASCII.
There are three different positions that qualify as word boundaries:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
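The three rules above can be checked by hand with a small function. This is a minimal sketch in Python, since the post doesn't say which regex engine is in use. `is_word_boundary`, `ascii_word_char` and `unicode_word_char` are hypothetical names, and treating only [A-Za-z0-9_] as word characters is an assumption about how the engine behaves.

```python
def is_word_boundary(text: str, pos: int, is_word_char) -> bool:
    """Apply the three word-boundary rules at index `pos` (0 .. len(text))."""
    # The start and end of the string count as "not a word character".
    before = pos > 0 and is_word_char(text[pos - 1])
    after = pos < len(text) and is_word_char(text[pos])
    # A boundary exists exactly when one side is a word character and the other is not.
    return before != after


def ascii_word_char(ch: str) -> bool:
    # Assumed ASCII-only definition of a word character: [A-Za-z0-9_]
    return ch.isascii() and (ch.isalnum() or ch == "_")


def unicode_word_char(ch: str) -> bool:
    # Unicode-aware approximation: any letter or digit, plus underscore.
    return ch.isalnum() or ch == "_"


text = "un café noir"
# Index 7 is the position between "é" and the following space.
print(is_word_boundary(text, 7, ascii_word_char))    # False
print(is_word_boundary(text, 7, unicode_word_char))  # True
```

With the ASCII-only definition, both "é" and the space are non-word characters, so none of the three rules applies at that position.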
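Basically, since the non-ASCII character doesn't count as a "word" character, the position between it and a neighbouring space, punctuation mark, or the start/end of the string doesn't meet any of these conditions, so \b never matches there.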
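One possible workaround, sketched in Python and assuming the engine supports lookbehind: replace each \b with a negative lookbehind/lookahead over a Unicode-aware word class. In Python's re module, str patterns already use a Unicode-aware \w (and \b) unless re.ASCII is passed, so re.ASCII appears here only to reproduce the problem. `whole_word_pattern` is a hypothetical helper, and whether the same trick carries over depends on the lookbehind and Unicode support of the engine actually in use.

```python
import re


def whole_word_pattern(word: str) -> "re.Pattern[str]":
    """Hypothetical helper: match `word` only when it isn't glued to other word characters."""
    # (?<!\w): not preceded by a word character (also true at the start of the string).
    # (?!\w): not followed by a word character (also true at the end of the string).
    # Without re.ASCII, \w in a str pattern is Unicode-aware, so "é" counts as a word character.
    return re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")


text = "un café noir"

# Reproduce the problem: with re.ASCII, "é" is a non-word character,
# so there is no \b between "é" and the following space.
print(re.search(r"\bcafé\b", text, flags=re.ASCII))  # None

# The lookaround version with a Unicode-aware \w finds the word...
print(whole_word_pattern("café").search(text).group())  # café
# ...but still rejects it when it's part of a longer word.
print(whole_word_pattern("café").search("cafés"))  # None
```

Note that this is "not next to a word character" rather than a true boundary, so it also behaves sensibly for words that begin or end with punctuation.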
I'm still working on it.