Just to get it out of the way: I'm continuing this purely as an academic discussion. Kovid's solution of using \S+ seems to be the best way.
Before I answered, I tried the regex in plain Python. I found that using (?u) didn't work, and I couldn't properly test (?L) because I couldn't figure out how to set the locale within the five minutes or so I spent on the problem. That's why I was asking whether Calibre sets the locale in its code.
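For anyone who wants to repeat this kind of test, here is a minimal sketch of how it could be done. It assumes Python 3, where str patterns are Unicode-aware by default and (?L) is only accepted on bytes patterns. The sample text, the \w+ pattern, and the helper name set_default_locale are my own placeholders, not the regex from this thread and not anything taken from Calibre's code.

```python
import locale
import re


def set_default_locale():
    """Switch to the user's default locale (hypothetical helper).

    Returns the locale string, or None if the environment's
    locale settings are not supported on this machine.
    """
    try:
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        return None


active_locale = set_default_locale()

# Placeholder sample: "naive cafe" with two accented letters.
text = "na\u00efve caf\u00e9"

# (?u) is accepted on str patterns; in Python 3 it is already the default.
unicode_words = re.findall(r"(?u)\w+", text)

# (?L) requires a bytes pattern; what \w matches then depends on the locale.
locale_words = re.findall(rb"(?L)\w+", text.encode("latin-1"))

# \S+ only splits on whitespace, so no flag or locale is involved.
tokens = re.findall(r"\S+", text)

print(active_locale, unicode_words, locale_words, tokens)
```

The (?L) result will differ from machine to machine depending on the active locale, which is part of why \S+ looks like the more robust choice.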