Quote:
Originally Posted by capnm
But I'm curious, why the leading (?:^|\s+) instead of \s*? Is there a functional difference?
If you're not using Unicode support (or don't have the locale flag set), some non-whitespace characters, such as accented letters, are not matched by \w, so they end up being treated as a break inside a word (as does punctuation you'd want to avoid). If you used \s* instead, a match could begin right after such a character, so the next letter, which may well be in the middle of a word, would be used as an initial.
By requiring the match to start either at the beginning of the string (careful of multiline issues) or after one or more whitespace characters, this situation is avoided, because a word can then only begin after a space.
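To see the difference concretely, here is a minimal sketch using Python's `re` module (an assumption; the thread doesn't say which regex engine is in use). It runs the pattern from this post and a variant with a leading `\s*` over a made-up name and collects the captured initial of each match. `re.ASCII` stands in for "no Unicode support"; the helper `initials` and the sample text are hypothetical.

```python
import re

# Leading group from the post vs. the \s* alternative; the rest is unchanged.
ANCHORED = r"(?:^|\s+)((?:\d+\.?\d*?)|(?:[\D]))[\w]+"
LOOSE = r"\s*((?:\d+\.?\d*?)|(?:[\D]))[\w]+"

def initials(pattern, text, flags=0):
    """Return group 1 (the captured 'initial') of every match in text."""
    return [m.group(1) for m in re.finditer(pattern, text, flags)]

name = "Anna Schön"  # hypothetical sample input

# ASCII-only \w: "ö" is not a word character, so it splits "Schön" in two
# and the \s* version picks up "ö" from the middle of the word.
print(initials(LOOSE, name, re.ASCII))     # ['A', 'S', 'ö']
print(initials(ANCHORED, name, re.ASCII))  # ['A', 'S']

# Unicode \w (Python 3's default for str patterns): both versions agree.
print(initials(LOOSE, name))               # ['A', 'S']
print(initials(ANCHORED, name))            # ['A', 'S']
```

With the anchored version, the trailing "ön" simply fails to match instead of producing a bogus initial.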
If you want to use it for replacement, as you wanted, the pattern would be:
Code:
find: (?iu)(?:^|\s+)((?:\d+\.?\d*?)|(?:[\D]))[\w]+
replace: \1
This does rely on the Unicode flag, though, so it's a trade-off between robustness and how easily things match.
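For reference, here is a sketch of how that find/replace pair might run through Python's `re.sub` (again assuming Python; the `(?iu)` inline flags happen to be valid there). The helper name `abbreviate` and the sample text are illustrative only. The second call swaps `(?iu)` for `(?ia)` to show what is lost without Unicode `\w`.

```python
import re

# The find/replace pair from the post.
FIND = r"(?iu)(?:^|\s+)((?:\d+\.?\d*?)|(?:[\D]))[\w]+"
REPLACE = r"\1"

def abbreviate(text, find=FIND):
    """Replace each matched word with its captured initial."""
    return re.sub(find, REPLACE, text)

print(abbreviate("Anna Schön"))  # AS

# Same pattern with ASCII-only classes: "ö" ends the match early,
# so the rest of the word is left in the output untouched.
ascii_find = FIND.replace("(?iu)", "(?ia)", 1)
print(abbreviate("Anna Schön", ascii_find))  # ASön
```

Note that the leading `\s+` is part of each match, so the spaces between words are consumed by the replacement as well.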