Quote:
Originally Posted by theducks
I think that should be (a back reference for each match)
\1\2
No. The second parenthesized expression (the positive lookahead) does not create a group that can be back-referenced. Adding \2 will generate an error, because there is only one group.
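A quick way to check this outside calibre is to use Python's `re` module as a stand-in. The exact pattern isn't spelled out in these posts; the sketch below assumes it is `(\w\.) (?=\w\.)` with replacement `\1`, pieced together from the fragments quoted in this thread. The helper names are made up for illustration.

```python
import re

# Assumed pattern: capture one initial, require a space, then look ahead
# (without capturing) for another initial.
INITIALS = re.compile(r"(\w\.) (?=\w\.)")

def group_count() -> int:
    # The lookahead (?=...) is not a capturing group, so only (\w\.) counts.
    return INITIALS.groups

def try_replacement(template: str, text: str) -> str:
    # Return the substituted text, or the error message if the template
    # refers to a group the pattern does not have.
    try:
        return INITIALS.sub(template, text)
    except re.error as exc:
        return f"error: {exc}"

print(group_count())                            # 1
print(try_replacement(r"\1", "A. B. Smith"))    # A.B. Smith
print(try_replacement(r"\1\2", "A. B. Smith"))  # prints an error message, not a result
```

Python rejects the `\1\2` template before doing any matching, because group 2 does not exist.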
Quote:
Originally Posted by PeterT
Actually, I think the way it works is that the entire regex ONLY matches the first occurrence of a single word character followed by a period and space.
It matches N occurrences of "letter dot space" (an "initial"), each one followed by another initial. What it does not match is the last initial, so the space between that last initial and the following word is left alone.
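To see which pieces actually match, here is a small sketch using Python's `re` module in place of calibre's engine. It assumes the pattern under discussion is `(\w\.) (?=\w\.)` with replacement `\1` (reconstructed from this thread, not copied from calibre); `matched_initials` is a hypothetical helper.

```python
import re

# Assumed pattern from this thread: an initial plus a space, followed by another initial.
INITIALS = re.compile(r"(\w\.) (?=\w\.)")

def matched_initials(name: str) -> list[str]:
    # Each match is one "initial + space" that has another initial right after it.
    return [m.group(0) for m in INITIALS.finditer(name)]

name = "A. B. C. Lastname"
print(matched_initials(name))     # ['A. ', 'B. ']
print(INITIALS.sub(r"\1", name))  # A.B.C. Lastname
```

"C. " is not matched because "Lastname" is not an initial, which is why its space survives.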
Quote:
Since (?=\w\.) is, as starson says, "(?= is a positive lookahead assertion. It lets the preceding match only when the following matches, but the lookahead part doesn't 'eat up' any of the string," the only characters "consumed" by the regex are the initial sequence "(\w\.) ", and that is replaced by \1, which is that initial \w\. sequence.
One thing to remember: matching and substitution in calibre's search/replace (and generally in regular expressions) is leftmost non-overlapping. This means that the expression operates on the first string that matches, then starts again at the left side of what remains. Because the lookahead assertion does not consume characters, what "remains" begins with the next initial, and the regexp is run again on that initial and whatever follows it. This repeats until the expression finds no further match, which happens when there are no remaining initials followed by another initial.
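The effect of "not consuming" is easiest to see by comparing the lookahead pattern with a version that does consume the second initial. This sketch uses Python's `re.sub` as a stand-in for calibre's search/replace; the lookahead pattern is the one assumed from this thread, and the consuming pattern `(\w\.) (\w\.)` with `\1\2` is a hypothetical alternative for comparison.

```python
import re

name = "A. B. C. Lastname"

# Lookahead version: the second initial is checked but not consumed, so the
# next scan starts at "B." and can match it as well.
with_lookahead = re.sub(r"(\w\.) (?=\w\.)", r"\1", name)

# Consuming version: the first match swallows "A. B.", so the scan resumes
# after "B." and "B. C." is never examined as a pair.
consuming = re.sub(r"(\w\.) (\w\.)", r"\1\2", name)

print(with_lookahead)  # A.B.C. Lastname
print(consuming)       # A.B. C. Lastname
```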
Note that "leftmost non-overlapping" does not imply either "adjacent" or "leading". It simply means that the input string is scanned from left to right. Regarding "adjacent": there is no requirement that there be only one set of initials. Given the rather bizarre author name "A. B. Someword C. D. Lastname", the expression will match the A. and the C., resulting in "A.B. Someword C.D. Lastname". Regarding "leading": the name "Joe A. B. Smith" will be changed to "Joe A.B. Smith".
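The two example names can be checked with a short sketch. It uses Python's `re` module in place of calibre's engine and assumes the pattern is `(\w\.) (?=\w\.)` with replacement `\1`, as reconstructed from this thread; `join_initials` is a hypothetical helper name.

```python
import re

# Assumed pattern from this thread: an initial plus a space, followed by another initial.
INITIALS = re.compile(r"(\w\.) (?=\w\.)")

def join_initials(name: str) -> str:
    # Drop the space after every initial that is followed by another initial,
    # wherever it occurs in the string.
    return INITIALS.sub(r"\1", name)

print(join_initials("A. B. Someword C. D. Lastname"))  # A.B. Someword C.D. Lastname
print(join_initials("Joe A. B. Smith"))                # Joe A.B. Smith
```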