Edit: This is an answer to Jon's question...not trying to repeat what Tex already answered above...
Quote:
Originally Posted by JSWolf
But what if the dialog is something like «Hello» she said.
You would end up with «Hello she said.»
Sorry it took so long to reply. I was off being a kid in Orlando...
To answer your question, Jon: no. In general, the ? in the capture group forces a minimal match, so it stops capturing at the first legal stopping location. Your example has a closing », so it would not trigger a match.
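As a quick illustration of what a minimal match means, here is a generic sketch using Python's `re` module. It is not the exact expression from this thread; the sample text and variable names are made up for the example, and the lazy quantifier should behave the same way in a PCRE-style Find/Replace.

```python
import re

# Made-up sample line with two complete pieces of dialogue.
text = "«Hello» she said. «Goodbye» he replied."

# Greedy: .* keeps going and only stops at the LAST » on the line.
greedy = re.findall(r"«(.*)»", text)

# Lazy (minimal): .*? stops at the FIRST » it can.
lazy = re.findall(r"«(.*?)»", text)

print(greedy)
print(lazy)
```

The greedy version swallows both quotes as one capture, while the lazy version captures each piece of dialogue separately.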
However, there are two points that I would change based on the discussion above (and my chance to test now that I'm home - which I couldn't do on my phone).
- Instead of using the negative lookahead on the » I would use a positive lookahead on the </p>
- I would add the » to a 'not' in the capture group
That would capture anything preceded by a « that doesn't have a » before the following </p>, without capturing the </p> itself (i.e. just the text).
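One way to read the two changes above, sketched in Python's `re` module. The exact expression is not spelled out in this post, so the pattern, the replacement string, and the sample paragraphs below are assumptions for illustration only.

```python
import re

# Assumed form of the revised search:
#   «            an opening guillemet
#   ([^»]*?)     capture as little as possible, never crossing a »
#   (?=</p>)     positive lookahead: must be followed by </p>, not captured
pattern = re.compile(r"«([^»]*?)(?=</p>)")
replacement = r"«\1»"

unclosed = "<p>«Hello she said.</p>"
closed = "<p>«Hello» she said.</p>"

fixed_unclosed = pattern.sub(replacement, unclosed)
fixed_closed = pattern.sub(replacement, closed)

print(fixed_unclosed)
print(fixed_closed)
```

The paragraph with no closing » gets one added just before the </p>. Jon's example already has a », so the capture cannot reach the </p> and the line is left alone.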
You are correct that the «\1» would place guillemets around the entire captured text, but this process was never intended to be completely automatic. There is no way for the Find/Replace to discern what is dialogue and what is not. If you use the "Replacements (Delete Unwanted Replacements)" table (see attached image), you can quickly see which sections have an opening « with no closing », highlighted, along with what they would look like After the replace. As KevinH mentioned above, you could quickly remove any lines that didn't match the change you wanted, apply changes, then run the find again with different replace criteria.
That is, the above would not find "<p>«Hello she said. «Hello she said.»</p>" until you ran a different Find with the « in place of the lookahead </p>, like: «(.[^»</p>]*?)«
That would probably be the fastest way (fewest iterations) to search through the text for those types of errors.
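The second-pass search can be tried out the same way. This is a sketch in Python's `re` module using the pattern exactly as posted. Note that inside square brackets `</p>` is read as four separate characters rather than as a tag, so the posted class also excludes the letter p. The `variant` pattern and the `text_p` sample are hypothetical additions, not something from the thread.

```python
import re

# The example line from the post: an unclosed « followed by a second «.
text = "<p>«Hello she said. «Hello she said.»</p>"

# Pattern exactly as posted.
posted = re.compile(r"«(.[^»</p>]*?)«")
found = posted.search(text).group(1)
print(repr(found))

# [^»</p>] excludes the single characters », <, /, p and >,
# so dialogue containing the letter "p" is not found.
text_p = "<p>«Stop she said. «Hello she said.»</p>"
missed = posted.search(text_p)
print(missed)

# Hypothetical variant: exclude only » and <, so the capture still
# cannot run past a closing guillemet or into a tag.
variant = re.compile(r"«(.[^»<]*?)«")
found_p = variant.search(text_p).group(1)
print(repr(found_p))
```

Either pattern only locates the unclosed section. The replacement to use still depends on where the closing » belongs, which is why the results need to be reviewed by hand.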