You sure did mention that. But I wasn't paying enough attention...
I was thinking about starting a regex thread -- but you should do it. I may have a couple to contribute, after seeing yours. For instance:
which I replace with:
because it's an apostrophe, not a quote mark, in words like: I'm, we'll, could've...
Is ' well-supported now?
I'm of the mind that you should write quotes, apostrophes, etc. with either the entity names (in HTML) or the ASCII/Unicode characters (in text), and not mix the two. But I don't see that much in real life. Since a blanket search/replace of ' with ’ does visually improve a text, I understand why it happens.
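To make the apostrophe case concrete, here is a rough Python sketch (not the NoteTab regex from this thread, and the name curl_word_apostrophes is just made up for illustration). It assumes that a straight ' sitting between two ASCII letters is always an apostrophe, so it curls only those and leaves anything that might be a real quote mark alone:

```python
import re

# Assumption: a straight ' with a letter on both sides is an apostrophe
# (I'm, we'll, could've), never an opening or closing quote mark.
WORD_APOSTROPHE = re.compile(r"(?<=[A-Za-z])'(?=[A-Za-z])")

def curl_word_apostrophes(text: str) -> str:
    """Replace in-word straight apostrophes with U+2019 (the rsquo character)."""
    return WORD_APOSTROPHE.sub("\u2019", text)

print(curl_word_apostrophes("I'm sure we'll say 'could've' again."))
```

The straight quotes wrapped around 'could've' are left untouched, because each has a space on one side. Those would still need a separate pass, or a manual look.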
As for the '{space}" layout you mention, I do try to change things to that -- but the texts I find are not always so neat. Therefore, I have to do it the hard way sometimes.
Your regex was a little difficult to use on one text I did this afternoon: it used ’.” ’?” ’!” at the end of sentences and both ‘ ’ and “ ” Unicode characters. A simple search/replace on individual characters probably would have been smarter -- and in fact I had to do that at the end, then switch the rsquo and the punctuation. I've been rushing a bit to complete a goal, so I'm not taking enough time to figure things out ahead of time. (A set of 56 short stories [some are novellas] by a single author.)
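For the "switch the rsquo and the punctuation" step, something like the following Python sketch might do it in one pass. It is only a guess at the intent: it assumes the goal is to turn ’.” into .’” (and likewise for ? and !), and only when the closing double quote follows immediately. The function name is hypothetical.

```python
import re

# Assumption: only fix a right single quote (U+2019) followed by . ? or !
# when a right double quote (U+201D) comes directly after the punctuation.
RSQUO_THEN_PUNCT = re.compile("\u2019([.?!])(?=\u201d)")

def swap_rsquo_and_punctuation(text: str) -> str:
    """Move sentence-ending punctuation inside the closing single quote."""
    return RSQUO_THEN_PUNCT.sub("\\1\u2019", text)

sample = "\u201cHe said \u2018no\u2019.\u201d"
print(swap_rsquo_and_punctuation(sample))
```

An apostrophe in the middle of a word, or a ’. that is not followed by a closing double quote, is left as it was.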
But it might not be the regex -- I'm running Win2k in a virtual machine to support NoteTab, and who knows what that can lead to. At one point I was getting only part of what I would copy to the clipboard. (Restarted, of course.)
It's funny -- someone went to a lot of trouble to use curly quotes in this text, but did no work on mdashes vs. hyphens or on ellipses, didn't clean up the blockquotes, and didn't even spell-check thoroughly. Quite a haphazard use of italics, too. Weird.
m a r