Quote:
Originally Posted by mzmm
just thought i'd throw in that you don't need to escape most metacharacters inside a character class, so you could rewrite
[...]
|
Thanks a lot for the info.
That Regex was just one of the things I created WAYYYY back when I first started figuring out Regex, and since it continued to work so well, I just didn't mess with it. And better to be safe with escapes than sorry!
I actually stumbled across a few cases in the past few days involving left and right brackets, '[' and ']', which might have to be added to Regex #1 and Regex #2.
ALSO, there is the odd case I forgot to mention: the wrong punctuation being italicized (a QUITE common OCR error). For example,
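To make the escaping point concrete, here is a rough sketch in Python's re module. The punctuation classes are made up for illustration and are NOT the actual Regex #1 or #2, and other regex flavors may treat brackets inside a class a little differently. The idea is that most metacharacters are literal inside a character class with or without the backslash, while the brackets themselves are the ones worth escaping.

```python
import re

# Hypothetical punctuation classes for illustration only;
# these are not the original Regex #1 or Regex #2.
escaped = re.compile(r"[\.\,\;\:\!\?]")      # "better safe than sorry" style
plain = re.compile(r"[.,;:!?]")              # same class without the backslashes
with_brackets = re.compile(r"[.,;:!?\[\]]")  # brackets added, and these ARE escaped

sample = "Wait [sic], what? Yes; no: maybe!"

# Both spellings of the class find exactly the same characters.
same_matches = escaped.findall(sample) == plain.findall(sample)

# The bracket-aware class also picks up '[' and ']'.
bracket_hits = with_brackets.findall(sample)

print(same_matches)
print(bracket_hits)
```

So the extra escapes do no harm, they just add noise. The brackets are the exception: an unescaped ']' normally closes the class early, so keeping the backslash there is the safe choice.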
Quote:
<p>Stigler, George. 1961. “The Economics of Information.<span class="italics">” Journal of Political Economy</span> 69.</p>
|
As you can see here, the RIGHT double quote is included in the italics, but it isn't covered by my Regex #1.
I typically tackle these on a case-by-case basis at a later date (sometimes I can spot other errors when this occurs). For example, quite often a quotation mark is the wrong way around, OR the "smart punctuation" algorithm went haywire and produced an anomaly.
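For the italicized right quote in the Stigler example, one possible fix can be sketched like this in Python. The pattern and the move_quote_out helper are hypothetical and written just for this post. They only handle a right double quote sitting directly after the opening italics span, so anything it changes should still be checked by eye.

```python
import re

# Hypothetical pattern: a right double quote (plus any whitespace after it)
# that sits immediately after the opening italics span.
MISPLACED_QUOTE = re.compile(r'(<span class="italics">)(”\s*)')


def move_quote_out(html: str) -> str:
    """Swap the two groups so the quote ends up before the span tag."""
    return MISPLACED_QUOTE.sub(r"\2\1", html)


line = (
    '<p>Stigler, George. 1961. “The Economics of Information.'
    '<span class="italics">” Journal of Political Economy</span> 69.</p>'
)
fixed = move_quote_out(line)
print(fixed)
```

The replacement just swaps the two captured groups, so the quote and its trailing space land outside the span and the journal title stays italicized. It does nothing for quotes that are the wrong way around, which is one more reason to keep handling those case by case.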