Quote:
Originally Posted by Jellby
That's wrong
|
Nope, but I wasn't being very clear.
Input: <p>‘Twasn’t my fault</p>
Output: <p>‘Twasn’t my fault #[‘]</p>
Search for #. See the error marker. The bit after the # tells you there's an unclosed single-open-quote; you have to scan backwards to find it. Having found it, you have to figure out that it needs to be changed to a proper apostrophe.
It's a very crude implementation. I took the simple regular expressions I've been using and rewrote them in Python on top of an XML tokenizer, so it should work on quotes coded in several different ways, and even on gnarly HTML generated by MS Word. It's nothing you can't do with the regular expressions, but it seems quite error-prone to keep adapting the regular expressions to work on different files.
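To make the idea concrete, here is a minimal sketch of that kind of check. It is not the actual script. It assumes Python's standard html.parser as a stand-in for the XML tokenizer, and the names QuoteChecker and check are made up for illustration. It treats ’ between two letters as an apostrophe, lets any other ’ close a pending ‘, and appends the #[‘] marker before </p> for each ‘ left unclosed. Only tags and text are echoed, and entities are decoded but not re-escaped, so treat it as a toy.

```python
from html.parser import HTMLParser

OPEN, CLOSE = "\u2018", "\u2019"  # curly single open / close quote


class QuoteChecker(HTMLParser):
    """Echo tags and text, adding ' #[‘]' before </p> per unclosed open quote."""

    def __init__(self):
        # convert_charrefs=True decodes &lsquo;, &#8216; etc. before handle_data
        super().__init__(convert_charrefs=True)
        self.out = []
        self.unclosed = 0  # open single quotes not yet matched in this <p>
        self.prev = ""     # last text character seen (kept across inline tags)

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag == "p":
            self.unclosed, self.prev = 0, ""

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are echoed once, unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "p" and self.unclosed:
            self.out.append(" " + f"#[{OPEN}]" * self.unclosed)
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        for i, ch in enumerate(data):
            # The lookahead does not cross tag boundaries in this sketch.
            nxt = data[i + 1] if i + 1 < len(data) else ""
            if ch == OPEN:
                self.unclosed += 1
            elif ch == CLOSE and not (self.prev.isalpha() and nxt.isalpha()):
                # Not mid-word, so assume it closes a pending open quote.
                self.unclosed = max(0, self.unclosed - 1)
            self.prev = ch
        self.out.append(data)


def check(html):
    """Return the markup with error markers added (hypothetical helper)."""
    parser = QuoteChecker()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling check("<p>‘Twasn’t my fault</p>") gives back the Output line shown above, and the same input written with &lsquo; and &rsquo; gives the same result, because the tokenizer decodes the entities before the quote logic sees them.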
Quote:
It would be good if it could mark apostrophes and closing single quotes with different characters.
|
It's not possible to distinguish unambiguously between apostrophes and closing single quotes in all cases. For example:
<p>Rock 'n' Roll</p>
I don't want to miss those cases - if I wasn't going to bother, I'd stick with straight quotes. I don't trust myself to program an exhaustive set of exceptions that still avoids accepting any errors. I don't trust myself to notice every single case just from reading the book, or I wouldn't need the script in the first place. (And having to squint at every single quote mark is not good for my eyesight.)
So the script skips everything that's definitely _not_ an apostrophe (because it's not immediately after a word), and flags all the remaining apostrophe-like characters for review.
<p>Rock ‘n’* Roll *</p>
(The second * indicates that the paragraph contains exactly one unambiguous open-quote, so that exactly one of the starred apostrophes is playing the role of a close-quote. But that's wrong, which means there must be an error: the open-quote character needs to be changed to become an apostrophe.)
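The starring rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the real script: the function name mark_single_quotes is invented, the input is plain paragraph text with straight quotes, a quote between two letters or digits is taken as a definite apostrophe and left unstarred, and one trailing * is emitted per unambiguous open-quote.

```python
def mark_single_quotes(text):
    """Curl straight single quotes and star the ambiguous ones (sketch)."""
    out = []
    opens = 0  # quotes that cannot be apostrophes, i.e. unambiguous opens
    for i, ch in enumerate(text):
        if ch != "'":
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if not prev.isalnum():
            # Not immediately after a word: definitely not an apostrophe.
            out.append("\u2018")
            opens += 1
        elif nxt.isalnum():
            # Mid-word (assumption): treat as a plain apostrophe, no star.
            out.append("\u2019")
        else:
            # After a word: apostrophe or close-quote, flag for review.
            out.append("\u2019*")
    # One trailing star per unambiguous open-quote in the paragraph.
    return "".join(out) + " *" * opens
```

With the input "Rock 'n' Roll" this returns the text of the starred example above: the first quote becomes an open-quote, the second is starred, and the paragraph gets one trailing star.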
The second * can also appear in the middle of a paragraph, if the open-single-quote is inside double quotes:
<p>"Rock 'n' Roll", shouted.</p>
<p>“Rock ‘n’* Roll” *, shouted.</p>
so if there's more than one double-quoted part which contains ambiguous single quotes, you review them separately.
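A sketch of how the per-double-quote review could work, again under assumptions rather than as the script's real code: the name mark_quotes is hypothetical, straight double quotes are assumed to alternate open and close, and the trailing stars for open single quotes are flushed at each closing double quote (or at the end of the paragraph if there is none).

```python
def mark_quotes(text):
    """Curl straight quotes; star ambiguous singles per double-quoted span."""
    out = []
    pending = 0        # unambiguous open single quotes awaiting review
    in_double = False  # assumption: straight double quotes simply alternate
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == '"':
            if in_double:
                # Close the span and emit its review stars right after it.
                out.append("\u201d" + " *" * pending)
                pending = 0
            else:
                out.append("\u201c")
            in_double = not in_double
        elif ch == "'":
            if not prev.isalnum():
                out.append("\u2018")   # cannot be an apostrophe
                pending += 1
            elif nxt.isalnum():
                out.append("\u2019")   # mid-word apostrophe (assumption)
            else:
                out.append("\u2019*")  # ambiguous, flag for review
        else:
            out.append(ch)
    # Anything still pending belongs to the paragraph as a whole.
    return "".join(out) + " *" * pending
```

Feeding it the text of the first paragraph above produces the text of the second, with the star sitting just after the closing double quote, so each double-quoted part carries its own marker.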
The other feature is that it keeps a bunch of statistics, so you get an overview of the file without having to read it. (Useful if you want to know what sort of errors to look out for, particularly if you don't want to "spoil" yourself on the book before you read it for the first time.)
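The post doesn't say which statistics are kept, so the following is only a guess at the simplest useful kind: a tally of each quote-like character in the text. The name quote_stats and the labels are invented for the example.

```python
from collections import Counter

# Labels are illustrative; the real script's categories are not known.
QUOTE_CHARS = {
    "'": "straight single",
    '"': "straight double",
    "\u2018": "open single",
    "\u2019": "close single or apostrophe",
    "\u201c": "open double",
    "\u201d": "close double",
}


def quote_stats(text):
    """Count how often each kind of quote character occurs in text."""
    return Counter(QUOTE_CHARS[ch] for ch in text if ch in QUOTE_CHARS)
```

A file with many straight quotes and few curly ones, or with more open singles than close singles, tells you what sort of cleanup to expect before you open it.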