Old 09-14-2009, 11:34 AM   #2
ahi
Quotation Mark Fixing

I've been fixing quotation marks by walking through the document character by character and keeping track of whether the current state of the document is quotation-opened or quotation-closed.
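As a rough illustration of that state tracking (a minimal sketch, not my actual code; the function name `curl_double_quotes` is made up for this example, and it only handles straight double quotes):

```python
def curl_double_quotes(text):
    """Replace straight double quotes with curly ones by toggling a state flag."""
    out = []
    quote_open = False  # True while the document is in the quotation-opened state
    for ch in text:
        if ch == '"':
            # Closed state -> emit an opening mark; opened state -> emit a closing mark.
            out.append('\u201d' if quote_open else '\u201c')
            quote_open = not quote_open
        else:
            out.append(ch)
    return ''.join(out)


print(curl_double_quotes('He said "hi" to me.'))
```

Note that one unclosed quotation mark flips every mark after it, which is the weakness of relying on state alone.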

Doing so, however, led to fairly frequent errors due to (legitimately) unclosed quotation marks, such as multi-paragraph speech where each paragraph opens with a quotation mark but only the last one closes. As a result, I started overriding the state-based choice between an opening and a closing quotation mark based on which side of the mark has an alphanumeric character (as opposed to whitespace or punctuation). This fixed most false positives.
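A sketch of how that override might look, assuming the neighbouring characters decide whenever exactly one side is alphanumeric and the tracked state is only the fallback (the function name and that exact precedence are assumptions for illustration):

```python
def curl_double_quotes_by_context(text):
    """Curl straight double quotes, letting neighbouring characters override the state."""
    out = []
    quote_open = False  # fallback state, used only when the context is ambiguous
    for i, ch in enumerate(text):
        if ch != '"':
            out.append(ch)
            continue
        prev_alnum = i > 0 and text[i - 1].isalnum()
        next_alnum = i + 1 < len(text) and text[i + 1].isalnum()
        if next_alnum and not prev_alnum:
            opening = True            # alphanumeric only on the right: opening mark
        elif prev_alnum and not next_alnum:
            opening = False           # alphanumeric only on the left: closing mark
        else:
            opening = not quote_open  # ambiguous: fall back to the tracked state
        out.append('\u201c' if opening else '\u201d')
        quote_open = opening
    return ''.join(out)


# Two paragraphs of continued speech: the first is legitimately left unclosed.
print(curl_double_quotes_by_context('"First paragraph.\n"Second paragraph."'))
```

With state alone, the second paragraph's first mark would come out as a closing one; here the letter to its right forces an opening mark.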

English, however, also uses apostrophes within words. Single quotation marks that have alphanumeric characters on both sides (e.g.: Steve's, it's, ain't) are therefore considered apostrophes and not quotation marks. Also, any single quotation mark that follows an 's' is suspected of being an apostrophe (e.g.: Jesus' name, Boris' house). That suspicion turns to certainty if the paragraph has not yet had an opening single quote and/or has no subsequent closing single quote and the following line does not begin with an opening single quote (as its first character).
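A simplified sketch of those apostrophe rules. It only implements the "alphanumeric on both sides" test and the "follows an 's' with no opening single quote so far in the paragraph" test; the look-ahead for a later closing quote or a following-line opening quote is left out, and the name `classify_single_quotes` is hypothetical:

```python
def classify_single_quotes(paragraph):
    """Return (index, kind) for each ' where kind is 'apostrophe', 'open' or 'close'."""
    results = []
    single_open = False  # has an opening single quote been seen and not yet closed?
    for i, ch in enumerate(paragraph):
        if ch != "'":
            continue
        prev = paragraph[i - 1] if i > 0 else ''
        nxt = paragraph[i + 1] if i + 1 < len(paragraph) else ''
        if prev.isalnum() and nxt.isalnum():
            kind = 'apostrophe'   # Steve's, it's, ain't
        elif prev in ('s', 'S') and not single_open:
            kind = 'apostrophe'   # Jesus' name, Boris' house
        elif nxt.isalnum() and not prev.isalnum():
            kind = 'open'
            single_open = True
        else:
            kind = 'close'
            single_open = False
        results.append((i, kind))
    return results


print(classify_single_quotes("He said 'it's fine' to us"))
```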

The last bit of complication would be words like >> 'Tis <<, where the apostrophe begins the word. This is probably best handled by an exception list which, while not exhaustive, should work reasonably well for the vast majority of documents. Alternatively, the user could be alerted about lone-ranger single quotation marks. These do occur by error in some PG documents; or rather, a second single quotation mark sometimes fails to occur by error, though the intended position is discernible from context.
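Both ideas could be sketched roughly as follows. The word list is a made-up starting point and certainly not exhaustive, and the helper names are hypothetical; the pairing logic is deliberately crude (one paragraph at a time, no 's' rule):

```python
# Hypothetical, non-exhaustive exception list of words that begin with an apostrophe.
LEADING_APOSTROPHE_WORDS = {"tis", "twas", "twere", "twill", "em"}


def starts_elided_word(text, i):
    """True if the ' at index i is directly followed by a word on the exception list."""
    j = i + 1
    while j < len(text) and text[j].isalpha():
        j += 1
    return text[i + 1:j].lower() in LEADING_APOSTROPHE_WORDS


def find_lone_single_quotes(paragraph):
    """Return indices of single quotation marks that have no partner in the paragraph."""
    unmatched_opens = []
    lone = []
    for i, ch in enumerate(paragraph):
        if ch != "'":
            continue
        prev_alnum = i > 0 and paragraph[i - 1].isalnum()
        next_alnum = i + 1 < len(paragraph) and paragraph[i + 1].isalnum()
        if prev_alnum and next_alnum:
            continue                   # in-word apostrophe (it's)
        if not prev_alnum and starts_elided_word(paragraph, i):
            continue                   # 'Tis and friends
        if next_alnum and not prev_alnum:
            unmatched_opens.append(i)  # looks like an opening quote
        elif unmatched_opens:
            unmatched_opens.pop()      # closes the most recent opening quote
        else:
            lone.append(i)             # closing quote with nothing to close
    return sorted(lone + unmatched_opens)


# Indices returned here are the marks a user would be alerted about.
print(find_lone_single_quotes("He said 'hello and left."))
```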

- Ahi