Originally Posted by Snowman
The Double-quote algorithm: On any one line, the quotes must be balanced in left-right pairs. This will break if the quotes span a line because of an intervening <br>, for example.
Or if quoted text spans several paragraphs: in that case each paragraph starts with an opening quote mark, but only the last one ends with a closing quote mark.
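A rough sketch of the per-line balance check, in Python. The function name is made up for illustration, and the check assumes straight double quotes (") only. It simply flags lines with an odd number of quote marks, so a multi-paragraph quotation like the one just described shows up as a false positive to be reviewed by hand.

```python
def unbalanced_quote_lines(text):
    """Return the 1-based numbers of lines with an odd count of straight double quotes."""
    return [
        number
        for number, line in enumerate(text.splitlines(), 1)
        if line.count('"') % 2 == 1
    ]


sample = 'He said "yes".\n"First paragraph of a long speech.\n"Last paragraph."'
flagged = unbalanced_quote_lines(sample)
```

Here the second line is flagged because its opening quote is only closed in a later paragraph.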
The Single-quote algorithm: This is difficult, because the single quote has multiple uses. Rather than try for balance, I look at the preceding character (ignoring HTML tags). If the quote is at the start of the line, or the preceding character is a space, a left-double-quote, or an open-paren, then the output is a left-curly (‘); otherwise it is a right-curly (’).
This will break for the (rare) instances of a leading apostrophe (’), such as 'ware for "beware". And I'm sure that there are one or two other places it will go wrong.
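The preceding-character rule could be sketched like this in Python. This is not Snowman's actual code, only an illustration of the rule as stated; the names `curl_single_quotes`, `TAG` and `OPENERS` are invented, and a "tag" is assumed to be anything between < and >.

```python
import re

TAG = re.compile(r"<[^>]*>")          # assumption: a tag is anything between < and >
OPENERS = {" ", "\u201c", "("}        # space, left double quote, open paren


def curl_single_quotes(line):
    """Replace each straight ' with a left or right curly quote based on the previous character."""
    out = []
    prev = None   # last non-tag character seen; None means start of line
    pos = 0
    while pos < len(line):
        tag = TAG.match(line, pos)
        if tag:
            # Copy tags through untouched and do not let them change `prev`.
            out.append(tag.group())
            pos = tag.end()
            continue
        ch = line[pos]
        if ch == "'":
            ch = "\u2018" if prev is None or prev in OPENERS else "\u2019"
        out.append(ch)
        prev = ch
        pos += 1
    return "".join(out)


ok = curl_single_quotes("He said 'hi' to <i>'them'</i>, didn't he?")
wrong = curl_single_quotes("'ware the dog")
```

The second call shows the known failure: the leading apostrophe in 'ware comes out as a left-curly quote.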
Not so rare, depending on the text. There are books with lots of 'tis, 'twas, 'em, 'im, etc. It can also be a bit tricky when you have a preceding em-dash... And worse, there are books that use single quotes for top-level quote marks (mainly British, I think).
I think I'll continue using a partially manual search and replace in vim. I also try to distinguish between closing single quotes (&rsquo;) and curly apostrophes (&#8217;). Both render as the same glyph, but using different codes in the source HTML lets me easily exchange single and double quotes without affecting apostrophes, if needed.
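To illustrate why the two codes help, here is a small Python sketch of the exchange (the post itself does this in vim; the function name and the `SWAP` table are invented for the example). It assumes quotes are written as the named entities &lsquo;, &rsquo;, &ldquo;, &rdquo; and apostrophes as &#8217;.

```python
import re

# Each quote entity maps to its counterpart at the other nesting level.
SWAP = {
    "&lsquo;": "&ldquo;",
    "&rsquo;": "&rdquo;",
    "&ldquo;": "&lsquo;",
    "&rdquo;": "&rsquo;",
}
QUOTE_ENTITY = re.compile("|".join(re.escape(entity) for entity in SWAP))


def swap_quote_levels(html):
    """Exchange single and double quote entities in one pass; &#8217; is left alone."""
    return QUOTE_ENTITY.sub(lambda match: SWAP[match.group()], html)


british = "&lsquo;I can&#8217;t say &ldquo;no&rdquo;,&rsquo; she said."
american = swap_quote_levels(british)
```

Doing the replacement in a single pass avoids converting a quote one way and then straight back again.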
I usually first replace every instance of ([letter]'[letter]) with the apostrophe, then search for (s') and put in the apostrophe where needed. After that I search one by one for every (") or (') and replace it with an opening or closing single or double quote, or an apostrophe. Each case is bound to a single key, so it's relatively quick and easy, and I can keep track of nesting levels and multi-paragraph quotes. As a bonus, I can also detect many cases of missing or wrong quote marks!
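The first two passes translate fairly directly into regular expressions. The following Python sketch only approximates the vim workflow: the function names are hypothetical, "letter" is taken to mean any Unicode letter, and the (s') pass only lists candidate positions because each one still needs a human decision.

```python
import re

# A straight quote with a letter on both sides (no digits or underscores).
INNER_APOSTROPHE = re.compile(r"(?<=[^\W\d_])'(?=[^\W\d_])")


def mark_inner_apostrophes(html):
    """Pass 1: turn letter'letter into the apostrophe code &#8217;."""
    return INNER_APOSTROPHE.sub("&#8217;", html)


def trailing_s_candidates(html):
    """Pass 2: offsets of every s' left over, to be reviewed one by one."""
    return [match.start() for match in re.finditer(r"s'", html)]


step1 = mark_inner_apostrophes("don't 'tis it's")
review = trailing_s_candidates("the boys' 'hats'")
```

In the second example both hits need a look: the first is a possessive apostrophe and the second is a closing quote.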