Originally Posted by ectoplasm
First, it took much longer than I expected. This book in particular may have been an especially poor choice for such a conversion. Dickens uses a lot of punctuation such as:
“So th’ are, so th’ are!” cried Ham. “Well said! So th’ are. Mas’r Davy bor—gent’lmen growed—so th’ are!”
Welcome to the amazing world of "Why regular expressions tend to fail when converting straight quotes to curly quotes".
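To see why, here is a deliberately naive converter in Python. It is a hypothetical sketch, not anyone's actual script: a straight quote becomes an opening quote only at the start of the text or after whitespace, and a closing quote everywhere else. The function name `naive_curly` is made up for illustration.

```python
import re


def naive_curly(text: str) -> str:
    """Naive straight-to-curly quote conversion (hypothetical, for illustration)."""
    # A double quote at the start of the text or after whitespace opens a quotation.
    text = re.sub(r'(?:^|(?<=\s))"', "\u201c", text)
    # Every remaining double quote is assumed to close one.
    text = text.replace('"', "\u201d")
    # Same rule for single quotes: start of text or after whitespace -> opening quote.
    text = re.sub(r"(?:^|(?<=\s))'", "\u2018", text)
    # Everything else -> right single quote, which is also the apostrophe glyph.
    text = text.replace("'", "\u2019")
    return text


print(naive_curly("'Tis th' end"))
print(naive_curly('"\'Hi,\' she said."'))
```

The first call yields "‘Tis th’ end": the leading apostrophe of 'Tis comes out as an opening quote when it should be ’. The second yields "“’Hi,’ she said.”": the nested single quote follows “ rather than whitespace, so it is wrongly closed. Elisions like Ham's "th’" and "Mas’r" also come out as ’, so afterwards nothing in the text distinguishes an apostrophe from a closing single quote.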
May I suggest you use different markup for right single quotes and apostrophes (both are displayed as ’)? That way, it would be fairly easy to switch from US to UK style, or vice versa, afterwards. In my files I use "’" for right single quotes and "& #8217;" (without the space) for apostrophes; both encode exactly the same character, however.
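Here is a rough Python sketch of why that pays off. It assumes, as a hypothetical convention rather than the exact markup above, that quotation marks are stored as the named HTML entities &ldquo; &rdquo; &lsquo; &rsquo;, and apostrophes as the numeric entity for character 8217. Switching between US style (double quotes outside) and UK style (single quotes outside) then only swaps the named entities and never touches an apostrophe. The name `swap_quote_style` is made up for illustration.

```python
import html
import re

# Hypothetical convention (an assumption, not the poster's exact files):
#   quotation marks -> named entities &ldquo; &rdquo; &lsquo; &rsquo;
#   apostrophes     -> numeric entity &#8217;
# &rsquo; and &#8217; render as the same character (U+2019); only the named form is swapped.
SWAP = {
    "&ldquo;": "&lsquo;",
    "&lsquo;": "&ldquo;",
    "&rdquo;": "&rsquo;",
    "&rsquo;": "&rdquo;",
}
_PATTERN = re.compile("|".join(map(re.escape, SWAP)))


def swap_quote_style(text: str) -> str:
    """Swap US-style quoting for UK-style, or back (the mapping is its own inverse)."""
    # A single regex pass replaces each entity exactly once, so swaps never chain.
    return _PATTERN.sub(lambda m: SWAP[m.group(0)], text)


us = "&ldquo;So th&#8217; are!&rdquo; cried Ham."
uk = swap_quote_style(us)
print(uk)                 # the apostrophe entity is left untouched
print(html.unescape(uk))  # rendered text with curly quotes
```

Doing all the replacements in one regex pass matters: a chain of plain `str.replace` calls would turn &ldquo; into &lsquo; and then straight back again.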