Old 08-06-2014, 01:58 PM   #33
DiapDealer
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by arspr
See my previous post (now edited). I can fully replicate the issue with your sample _before.epub
I'll be damned. I can replicate it with your exception file too! But in all fairness... that file is one weird puppy. It starts off with a UTF-8 byte order mark (which ends up being included in the first 'bout' entry), each line is terminated with a CR/LF, and then there's an additional LF character in between every entry. What did you use to create it?

I'll see if I can't come up with something to scrub the UTF-8 BOM and the additional LF characters from the file before processing.
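
For illustration, that kind of scrubbing could look something like the Python below. This is only a sketch, not the plugin's actual code: the function name scrub_exception_list is made up here, and it assumes the exception file is UTF-8 text with one entry per line.

```python
def scrub_exception_list(raw: bytes) -> list[str]:
    """Return the non-empty entries of an exception file (hypothetical helper)."""
    # The "utf-8-sig" codec decodes UTF-8 and silently drops a leading BOM,
    # so the BOM can no longer end up glued to the first entry.
    text = raw.decode("utf-8-sig")
    # splitlines() treats CR/LF, a lone LF and a lone CR as line breaks,
    # so the stray extra LFs simply show up as empty lines.
    entries = [line.strip() for line in text.splitlines()]
    # Drop the empty lines that the extra LF characters produced.
    return [entry for entry in entries if entry]


# A file shaped like the one described: BOM, CR/LF endings, extra LF between entries.
sample = b"\xef\xbb\xbf'bout\r\n\n'tis\r\n\n'cept'n\r\n\n"
print(scrub_exception_list(sample))
```

Decoding with "utf-8-sig" is harmless for files that have no BOM, so the same call should cope with both well-behaved and "weird puppy" files.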

Quote:
OTOH How do you manage to get ' correctly modified before decades in number? Because any automated decision you make can be really risky.
I don't DO anything. SmartyPants looks for a straight single quote immediately followed by two digits that are, in turn, immediately followed by the lower-case letter 's' ... and changes that straight single quote to a curly right single quote. There are a few other details that take care of unique situations, but that's the gist of it. I don't really see the "risk" in that. Instead of worrying about it, why not offer up a situation where my tool gets 'XXs wrong? Or calibre's smartener for that matter. They're no different in that regard. Just straight-up SmartyPants. Besides ... the decades thing is easy enough to double-check with a regex search.
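
As a rough Python sketch of the rule as described above (and only that gist: this is not SmartyPants' actual source, and it leaves out the "other details"), a lookahead regex does the job. The names DECADE_APOSTROPHE, smarten_decades and find_decades are made up for this example.

```python
import re

# A straight single quote, immediately followed by two digits and a lower-case 's'.
# The lookahead (?=...) checks for the digits and the 's' without consuming them.
DECADE_APOSTROPHE = re.compile(r"'(?=\d{2}s)")


def smarten_decades(text: str) -> str:
    """Replace the straight quote in 'XXs with a curly right single quote (U+2019)."""
    return DECADE_APOSTROPHE.sub("\u2019", text)


def find_decades(text: str) -> list[str]:
    """List every decade (straight or curly quote) so the results can be double-checked."""
    return re.findall(r"['\u2019]\d{2}s", text)


print(smarten_decades("Back in the '80s it's all anyone said."))
print(find_decades("the '60s and the \u201970s"))
```

The second pattern is the sort of regex search that makes the decades easy to double-check after smartening: it matches both converted and unconverted cases.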

I'm not trying to offer up any new, infallible quotation-smartening logic here. All the caveats for algorithmic quotation-smartening still apply. I'm just looking to add more control over WHAT you want to smarten, to lessen the number of 'tis, 'bout and 'cept'n foul-ups, and to do so without affecting any code in the document that doesn't pertain to the punctuation being smartened.