Quote:
Originally Posted by eschwartz
|
I second this. Or here is "Diap's Editing Toolbag" (Calibre version of the Plugin) which I use all the damn time:
https://www.mobileread.com/forums/sho....php?p=2980740
I find it to be much more helpful than Calibre's built-in Smarten Punctuation because this one has options (such as don't touch ellipses, or don't touch dashes).
Side Note: My personal method is three rounds:
Round #1: Diapdealer's Toolbag, Smarten Punctuation.
Round #2: I run the book through a lot of regex fixes (for common errors I have come across). I do my usual code cleanup + OCR fixing + everything else.
Round #3: As the final step, I run the final text through Toxaris's Dialogue Check.
Quote:
Originally Posted by HarryT
Tools tend to have problems with words with initial apostrophes, particularly in books where the single apostrophe is also used for speech marks.
|
Yep, the automated tools do make quite a few mistakes (typically around em dashes, italics/other HTML tags, [...]).
The books I work on don't really have too many of the "'tis a jolly good day" + "'twas the night before Christmas" + "go get 'em", but I have this in my Saved Regexes:
Search: ‘(Em|em|Til|til|Tis|tis|Twas|twas)
Replace: ’\1
You can easily append whatever words you need, separated by a pipe, and it makes it easier to find the Left Single Quotes (wrong) and change them into Right Single Quotes (correct).
(I believe Diap's Toolbag also has an "exception" list if you wanted to take that route.)
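If you would rather script it than use Saved Regexes, here is a rough Python sketch of the same Search/Replace. The function name, the DEFAULT_WORDS list and the whole_word switch are my own placeholders, not anything from Calibre, Sigil or Diap's Toolbag, and the word-boundary option is an assumption layered on top of the regex above:

```python
import re

# Word list copied from the saved regex above; add more entries as needed.
DEFAULT_WORDS = ["Em", "em", "Til", "til", "Tis", "tis", "Twas", "twas"]

LEFT_SINGLE = "\u2018"   # ‘ (the wrong mark in front of these words)
RIGHT_SINGLE = "\u2019"  # ’ (the apostrophe we actually want)


def fix_initial_apostrophes(text, words=DEFAULT_WORDS, whole_word=False):
    """Turn ‘ into ’ when it sits directly before one of the listed words."""
    alternation = "|".join(re.escape(w) for w in words)
    pattern = LEFT_SINGLE + "(" + alternation + ")"
    if whole_word:
        # Assumption, not part of the saved regex: stop at a word boundary
        # so that a quoted name such as ‘Emily is left alone.
        pattern += r"\b"
    return re.sub(pattern, RIGHT_SINGLE + r"\1", text)


if __name__ == "__main__":
    print(fix_initial_apostrophes("\u2018Twas the night before Christmas"))
```

Without the boundary, the pattern also matches longer words that merely start with a listed word (a quote opening with "Emily", for example), so stepping through the matches is safer than Replace All.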
I also have this Regex to handle years, such as "’90s":
Search: ‘([0-9])
Replace: ’\1
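The years regex translates to Python almost one to one. This is only a minimal sketch, and the function name is made up for illustration:

```python
import re


def fix_year_apostrophes(text):
    """Turn ‘ into ’ when it sits directly before a digit, as in ’90s."""
    # \u2018 is the Left Single Quote, \u2019 the Right Single Quote.
    return re.sub("\u2018([0-9])", "\u2019\\1", text)


if __name__ == "__main__":
    print(fix_year_apostrophes("Music from the \u201890s"))
```

Keep in mind it will also flip a genuine opening quote that happens to start with a digit, so it is worth checking the matches in books that use single quotes for speech.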
Quote:
Originally Posted by JSWolf
What I would like is a way to convert UK style quotes to US style quotes because UK style quotes just look unnatural.
|

I mean come on JSWolf, you can't be serious. That is just because you primarily read US material.
I suspect you already came across this the multitude of times I (and Toxaris) have posted it:
https://en.wikipedia.org/wiki/Quotat...ious_languages
All different languages use all different types of quotation marks (High/Low, Left/Right, Quotes/Guillemets, [...]). If you read Finnish books you might be used to ”…” instead.
There is no easily automated way to go from UK to US... you would have to manually replace all Left/Right Single Quotes with their Double Quote equivalents (and change all Double -> Single).
Then you try to catch a lot of the accidentally converted apostrophes like:
Search: ([a-zA-Z])”([a-z])
Replace: \1’\2
And step through and try to catch apostrophes at the end of words:
Search: ([s])”(\s)
Replace: \1’\2
And a ton more elbow grease.
I have done UK -> US quotes a handful of times, and it is brutal/boring work.
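For anyone who wants to try it outside of the editor, the steps above could be sketched in Python roughly like this. All of the names are hypothetical, and it only covers the blind swap plus the two cleanup regexes; it does not pretend to solve the apostrophe problem:

```python
import re

# Swap every curly single quote with its double equivalent and vice versa.
# str.translate does this in one pass, so nothing gets swapped twice.
SWAP = str.maketrans({
    "\u2018": "\u201C",  # ‘ -> “
    "\u2019": "\u201D",  # ’ -> ”
    "\u201C": "\u2018",  # “ -> ‘
    "\u201D": "\u2019",  # ” -> ’
})

# Search: ([a-zA-Z])”([a-z])  (an apostrophe inside a word, like don”t)
MID_WORD = re.compile("([a-zA-Z])\u201D([a-z])")
# Search: ([s])”(\s)  (a possible apostrophe at the end of a word)
AFTER_S = re.compile("([s])\u201D(\\s)")


def uk_to_us_first_pass(text):
    """Blind swap, then repair apostrophes that sit between two letters."""
    swapped = text.translate(SWAP)
    return MID_WORD.sub("\\1\u2019\\2", swapped)


def trailing_s_candidates(text):
    """Offsets of s” + whitespace, to be reviewed by hand, not auto-replaced."""
    return [m.start() for m in AFTER_S.finditer(text)]


if __name__ == "__main__":
    print(uk_to_us_first_pass("\u2018I don\u2019t know,\u2019 she said."))
```

The second regex is only used to list candidates here, because an s followed by a closing quote can just as well be the real end of dialogue. That part is still the elbow grease.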
Quote:
Originally Posted by Jellby
It all boils down to distinguishing between right single quote and apostrophe. Unfortunately, in Unicode they are the same character (a design mistake, I'd say).
|
Hmmmm... yeah this does seem to be the crux of a lot of the Smarten Punctuation issues. It would simplify a lot. :P
Side Note: Does everyone here remember the glorious Smarten Punctuation (plus other typography) discussion we had back in 2014? (My gods, how time flies):
https://www.mobileread.com/forums/sho...58#post2912458