11-30-2021, 10:46 AM   #7
Tex2002ans
Quote:
Originally Posted by graycyn
Thanks for the suggestion! I did find one of theirs that is similar to the book I have, only 1906 instead. I could OCR the PDF, but I do my markup for curly quotes/italics by hand regardless, so kinda moot.
“Curly Quotes” (“Smart Quotes”) can be done at the push of a button.

Sigil

1. Install the "PunctuationSmarten" plugin.

2. In Sigil, you can then run it from Plugins > Edit > PunctuationSmarten.

That will open a window where you can convert all straight quotes to their smart versions.
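For anyone curious what that button is doing, here is a minimal Python sketch of the usual heuristic: a quote at the start of the text, or after whitespace, an opening bracket, or an em dash, becomes a LEFT quote; every other quote becomes a RIGHT quote. This is an assumed, simplified rule for illustration, not PunctuationSmarten's (or Calibre's) actual code, and smarten_quotes is a hypothetical name.

```python
import re

# Characters after which a quote is treated as OPENING (left). An assumed list.
OPENERS = r'\s(\[{\u2014'


def smarten_quotes(text: str) -> str:
    """Naive straight-to-curly conversion; a sketch, not any plugin's algorithm."""
    # Double quotes: start of text or after an opener -> LEFT (U+201C), others -> RIGHT (U+201D).
    text = re.sub(r'(^|[' + OPENERS + r'])"', '\\1\u201c', text)
    text = text.replace('"', '\u201d')
    # Single quotes: same rule, plus right after a LEFT double quote (nested quotes).
    # Everything else, including apostrophes inside words, becomes RIGHT (U+2019).
    text = re.sub(r"(^|[" + OPENERS + r"\u201c])'", '\\1\u2018', text)
    text = text.replace("'", '\u2019')
    return text
```

Feeding it "Wait 'til ten" gives Wait ‘til ten, i.e. exactly the wrong LEFT quote discussed in the side note further down.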

Calibre

1. Install the "Diap's Editing Toolbag" plugin.

2. In Calibre's Editor, it's easier if you enable the "Smarten Punctuation" button in your toolbar. To do this, go into:

Edit > Preferences, then Toolbars.

In the "Toolbar to Customize" dropdown, choose: "Book wide tools from third party plugins".

2.5. You should see 2 columns:

Left-hand side = "Available actions"; right-hand side = "Current actions".

In the left-hand column, find "Smarten Punctuation (the sequel)" and move it to the right using the arrows in the middle.

This will put a little "Einstein's face" button on your main Calibre Editor window.

You can press that button when working on a book, and it will smarten all the quotes.

Note: Calibre now has a built-in Tools > Smarten punctuation (works best for English), but I don't like it as much. Diap's tool lets you customize a lot more (like not messing with ellipses or dashes).

- - -

Side Note: These algorithms get about 99% of left/right quotes correct, but there are many edge cases they get wrong, especially around:
  • Em Dashes
  • Words like "Rock ’n’ Roll" + "Go Get ’Em Tiger" + "Wait ’til the clock strikes ten".
    • The correct character is a RIGHT single quote (apostrophe)... a LEFT single quote is wrong (a very common error, even in published books).
  • Complicated/nested HTML
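The HTML case trips up naive rules because the character right before a quote is often the > of a tag like <i>. One way around it, sketched here under my own assumptions (the hypothetical smarten_html helper, a short hand-picked list of block-level tags, and well-formed XHTML where a literal < is always escaped), is to skip over tags while remembering the last real text character:

```python
import re

# Splitting on a capturing group keeps the tags: odd indices are tags.
TAG = re.compile(r'(<[^>]*>)')
# Assumed set of block-level tags that should act like whitespace.
BLOCK = re.compile(r'</?(?:p|div|h[1-6]|li|blockquote|br)\b', re.IGNORECASE)
OPENERS = '([{\u2014\u201c\u2018'


def smarten_html(html: str) -> str:
    """Smarten quotes in text only, looking 'through' inline tags like <i>."""
    out = []
    prev = ' '  # last text character seen; tags themselves never count
    for i, part in enumerate(TAG.split(html)):
        if i % 2 == 1:  # a tag: copy it unchanged (attribute quotes stay straight)
            if BLOCK.match(part):
                prev = ' '  # a new paragraph/line starts fresh
            out.append(part)
            continue
        chars = []
        for ch in part:
            if ch in '"\'':
                opening = prev.isspace() or prev in OPENERS
                if ch == '"':
                    ch = '\u201c' if opening else '\u201d'
                else:
                    ch = '\u2018' if opening else '\u2019'
            chars.append(ch)
            prev = ch
        out.append(''.join(chars))
    return ''.join(out)
```

With <p>He said, <i>"Hi."</i></p>, the quote after <i> is judged by the space before the tag, so it correctly becomes a LEFT quote instead of a RIGHT one. Entities such as &quot; are not handled in this sketch.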

I've gone into more detail on fixing quotes many times over the years. For example, see my in-depth posts from:

Quote:
Originally Posted by graycyn
As for automating the smart quotes, I think that would still have a fair few errors. There's fairly heavy apostrophe use for missing letters in dialogue.

I'm halfway on curly quotes by hand, so I'll just continue. It's giving me opportunities to pick up other stuff as I go. The entire book puts spaces in contracted words for instance: could n't, would n't, sha n't, is n't, etc..
That can be fixed using Regex (regular expressions).

Once you notice the pattern, you can do a mass search/replace to try to correct those in one fell swoop.

After Smarten Punctuation... this is one regular expression I use:

Find: ‘(Em|em|Til|til|Tis|tis|Twas|twas)\b
Replace: ’\1

which finds common elided words like ‘em, ‘tis, and ‘twas, and flips the LEFT single quote to the correct apostrophe (’).
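Here is the same replacement as a minimal Python sketch, handy for batch work outside Sigil or Calibre. The pattern ends in \b, so a quotation that happens to open with a name like ‘Emily is left alone; fix_elisions is a hypothetical helper name.

```python
import re

# A LEFT single quote (U+2018) in front of a common elided word.
# The trailing \b stops it from matching longer words such as "Emily".
ELISION = re.compile(r'\u2018(Em|em|Til|til|Tis|tis|Twas|twas)\b')


def fix_elisions(text: str) -> str:
    """Swap in a RIGHT single quote (U+2019) and keep the captured word (\\1)."""
    return ELISION.sub('\u2019\\1', text)
```

Words outside the list (like ‘cause or ‘round) still need the manual pass described next.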

Boom... now that 1% of smarten errors has turned into 0.1%.

Then I just search for all LEFT SINGLE QUOTES (usually there are fewer than a few dozen), and manually correct any of those leftovers.
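If you'd rather get that list outside the editor, a small sketch like this reports every remaining LEFT SINGLE QUOTE (U+2018) with some surrounding context. The helper name left_single_quotes and the 20-character window are arbitrary choices for illustration.

```python
import re

LEFT_SINGLE = '\u2018'  # the character that should be rare after smartening


def left_single_quotes(text: str, width: int = 20) -> list:
    """Return (offset, context) for every LEFT SINGLE QUOTE still in text."""
    hits = []
    for m in re.finditer(LEFT_SINGLE, text):
        start = max(0, m.start() - width)
        hits.append((m.start(), text[start:m.end() + width]))
    return hits


sample = 'Rock \u2018n\u2019 Roll, he said, \u2018til dawn.'
for offset, context in left_single_quotes(sample):
    print(offset, repr(context))
```

Each hit is then a quick yes/no decision: a genuine opening quote stays, an elision like ‘til gets flipped.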

Side Note: If working on British books, with ‘single quotes’ being used for dialogue instead of “double quotes”... then things get quite a bit more complicated.

* * *

From there, I'd recommend using a Regex to search for a SPACE + apostrophe + SINGLE CHARACTER by itself.

To do this, use this regex:

Find: \s(’\w)\b
Replace: \1

This will catch things like:
  • There ’s an error here.
  • They ’d be okay.

and convert to:
  • There’s an error here.
  • They’d be okay.
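The same Find/Replace as a minimal Python sketch (join_detached is a hypothetical helper), with the RIGHT single quote written as \u2019 so the pattern survives editors that mangle curly quotes:

```python
import re

# SPACE + RIGHT single quote (U+2019) + one word character, then a word boundary.
DETACHED = re.compile(r'\s(\u2019\w)\b')


def join_detached(text: str) -> str:
    """Remove the stray space before a one-letter contraction like ’s or ’d."""
    return DETACHED.sub(r'\1', text)
```

One caveat before a Replace All: \b also fires between a letter and a following apostrophe, so "Rock ’n’ Roll" comes out as "Rock’n’ Roll". Stepping through the matches one at a time avoids that.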

Of course, that regex can be adjusted for more complicated patterns:

Find: \s(\w’\w)\b
Replace: \1
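This will convert:
  • could n’t, would n’t, sha n’t, is n’t

into:
  • couldn’t, wouldn’t, shan’t, isn’t

Be careful with Replace All here: this pattern also matches ordinary words such as "I’m" and "I’d", gluing them onto the previous word ("but I’m" becomes "butI’m").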
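To see why the broader pattern needs a careful eye, here it is next to a narrower variant. The \s(n’t)\b variant is my own assumption, not part of the recipe above: it only rejoins split n’t contractions and would miss any other split forms the book uses. rejoin is a hypothetical helper.

```python
import re

# The broader pattern: SPACE + word char + RIGHT single quote + word char.
BROAD = re.compile(r'\s(\w\u2019\w)\b')
# Assumed narrower variant: only rejoin split "n’t" contractions.
NT_ONLY = re.compile(r'\s(n\u2019t)\b')


def rejoin(text: str, pattern: re.Pattern) -> str:
    """Drop the space in front of whatever pattern captures in group 1."""
    return pattern.sub(r'\1', text)


sample = 'They could n\u2019t go, but I\u2019m sure he sha n\u2019t.'
broad_result = rejoin(sample, BROAD)
narrow_result = rejoin(sample, NT_ONLY)
```

On that sample, BROAD fixes couldn’t and shan’t but also produces "butI’m", while NT_ONLY fixes the two contractions and leaves "I’m" alone.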



Quote:
Originally Posted by graycyn
What I really, truly dread is running it through spell check. That's part of my process at the end of my proofreading, and usually finds a small handful of things I've missed, but it's gonna be hell with this text, because there are a lot of deliberately misspelled words in the children's dialog. So I'm glad to have a PDF for searching and checking.
Spellcheck Lists are your friend.

Sigil: Tools > Spellcheck > Spellcheck

Calibre: Tools > Check spelling.

This will let you mass check/correct/ignore all the words in a book.

You can even use it for tricks, like listing all hyphenated words or catching common OCR errors like 'o' -> '0' or 'l' -> '1'.
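Outside the editors, the same two tricks can be approximated with a couple of regexes. A sketch under my own assumptions: hyphenated_words and ocr_suspects are hypothetical helpers, and the letter/digit check also flags legitimate tokens such as 3rd or 19th, so treat its output as a review list rather than a list of errors.

```python
import re
from collections import Counter


def hyphenated_words(text: str) -> Counter:
    """Count every hyphenated word, e.g. to spot to-day vs today inconsistencies."""
    return Counter(re.findall(r'\b\w+(?:-\w+)+\b', text))


def ocr_suspects(text: str) -> Counter:
    """Count tokens that mix letters and digits, like 'c0ach' or 'wi1l'."""
    # Two lookaheads: the token must contain a digit AND a letter ([^\W\d_]).
    return Counter(re.findall(r'\b(?=\w*\d)(?=\w*[^\W\d_])\w+\b', text))


page = 'To-day the c0ach said he wi1l come to-day on the 3rd.'
```

Sorting either Counter by frequency (most_common()) gives roughly the same kind of overview as the spellcheck list.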

Last edited by Tex2002ans; 11-30-2021 at 11:56 AM.