Old 12-01-2021, 11:15 AM   #10
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by graycyn:
But I feel sort of bad that you typed all that out, as I've already gotten the curly quotes dealt with, finished that several days ago. But I picked up a zillion other small errors in the process.
Well, you'll be working on more books in the future.

Now something that took you days will take seconds, and the rest of the free time can be spent on hunting down typos or more important issues.

Same with the common patterns of errors. Once you notice one, regex can find them all.

Quote:
Originally Posted by Quoth:
Most wordprocessors do ’tis ’90 etc wrong.
That's another regex I use:

Find: ‘([0-9])
Replace: ’\1

That finds shortened years like:
  • In the ‘90s, ...
  • In the ‘70s, ...

and flips the opening quote to the correct RIGHT SINGLE QUOTATION MARK (’, U+2019).
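
If you'd rather run the same Find/Replace from a script than from an editor, here is a minimal Python sketch. The pattern and replacement are the ones above; the function name and the sample sentence are made up for illustration.

```python
import re

# Same pattern as the Find box: a LEFT SINGLE QUOTATION MARK (U+2018)
# immediately followed by a digit, with the digit captured in group 1.
ABBREVIATED_YEAR = re.compile(r"‘([0-9])")


def fix_year_apostrophes(text: str) -> str:
    """Hypothetical helper: swap the left quote for U+2019 and keep the digit."""
    return ABBREVIATED_YEAR.sub(r"’\1", text)


if __name__ == "__main__":
    sample = "In the ‘90s, everyone was still quoting the ‘70s."
    print(fix_year_apostrophes(sample))
```

Keep in mind the pattern fires on any left single quote followed by a digit, so a genuinely quoted phrase that opens with a number would get flipped too. Worth stepping through the matches once instead of hitting Replace All blindly.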

Quote:
Originally Posted by graycyn:
The Gutenberg text is much more of a trainwreck than I'd initially thought. I've found missing punctuation, including some of the quotes, but also em-dashes, hyphens, and some entire words! And a great deal of paragraph problems! Oh, and accented characters missing as well.
Looks like it was one of the very early conversions:

https://www.gutenberg.org/ebooks/3795

Now they're at book ~67k.

The quality of those early conversions was not so good, but I'm still surprised that many italics and typo errors snuck through.

The one frustration I have with Gutenberg books is that they don't offer the original scan (PDF) they worked from. Having it would let you re-correct the text against the same source and bring it up to today's standards.

Modern PG books (done with Distributed Proofreaders) go through lots more rounds of proofing. If they redid this book now, it would definitely be much higher quality.

Last edited by Tex2002ans; 12-01-2021 at 01:56 PM.