11-22-2021, 02:30 PM #5
graycyn
Wizard
Posts: 1,496
Karma: 11250344
Join Date: Aug 2010
Location: NE Oregon
Device: Kobo Sage, Forma, Kindle Oasis 2, Sony PRS-T2
Quote:
Originally Posted by Quoth
Quotes are easily fixed by Calibre and other tools. Italics need fancy OCR from an Archive Org scan, or DIY.

I thought I'd seen Gutenberg texts with italics, though I haven't checked that one. Usually I download the mobi + images (if there are any) from Gutenberg and convert it to epub in Calibre with automatic smart quotes, a 1.4em first-line paragraph indent, and spaces between paragraphs removed. Sometimes I also turn on full justification. Occasionally I edit the epub.
Usually the only worthwhile Archive Org download is the scan/PDF. The epub/mobi are "simple OCR", hence full of gibberish. I can do better than that with Tesseract on Linux and my 2002 scanner.

Usually, I think Gutenberg does do italics. Not *this* text, though. There doesn't appear to be a ton of italic use, just a moderate amount, so I'll go with DIY. As for automating the smart quotes, I think that would still leave a fair few errors, because the dialogue makes fairly heavy use of apostrophes for dropped letters.
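To show why dropped-letter apostrophes trip up automatic conversion, here is a minimal sketch of a hypothetical rule-of-thumb converter (not how Calibre's smart-punctuation feature actually works; the function name and the sample sentence are made up for illustration). A straight quote at the start of a word looks exactly like an opening quote, so words like 'Tis or 'em get the wrong curly mark.

```python
import re

def naive_smart_quotes(text: str) -> str:
    # Hypothetical rule 1: a straight ' at the start of the text or after
    # whitespace is assumed to open a quotation -> U+2018 (left single quote).
    text = re.sub(r"(^|\s)'", "\\1\u2018", text)
    # Hypothetical rule 2: every other straight ' becomes U+2019, which serves
    # as both the closing single quote and the apostrophe.
    return text.replace("'", "\u2019")

# Made-up dialogue line with two dropped-letter apostrophes.
print(naive_smart_quotes("'Tis true, he says 'em all the time."))
# prints: ‘Tis true, he says ‘em all the time.
# Both marks should have been ’ (apostrophes), not ‘ (opening quotes).
```

Mid-word apostrophes like don't come out fine; it's the word-initial ones that need a human eye, which is where hand-checking earns its keep.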

I'm halfway through converting to curly quotes by hand, so I'll just continue. It's giving me opportunities to pick up other things as I go. For instance, the entire book puts a space inside contractions: could n't, would n't, sha n't, is n't, etc. Gutenberg corrected many of these to modern usage, but by no means all of them; strays keep popping up. I'm also planning to use the modern form for the contractions: it makes more sense for an ebook to be easy for folks to read, and it saves having to insert tons of non-breaking spaces. There's also stray capitalization here and there, plus special characters that Gutenberg also missed.
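Finding the stray spaced contractions is a good fit for a regex search. Below is a minimal sketch, assuming the only pattern to fix is a word followed by a space and "n't" (the forms listed above); the names `SPACED_NT` and `join_spaced_contractions` are hypothetical. It accepts both straight and curly apostrophes, since the text is partway through conversion, and returns a match count so you can see how many strays were caught.

```python
import re

# Hypothetical pattern: a word, one space, then "n't" with either a straight (')
# or curly (U+2019) apostrophe, e.g. "could n't" or "sha n’t".
SPACED_NT = re.compile(r"\b(\w+) (n['\u2019]t)\b")

def join_spaced_contractions(text: str) -> tuple[str, int]:
    """Close up spaced contractions; return (new_text, number_of_fixes)."""
    return SPACED_NT.subn(r"\1\2", text)

fixed, count = join_spaced_contractions("He could n't, and she sha n't.")
print(fixed, count)
```

A similar pattern could probably be used in an ebook editor's regex find-and-replace instead, though regex flavors differ between tools, so it's worth testing on a copy first and reviewing each replacement rather than replacing all blindly.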

I'll get it, I'm a "noticing sort." Text should be very nice when done.

What I really, truly dread is running it through spell check. That's the last step of my proofreading, and it usually finds a small handful of things I've missed, but it's gonna be hell with this text, because there are a lot of deliberately misspelled words in the children's dialogue. So I'm glad to have a PDF for searching and checking.

Otherwise, I'm enjoying what I'm seeing of the book, so that's a plus.

