Old 09-26-2012, 03:02 PM   #1
bobcdy
Fanatic
Posts: 527
Karma: 1048576
Join Date: May 2009
Device: bebook; prs-950; nook simple touch; HTC Jetstream tablet
proofing procedures

I've been looking for the best way to proof an OCR document, and it's been recommended that I eventually save the doc as a text file and run it through the Project Gutenberg procedures - gutcheck, etc. This is supposed to eliminate most of the very common OCR errors, such as substituting 'hut' for 'but', 'be' for 'he', etc.

I haven't been doing this, but wondered if it's worthwhile to use gutcheck. Do most of you working on ebooks use this procedure, or do you rely on very careful reading of the document for proofing?
Old 09-27-2012, 08:22 AM   #2
mncowboy
Wanderer
Posts: 106
Karma: 472218
Join Date: Jan 2011
Device: Kindle 3, PaperWhite 2
Nothing beats the human eye! Regardless of the solution you use, proofreading needs to be the final step.
Old 09-27-2012, 09:57 AM   #3
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
There's only one way to properly proof-read a book, and that's to read the original text and the eBook in parallel, and compare them line by line, word by word, comma by comma. Anything else is not worthy of the name "proof-reading".
Old 09-27-2012, 12:53 PM   #4
bobcdy
Fanatic
Posts: 527
Karma: 1048576
Join Date: May 2009
Device: bebook; prs-950; nook simple touch; HTC Jetstream tablet
mncowboy & HarryT,
Thanks for the input. I've been using HarryT's suggestion of comparing the epub with the original document, with the epub on my 6" ereader and the image pdf files on my 10" android, for proofing. This works well, although it is tedious and time-consuming. I've worried a bit, though, about overlooking misspellings of really common words such as 'be'.

Yesterday I tried using gutcheck (wingui) but found it didn't work correctly (wingui worked but the gutcheck plugin did not), so to continue with it I'd probably need to spend a lot of time finding the problem and its solution. I guess I'll just continue without gutcheck because I'd still need careful proof-reading.
Old 09-27-2012, 03:09 PM   #5
Doitsu
Grand Sorcerer
Posts: 5,720
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by bobcdy
Yesterday I tried using gutcheck (wingui) but found it didn't work correctly (wingui worked but the gutcheck plugin did not), so to continue with it I'd probably need to spend a lot of time finding the problem and its solution. I guess I'll just continue without gutcheck because I'd still need careful proof-reading.
If you plan to release your book as an .epub, there's no point in converting it to a text file for spellchecking; you can spell-check it in Sigil, which comes with English, French and Spanish spellcheck dictionaries.
There's also a Modify ePub Calibre plug-in that'll convert straight quotes to curly quotes.
Old 09-28-2012, 10:34 AM   #6
mncowboy
Wanderer
Posts: 106
Karma: 472218
Join Date: Jan 2011
Device: Kindle 3, PaperWhite 2
Proofing is the least fun part of the whole process. The human mind is wonderful at overlooking mistakes in order to grasp the whole concept. I've found the easiest way for me to catch minute mistakes is to read the book backwards, sentence by sentence. Errors seem to jump out more easily that way.
Bob
Old 09-28-2012, 10:44 AM   #7
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Catching things like "he" instead of "be" isn't too hard. The really nasty ones are where letters have been joined up by the OCR process, so you get things like "dock" instead of "clock", "comer" instead of "corner" (that's a REALLY nasty one to spot), and so on.

I find that it helps to proof with a much larger text size than you'd normally read with, to make these errors stand out.
Old 09-28-2012, 12:44 PM   #8
Elfwreck
Grand Sorcerer
Posts: 5,187
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
Quote:
Originally Posted by HarryT
Catching things like "he" instead of "be" isn't too hard. The really nasty ones are where letters have been joined up by the OCR process, so you get things like "dock" instead of "clock", "comer" instead of "corner" (that's a REALLY nasty one to spot), and so on.
My favorite was "modem methods of birth control."

Cigarette bums are also common.
Old 09-28-2012, 01:19 PM   #9
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Yes, rn -> m is a common one, and tough to see.
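
For anyone who wants a machine to pre-screen for these, here is a minimal Python sketch of the idea. It takes the two merges mentioned in this thread (rn read as m, cl read as d) and flags any word that turns into another known word when the merge is undone. The function name and the tiny word set are made up for illustration; in practice you would load a real dictionary file, and every hit still needs a human to check it against the scan.

```python
import re

# Letter groups that OCR tends to merge, taken from the examples in this thread.
# Each pair is (what the OCR produced, what was probably printed).
CONFUSIONS = [("m", "rn"), ("d", "cl")]

def suspicious_words(text, wordlist):
    """Return (word, alternative) pairs where undoing one OCR merge
    turns the word into an entry of wordlist."""
    hits = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        for merged, split in CONFUSIONS:
            start = lower.find(merged)
            while start != -1:
                candidate = lower[:start] + split + lower[start + len(merged):]
                if candidate in wordlist and (word, candidate) not in hits:
                    hits.append((word, candidate))
                start = lower.find(merged, start + 1)
    return hits

# Hypothetical mini dictionary; replace with a full word list.
words = {"modern", "corner", "clock", "burns", "methods"}
for found, alternative in suspicious_words("modem methods, a dock in the comer", words):
    print(f"{found!r} might be {alternative!r}")
```

This only reports candidates. "modem" and "dock" are real words, so it will also flag correct text, which is why it cannot replace reading against the original.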
Old 09-29-2012, 12:26 AM   #10
bobcdy
Fanatic
Posts: 527
Karma: 1048576
Join Date: May 2009
Device: bebook; prs-950; nook simple touch; HTC Jetstream tablet
HarryT - thanks for the suggestion to use larger print size for proofing. I'll try that next time.

Doitsu - gutcheck is not really a spell checker; it is more like a context checker. For example, a spell checker wouldn't catch 'he' for 'be', but gutcheck would probably find many of these out-of-context problems (although perhaps not 'dock' for 'clock'). Gutcheck also looks for extra spaces and many other common errors that spell checkers would ignore. 'Aspell', a spell checker, is a plugin for gutcheck.
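
To give a feel for the "extra spaces" kind of check, here is a rough Python sketch. It is not gutcheck's code and does not reproduce its rules; the three patterns and the scan function are my own assumptions about what a simple formatting check could look like.

```python
import re

# Illustrative checks only; these are assumptions, not gutcheck's actual rules.
CHECKS = [
    ("double space", re.compile(r"\S  +\S")),
    ("space before punctuation", re.compile(r"\s[,.;:!?]")),
    ("trailing whitespace", re.compile(r"[ \t]+$")),
]

def scan(text):
    """Return (line number, label) for every check that fires on a line."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in CHECKS:
            if pattern.search(line):
                findings.append((number, label))
    return findings

for number, label in scan("He said  so.\nWait , what?\nfine"):
    print(f"line {number}: {label}")
```

A spell checker passes all three sample lines, since every word in them is spelled correctly.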

Thanks for the info on the calibre plugin for conversion of straight to curly quotes - I used curly quotes for my last upload, and although regex really helped I still had some problems and I don't know if I changed all the initially straight quotes to curlies.
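
For reference, the regex approach can be sketched in Python roughly as below. This is a simplified assumption of how such a conversion might be done, not the method the Calibre plug-in uses. A quote at the start of the text, or after whitespace or an opening bracket, is treated as opening, and everything else as closing. The second function lists any straight quotes still left in a text, which answers the "did I get them all" question.

```python
import re

def curl_quotes(text):
    """Convert straight quotes to curly ones with a simple position rule."""
    # Opening double quote: start of text, or after whitespace, an opening
    # bracket, or an em-dash (\u2014).
    text = re.sub(r'(^|[\s(\[{\u2014])"', '\\1\u201c', text)
    # Every remaining straight double quote is treated as closing.
    text = text.replace('"', '\u201d')
    # Same rule for single quotes; an opening double quote may precede them.
    text = re.sub(r"(^|[\s(\[{\u2014\u201c])'", '\\1\u2018', text)
    # Remaining single quotes are closing quotes or apostrophes.
    text = text.replace("'", '\u2019')
    return text

def leftover_straight_quotes(text):
    """Return (offset, character) for each straight quote still in text."""
    return [(m.start(), m.group()) for m in re.finditer(r"[\"']", text)]

print(curl_quotes('"Hello," she said. \'It\'s fine.\''))
print(leftover_straight_quotes('a "b'))
```

The position rule gets leading apostrophes wrong (for example 'tis comes out with an opening quote), so those cases still need a manual pass.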
Old 09-29-2012, 02:31 PM   #11
graycyn
Wizard
Posts: 1,585
Karma: 11380098
Join Date: Aug 2010
Location: NE Oregon
Device: Kobo Sage, Pocketbook Era, Kobo Forma, Kindle Oasis 2
HarryT is right about larger fonts helping. I like to do proofreading on my Sony 350 because of the extra crisp higher resolution text. It makes it easy to catch the periods where commas should be and vice versa.

Also, using large fonts on that smaller 5" screen really sort of isolates the text more, making it easier to catch any other errors I might've missed doing the OCR to scan comparison on the computer.

My other trick is to never go right into proofing a book on my reader straight after proofing that same book on the computer. It's just too much of the same book at once, making it more likely my eyes will gloss over something I should catch.
Old 09-29-2012, 06:56 PM   #12
Elfwreck
Grand Sorcerer
Posts: 5,187
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
Another trick for proofing: start at the end and move backwards. Less chance of getting caught up in the story that way. The typos and OCR errors are a lot easier to notice when they're taken out of context.
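
If you proof from a text file, a few lines of Python can produce the backwards reading order for you. This is only a sketch: the sentence split is deliberately naive (it breaks after ., ! or ? followed by whitespace) and will trip over abbreviations such as "Mr. Smith", and the function name is made up.

```python
import re

def sentences_in_reverse(text):
    """Split text into sentences (naively) and return them last to first."""
    # Break wherever sentence-ending punctuation is followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in reversed(parts) if part]

for sentence in sentences_in_reverse("One. Two? Three!"):
    print(sentence)
```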
Old 09-30-2012, 04:01 AM   #13
Jellby
frumious Bandersnatch
Posts: 7,546
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Besides using a large font, using a different (ugly but effective) font may help. In particular, DPCustomMono2 is designed to make it easier to spot those pesky scanning mistakes.
Old 09-30-2012, 02:21 PM   #14
graycyn
Wizard
Posts: 1,585
Karma: 11380098
Join Date: Aug 2010
Location: NE Oregon
Device: Kobo Sage, Pocketbook Era, Kobo Forma, Kindle Oasis 2
Quote:
Originally Posted by Elfwreck
Another trick for proofing: start at the end and move backwards. Less chance of getting caught up in the story that way. The typos and OCR errors are a lot easier to notice when they're taken out of context.
Yes, I like to mix up chapters to avoid getting caught in the story. So I might proof chapter 15, then go to chapter 2, and so on at random.
Old 10-02-2012, 11:07 PM   #15
GrannyGrump
Obsessively Dedicated...
Posts: 3,213
Karma: 34984682
Join Date: May 2011
Location: JAPAN (US expatriate)
Device: Sony PRS-T2, ADE on PC
I was going to suggest using a mono-space font, but I see Jellby already did that.

I use Monaco, which uses a slashed zero, and wonderfully differentiates between upper-case I, lower-case l, and number 1. It falls down on the em-dash, though, which is only very slightly longer than the hyphen/en-dash and hard to tell apart, so before proofing I temporarily replace all the em-dashes with a unique character, such as @ or #.
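
The swap can be scripted so that it is easy to undo afterwards. A minimal Python sketch, with made-up function names, that refuses to run if the marker already occurs in the text (otherwise restoring would turn genuine @ signs into em-dashes):

```python
EM_DASH = "\u2014"

def mark_em_dashes(text, marker="@"):
    """Replace every em-dash with a marker that does not occur in the text."""
    if marker in text:
        raise ValueError(f"marker {marker!r} already occurs in the text")
    return text.replace(EM_DASH, marker)

def restore_em_dashes(text, marker="@"):
    """Undo mark_em_dashes once proofing is finished."""
    return text.replace(marker, EM_DASH)

original = "Wait\u2014what? A well-known trick."
marked = mark_em_dashes(original)
print(marked)
print(restore_em_dashes(marked) == original)
```

If you type a new @ while proofing, it will come back as an em-dash on restore, so pick a marker you will not need.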