#1 |
Fanatic
Posts: 527
Karma: 1048576
Join Date: May 2009
Device: bebook; prs-950; nook simple touch; HTC Jetstream tablet
|
proofing procedures
I've been looking for the best way to proof an OCR document, and it's been recommended to eventually save the doc as a text file and run it through the Project Gutenberg procedures - gutcheck, etc. This is supposed to eliminate most of the very common errors in OCR, such as substituting 'hut' for 'but', 'be' for 'he', etc.
I haven't been doing this, but wondered if it's worthwhile to use gutcheck. Do most of you working on ebooks use this procedure, or do you rely on very careful reading of the document for proofing? |
#2 |
Wanderer
Posts: 106
Karma: 472218
Join Date: Jan 2011
Device: Kindle 3, PaperWhite 2
|
Nothing beats the human eye! Regardless of the solution you use, proofreading needs to be the final step.
|
#3 |
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
There's only one way to properly proof-read a book, and that's to read the original text and the eBook in parallel, and compare them line by line, word by word, comma by comma. Anything else is not worthy of the name "proof-reading".
|
#4 |
Fanatic
Posts: 527
Karma: 1048576
Join Date: May 2009
Device: bebook; prs-950; nook simple touch; HTC Jetstream tablet
|
mncowboy & HarryT,
Thanks for the input. I've been using HarryT's suggestion of comparing the epub with the original document, epub on my 6" ereader and image pdf files on my 10" android, for proofing, and this works well although it is tedious and time-consuming. I've worried a bit, though, about overlooking misspellings of really common words such as 'be'. Yesterday I tried using gutcheck (wingui) but found it didn't work correctly (wingui worked but the gutcheck plugin did not), so to continue with it I'd probably need to spend a lot of time finding the problem and its solution. I guess I'll just continue without gutcheck because I'd still need careful proof-reading. |
#5 | |
Grand Sorcerer
Posts: 5,720
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
There's also a Modify ePub Calibre plug-in that'll convert straight quotes to curly quotes. |
|
#6 |
Wanderer
Posts: 106
Karma: 472218
Join Date: Jan 2011
Device: Kindle 3, PaperWhite 2
|
Proofing is the least fun part of the whole process. The human mind is wonderful at overlooking mistakes in order to grasp the whole concept. I've found the easiest way for me to catch minute mistakes is to read the book backwards, sentence by sentence. Errors seem to jump out more easily that way.
Bob |
#7 |
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Catching things like "he" instead of "be" isn't too hard. The really nasty ones are where letters have been joined up by the OCR process, so you get things like "dock" instead of "clock", "comer" instead of "corner" (that's a REALLY nasty one to spot), and so on.
I find that it helps to proof with a much larger text size than you'd normally read with, to make these errors stand out. |
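As a rough aid for the joined-letter errors described above, here is a minimal Python sketch that flags words which are valid English but are often OCR misreadings of another word. The `SUSPECTS` table and the `find_suspects` helper are hypothetical names made up for illustration, and the word list only contains examples mentioned in this thread; a real list would need to be much longer. Very common words such as "he" and "be" are left out on purpose, since flagging every occurrence would be useless.

```python
import re

# Illustrative list only: valid words that OCR may produce in place of
# another word. The pairs come from examples mentioned in this thread.
SUSPECTS = {
    "hut": "but",
    "dock": "clock",
    "comer": "corner",
    "bums": "burns",
}


def find_suspects(text):
    """Return (line_number, found_word, likely_intended) for each suspect word."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in re.finditer(r"[A-Za-z]+", line):
            word = match.group(0)
            intended = SUSPECTS.get(word.lower())
            if intended is not None:
                hits.append((line_no, word, intended))
    return hits


sample = "He sat in the comer.\nThe dock struck ten, hut nobody moved."
for hit in find_suspects(sample):
    print(hit)
```

Every hit still needs a human decision, because "dock" and "hut" are sometimes exactly what the author wrote.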
#8 | |
Grand Sorcerer
Posts: 5,187
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
|
Quote:
Cigarette bums are also common. |
|
#9 |
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Yes, rn -> m is a common one, and tough to see.
|
#10 |
Fanatic
Posts: 527
Karma: 1048576
Join Date: May 2009
Device: bebook; prs-950; nook simple touch; HTC Jetstream tablet
|
HarryT - thanks for the suggestion to use larger print size for proofing. I'll try that next time.
Doitsu - gutcheck is not really a spell checker, it is more like a context checker. For example, a spell checker wouldn't catch 'he' for 'be', but gutcheck would probably find many of these out-of-context problems (although perhaps not 'dock' for 'clock'); gutcheck also looks for extra spaces and many other common errors that spell checkers would ignore. 'Aspell', a spell checker, is a plugin for gutcheck. Thanks for the info on the calibre plugin for conversion of straight to curly quotes - I used curly quotes for my last upload, and although regex really helped I still had some problems, and I don't know if I changed all the initially straight quotes to curlies. |
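One way to answer the "did I convert them all?" question is to scan the finished text for any straight quotes that remain. The Python sketch below does only that; `find_straight_quotes` is a hypothetical helper name, and it assumes the book text has already been extracted to a plain string. Note that it will also report straight apostrophes, which may or may not be wanted.

```python
def find_straight_quotes(text):
    """Return (line_number, column, character) for every straight quote left."""
    leftovers = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Straight double quote and straight single quote/apostrophe.
            if ch in ('"', "'"):
                leftovers.append((line_no, col, ch))
    return leftovers


# First line is fully curly; second line still has two straight quotes.
sample = "\u201cHello,\u201d she said.\n\"Don\u2019t go,' he replied."
for line_no, col, ch in find_straight_quotes(sample):
    print(f"line {line_no}, column {col}: {ch}")
```

An empty result means no straight quotes are left; it does not prove that the curly quotes face the right direction.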
#11 |
Wizard
Posts: 1,585
Karma: 11380098
Join Date: Aug 2010
Location: NE Oregon
Device: Kobo Sage, Pocketbook Era, Kobo Forma, Kindle Oasis 2
|
HarryT is right about larger fonts helping. I like to do proofreading on my Sony 350 because of the extra crisp, higher resolution text. It makes it easy to catch the periods where commas should be and vice versa.
Also, using large fonts on that smaller 5" screen really isolates the text more, making it easier to catch any other errors I might've missed doing the OCR-to-scan comparison on the computer. My other trick is to never go right into proofing a book on my reader straight after proofing that same book on the computer. It's just too much of the same book at once, making it more likely my eyes will gloss over something I should catch. |
#12 |
Grand Sorcerer
Posts: 5,187
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
|
Another trick for proofing: start at the end and move backwards. Less chance of getting caught up in the story that way. The typos and OCR errors are a lot easier to notice when they're taken out of context.
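For anyone who wants to try the backwards approach on a text file, the following Python sketch prints sentences in reverse order. It is only an approximation: `sentences_in_reverse` is a hypothetical helper, and the sentence split is deliberately naive (it breaks after a period, exclamation mark, or question mark followed by whitespace), so abbreviations such as "Mr. Smith" will be split in the wrong place.

```python
import re


def sentences_in_reverse(text):
    """Split text into sentences (naively) and return them last-to-first."""
    # Break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in reversed(sentences) if s]


sample = "The clock struck ten. Nobody moved! Was anyone there?"
for sentence in sentences_in_reverse(sample):
    print(sentence)
```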
|
#13 |
frumious Bandersnatch
Posts: 7,546
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Besides using a large font, using a different (ugly but effective) font may help. In particular, DPCustomMono2 is designed to make it easier to spot those pesky scanning mistakes.
|
#14 |
Wizard
Posts: 1,585
Karma: 11380098
Join Date: Aug 2010
Location: NE Oregon
Device: Kobo Sage, Pocketbook Era, Kobo Forma, Kindle Oasis 2
|
Yes, I like to mix up chapters to avoid getting caught in the story. So I might proof chapter 15, then go to chapter 2, and so on at random.
|
#15 |
Obsessively Dedicated...
Posts: 3,213
Karma: 34984682
Join Date: May 2011
Location: JAPAN (US expatriate)
Device: Sony PRS-T2, ADE on PC
|
I was going to suggest using a mono-space font, but I see Jellby already did that.
I use Monaco, which uses a slashed zero, and wonderfully differentiates between upper-case I, lower-case l, and number 1. It falls down on em-dashes, though: they are only very slightly longer than the hyphen/en-dash and hard to tell apart, so before proofing I temporarily replace all the em-dashes with a unique character, such as @ or #. |
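The temporary em-dash swap described above could be scripted roughly as follows. This is a sketch under assumptions, not the poster's actual procedure: `mark_em_dashes` and `restore_em_dashes` are hypothetical helper names, and the code refuses to run if the marker already occurs in the text, since restoring would then turn genuine @ characters into em-dashes.

```python
EM_DASH = "\u2014"


def mark_em_dashes(text, marker="@"):
    """Replace every em-dash with a visible marker for proofing."""
    if marker in text:
        # The swap would not be reversible, so pick a different marker.
        raise ValueError(f"marker {marker!r} already appears in the text")
    return text.replace(EM_DASH, marker)


def restore_em_dashes(text, marker="@"):
    """Undo mark_em_dashes once proofing is finished."""
    return text.replace(marker, EM_DASH)


original = "Wait\u2014what? A well-known trick."
marked = mark_em_dashes(original)
print(marked)
print(restore_em_dashes(marked) == original)
```

Hyphens and en-dashes are left untouched, so only true em-dashes show up as the marker.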