MobileRead Forums - View Single Post - Tools and methodology for easier proof-reading

SBT · 06-16-2012, 05:37 AM

My proofing is broadly similar to norway1456's. The workflow is roughly as follows:

1. Download multiple versions of a book from the Internet Archive, AND/OR do two separate scans (I use 150 and 300 dpi).
2. Use vimdiff to spot the differences and merge.
3. Put scan images and revised text side by side in an HTML file, import into LibreOffice, run spellcheck, and proofread, with particular attention to paragraphs, italics, and punctuation.
4. Finally, add HTML code and run the text through home-brewed scripts to create an XHTML file and an epub file.
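For the side-by-side step, here is a minimal Python sketch of one way such an HTML file could be generated. The function name `side_by_side_html`, the table layout, the `<pre>` wrapper, and the image width are all assumptions for illustration; the post does not say how the file is actually built.

```python
import html


def side_by_side_html(pages):
    """Build one table row per page: scan image on the left, text on the right.

    pages: iterable of (image_path, page_text) pairs (hypothetical input shape).
    """
    rows = []
    for image_path, page_text in pages:
        rows.append(
            "<tr>"
            # Escape the path so quotes or ampersands cannot break the attribute.
            f'<td><img src="{html.escape(image_path, quote=True)}" width="600"></td>'
            # Escape the text so &, < and > from the OCR output display literally.
            f"<td><pre>{html.escape(page_text)}</pre></td>"
            "</tr>"
        )
    return "<html><body><table>\n" + "\n".join(rows) + "\n</table></body></html>\n"
```

Something like `side_by_side_html([("p001.png", text_of_page_1)])` could then be written to a file and opened in LibreOffice.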

I use Adobe Acrobat X Pro; I haven't tried any others, but it seems to do a decent job.
vimdiff isn't exactly user-friendly, but once you've learnt the key combinations, it's darn fast and carpal-tunnel friendly.
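If you want a quick non-interactive look at where two OCR passes disagree before opening vimdiff, something along these lines could work. This is a sketch using Python's standard `difflib`, not a replacement for vimdiff's merging; the function name and the `scan_150dpi` / `scan_300dpi` labels are made up for illustration.

```python
import difflib


def differing_lines(text_a, text_b):
    """Return unified-diff lines for two OCR passes of the same text."""
    return list(
        difflib.unified_diff(
            text_a.splitlines(),
            text_b.splitlines(),
            fromfile="scan_150dpi",  # hypothetical label for the first pass
            tofile="scan_300dpi",    # hypothetical label for the second pass
            lineterm="",
            n=0,                     # no context lines, only the differences
        )
    )
```

Lines prefixed with `-` come from the first text and lines prefixed with `+` from the second; identical inputs give an empty list.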
I try to eliminate trivial differences between the scanned texts before diffing, in particular differing amounts of leading whitespace. The following vim substitutions handle this:

Code:

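" Lines starting with a lowercase letter: strip all leading spaces
1,$s/^ *\([a-z]\)/\1/
" Three or more leading spaces before a capital or quote: replace with one tab
1,$s/^    *\([A-Z"']\)/\t\1/
" A single leading space before a non-space character: strip it
1,$s/^ \([^ ]\)/\1/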
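For anyone who would rather do this clean-up outside vim, here is a rough Python port of the three substitutions, applied to each line in the same order. It assumes vim's default case-sensitive matching; the function name `normalize_leading_space` is just illustrative.

```python
import re

# Each (pattern, replacement) pair mirrors one of the vim substitutions above.
_RULES = [
    # Lowercase first letter: drop all leading spaces.
    (re.compile(r"^ *([a-z])"), r"\1"),
    # Three or more spaces before a capital or quote: one tab.
    (re.compile(r"^ {3,}([A-Z\"'])"), "\t\\1"),
    # Exactly one leading space before a non-space character: drop it.
    (re.compile(r"^ ([^ ])"), r"\1"),
]


def normalize_leading_space(text):
    """Apply the three leading-whitespace rules to every line of text."""
    out = []
    for line in text.split("\n"):
        for pattern, replacement in _RULES:
            line = pattern.sub(replacement, line, count=1)
        out.append(line)
    return "\n".join(out)
```

Running both OCR texts through the same function before diffing should leave only the differences that matter. Note that a line with exactly two leading spaces before a capital is left untouched by all three rules, just as in the vim version.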
