I do proofing broadly similar to norway1456. My workflow is roughly as follows:
- Download multiple versions of a book from the Internet Archive
AND/OR
- Do two separate scans; I use 150 and 300 dpi.
- Use vimdiff for spotting differences and merging
- Put scan images and revised text side by side in an HTML file, import into LibreOffice, run spellcheck, and proofread, with particular attention to paragraphs, italics, and punctuation.
- Finally, add HTML markup and run the text through home-brewed scripts to create an XHTML file and an EPUB file.
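For anyone who wants to try the compare step without vimdiff, here is a rough sketch using Python's standard difflib module. It only lists the lines that differ between two versions of the same text and does no merging. The function name and the sample strings are made up for illustration.

```python
import difflib


def differing_lines(text_a, text_b):
    """Return only the lines that differ between two versions of a text.

    Lines prefixed "- " come from text_a, lines prefixed "+ " from text_b.
    """
    diff = difflib.ndiff(text_a.splitlines(), text_b.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop both.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical sample: the same two lines as read from two different scans.
scan_a = "It was a dark night.\nThe rain fell."
scan_b = "It was a dark night.\nThe ram fell."

for line in differing_lines(scan_a, scan_b):
    print(line)
```

Typical scan errors such as "rain" against "ram" show up as a pair of lines, one from each version, which you then resolve by checking the page image.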
I use Adobe Acrobat X Pro; I haven't tried any others, but it seems to do a decent job.
vimdiff isn't exactly user-friendly, but once you've learnt the key combinations it's darn fast, and carpal tunnel friendly.
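I try to eliminate trivial differences between the scanned texts before diffing, in particular differing amounts of leading whitespace. The following vim substitute commands handle this: the first strips leading spaces in front of a lowercase letter, the second replaces leading spaces in front of a capital letter or quote mark with a tab, and the third drops a single leading space in front of anything else.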
Code:
1,$s/^ *\([a-z]\)/\1/
1,$s/^ *\([A-Z"']\)/\t\1/
1,$s/^ \([^ ]\)/\1/
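If you'd rather run the same clean-up outside vim (for example on a whole batch of files before diffing), the three substitutions translate fairly directly into Python. This is only a sketch under that assumption; the function name and the sample text are invented for illustration.

```python
import re


def normalize_leading_space(text):
    """Apply the three vim substitutions above to every line of text."""
    cleaned = []
    for line in text.split("\n"):
        # 1,$s/^ *\([a-z]\)/\1/  : strip leading spaces before a lowercase letter
        line = re.sub(r"^ *([a-z])", r"\1", line)
        # 1,$s/^ *\([A-Z"']\)/\t\1/  : leading spaces before a capital or quote -> tab
        # Note: " *" also matches zero spaces, so unindented lines get a tab too.
        line = re.sub(r"^ *([A-Z\"'])", "\t\\1", line)
        # 1,$s/^ \([^ ]\)/\1/  : drop a single leading space before anything else
        line = re.sub(r"^ ([^ ])", r"\1", line)
        cleaned.append(line)
    return "\n".join(cleaned)


# Hypothetical sample with inconsistent indentation.
sample = '   the continuation\n  "Hello," she said.\n 3 items\nPlain'
print(normalize_leading_space(sample))
```

Run both scanned texts through the same function before diffing, so that indentation alone no longer shows up as a difference.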