06-16-2012, 04:37 AM   #6
SBT
My proofing is broadly similar to norway1456's. My workflow is roughly as follows:
  1. Download multiple versions of a book from the Internet Archive
    AND/OR
  2. Do two separate scans at different resolutions; I use 150 and 300 dpi.
  3. Use vimdiff to spot the differences between the versions and merge them.
  4. Put the scan images and the revised text side by side in an HTML file, import it into LibreOffice, run the spellchecker, and proofread, paying particular attention to paragraphs, italics, and punctuation.
  5. Finally, add HTML markup and run the text through home-brewed scripts to create an XHTML file and an EPUB file.
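For anyone who wants to try step 4, here is a minimal Python sketch of how such a side-by-side page could be generated. It is not my actual script; the function name `build_proof_page`, the table layout, and the example file name are assumptions made up for illustration.

```python
from html import escape


def build_proof_page(pages):
    """Return an HTML string with one table row per (image_path, text) pair.

    The scan image goes in the left cell and the matching text in the right
    cell, so the two can be compared while proofreading.
    """
    rows = []
    for image_path, text in pages:
        rows.append(
            "<tr>"
            f'<td><img src="{escape(str(image_path), quote=True)}" width="600"></td>'
            # <pre> keeps line breaks and leading tabs as they are in the text
            f"<td><pre>{escape(text)}</pre></td>"
            "</tr>"
        )
    return (
        '<html><body><table border="1">\n'
        + "\n".join(rows)
        + "\n</table></body></html>\n"
    )


if __name__ == "__main__":
    # Hypothetical input: one page image and its corrected text.
    pages = [("scans/page001.png", "It was a dark & stormy night.")]
    print(build_proof_page(pages))
```

Redirect the output to a file and open that file in LibreOffice. Escaping the text matters, since a stray `<` or `&` from the OCR would otherwise break the page.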
I use Adobe Acrobat X Pro; I haven't tried any alternatives, but it seems to do a decent job.
vimdiff isn't exactly user-friendly, but once you've learnt the key combinations it's darn fast, and carpal-tunnel friendly.
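If you just want a quick, non-interactive list of where two versions disagree before opening vimdiff, something like the following Python sketch would do. It uses the standard `difflib` module rather than vimdiff, so it only reports differences and does no merging; the function name and the `scan_150dpi`/`scan_300dpi` labels are assumptions for illustration.

```python
import difflib


def differing_lines(text_a, text_b):
    """Return unified-diff lines for two versions of the same text."""
    return list(
        difflib.unified_diff(
            text_a.splitlines(),
            text_b.splitlines(),
            fromfile="scan_150dpi",  # hypothetical label for the first version
            tofile="scan_300dpi",    # hypothetical label for the second version
            lineterm="",
            n=0,  # no context lines, only the lines that differ
        )
    )


if __name__ == "__main__":
    a = "The quick brown fox\njumps over the lazy dog.\n"
    b = "The quick brown fox\njumps ovcr the lazy dog.\n"
    for line in differing_lines(a, b):
        print(line)
```

Lines starting with `-` come from the first version and lines starting with `+` from the second. Identical texts give an empty list.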
I try to eliminate trivial differences between the scanned texts before diffing, in particular differing amounts of leading whitespace. The following regexps (Vim substitute commands) handle this:
Code:
1,$s/^ *\([a-z]\)/\1/
1,$s/^    *\([A-Z"']\)/\t\1/
1,$s/^ \([^ ]\)/\1/
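The same clean-up can be run outside Vim, for example over a batch of files. Below is a rough Python translation of the three substitutions. It assumes the second pattern means "three or more leading spaces", as the command reads above, and that Vim is matching case-sensitively; `normalize_line` and `normalize` are names made up for this sketch.

```python
import re

# Applied in the same order as the three Vim commands.
RULES = [
    # Leading spaces before a lowercase letter: strip them (continuation line).
    (re.compile(r"^ *([a-z])"), r"\1"),
    # Three or more leading spaces before a capital or a quote mark:
    # replace them with a single tab (paragraph indent).
    (re.compile(r"^ {3,}([A-Z\"'])"), "\t" + r"\1"),
    # Exactly one leading space before a non-space character: strip it.
    (re.compile(r"^ ([^ ])"), r"\1"),
]


def normalize_line(line):
    """Apply the three leading-whitespace rules to a single line."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line, count=1)
    return line


def normalize(text):
    """Apply normalize_line to every line, keeping the line breaks as they are."""
    return "\n".join(normalize_line(line) for line in text.split("\n"))


if __name__ == "__main__":
    sample = "     The first line of a paragraph\n  continues here.\n"
    print(repr(normalize(sample)))
```

Working line by line keeps the behaviour close to Vim, where a pattern never matches across the end of a line.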