MobileRead Forums - View Single Post - Tools and methodology for easier proof-reading

SBT · 06-18-2012, 03:24 AM

I've wondered what's the best way of handling words split over lines when proofing OCR texts.
I use sed to pull each split word together onto one line, and then do an interactive search-and-replace in an editor to remove the soft hyphens.
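
For anyone who wants to try the joining step without sed, here is a rough Python sketch of the same idea. It is not the sed script from this post (that script isn't shown); the function name `join_split_words` and the rule "a line ending in a letter or digit plus a hyphen is a split word" are my own assumptions. The hyphen is deliberately left in place so the interactive review can still decide which ones to delete.

```python
import re

# Assumption: a line "ends in a split word" when its last non-space
# characters are a letter or digit followed directly by a hyphen.
SPLIT_END = re.compile(r"\w-\s*$")


def join_split_words(text: str) -> str:
    """Move the tail of a word split across a line break up to the first line.

    The hyphen itself is kept, so every candidate can still be checked by
    hand later (some are line-break hyphens, some belong to real compounds).
    """
    lines = text.split("\n")
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        # Only join when the following line actually contains text;
        # a blank line is treated as a paragraph break and left alone.
        while SPLIT_END.search(line) and i + 1 < len(lines) and lines[i + 1].strip():
            parts = lines[i + 1].split(None, 1)
            line = line.rstrip() + parts[0]
            if len(parts) > 1:
                # The rest of the next line stays where it was.
                lines[i + 1] = parts[1]
                break
            # The next line held only the word tail, so drop it entirely.
            del lines[i + 1]
        out.append(line)
        i += 1
    return "\n".join(out)


if __name__ == "__main__":
    print(join_split_words("The quick exam-\nple of text"))
```

After this pass every remaining hyphen sits inside a single word on a single line, so a search for `-` in the editor steps through exactly the cases that need a human decision.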
I also use sed to automatically detect chapter headings and any subtitles, page headers, page numbers, and paragraphs.
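
The detection step could be sketched along these lines in Python. The patterns are guesses for illustration only, since the actual sed expressions aren't given: `CHAPTER`, `PAGE_NUMBER`, the `running_header` parameter and the label names are all hypothetical, and subtitles are left out because they depend too much on the individual book.

```python
import re

# Hypothetical patterns; real books will need their own.
CHAPTER = re.compile(r"^chapter\s+([ivxlcdm]+|\d+)\b", re.IGNORECASE)
PAGE_NUMBER = re.compile(r"^\d{1,4}$")


def classify_lines(text, running_header=None):
    """Label each line as blank, page_number, page_header, chapter,
    paragraph_start or text. Returns a list of (label, line) pairs."""
    labels = []
    prev_blank = True  # treat the start of the file like a paragraph break
    for line in text.split("\n"):
        stripped = line.strip()
        if not stripped:
            label = "blank"
        elif PAGE_NUMBER.match(stripped):
            # A line holding nothing but a short number.
            label = "page_number"
        elif running_header and stripped.lower() == running_header.lower():
            # Assumes the running header is known and repeated verbatim.
            label = "page_header"
        elif CHAPTER.match(stripped):
            label = "chapter"
        elif prev_blank:
            # First text line after a blank line.
            label = "paragraph_start"
        else:
            label = "text"
        labels.append((label, line))
        prev_blank = label == "blank"
    return labels


if __name__ == "__main__":
    sample = "CHAPTER IV\n\nIt was a dark night.\nThe wind blew.\n\n42\nMOBY DICK\n"
    for label, line in classify_lines(sample, running_header="Moby Dick"):
        print(f"{label:16} {line}")
```

Labelling first and rewriting afterwards keeps the guesses visible, so wrong matches (a year on its own line taken for a page number, say) can be spotted before anything is deleted.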
