12-29-2010, 06:56 PM #3
swr2408018
Enthusiast
Posts: 35
Karma: 501
Join Date: Jul 2007
Device: PRS-500
What's interesting about the dual (or multiple) comparison technique is that it demonstrably catches OCR errors that slipped through several rounds of side-by-side proofreading of the ebook against the physical book. Where it fails, of course, is when both OCR outputs contain the same error in the same place. At some point, running more OCR passes on more copies of the text will yield diminishing returns, and you may still not be down to zero defects.
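For readers who want to try the comparison themselves, here is a minimal sketch in Python. It is not necessarily how anyone in this thread does it: it assumes plain-text OCR output, splits on whitespace, and uses the standard library's `difflib.SequenceMatcher` to align the two word sequences. The function name `flag_discrepancies` is hypothetical. Every place the outputs disagree is flagged for a human to check; note that an error both outputs share is never flagged, which is exactly the blind spot described above.

```python
import difflib


def flag_discrepancies(text_a: str, text_b: str) -> list[tuple[str, str]]:
    """Return (words_in_a, words_in_b) pairs where two OCR outputs disagree.

    Hypothetical helper: assumes whitespace tokenization is good enough
    for aligning two OCR passes of the same page.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False keeps common words from being treated as "junk"
    # on long pages, which would distort the alignment.
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # replace/insert/delete: the two passes disagree here
            diffs.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return diffs


if __name__ == "__main__":
    ocr_1 = "The qnick brown fox jumps over the lazy dog"
    ocr_2 = "The quick brown fox jumps ovcr the lazy dog"
    print(flag_discrepancies(ocr_1, ocr_2))
    # [('qnick', 'quick'), ('over', 'ovcr')]

    # Same error in both outputs: nothing is flagged.
    print(flag_discrepancies("a qnick fox", "a qnick fox"))
    # []
```

Each flagged pair tells you where to look, not which side is right; in the first example each OCR output is wrong in a different place, so both flags point at real defects.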

It would be interesting to perform this experiment: find a text with lots of good scans at the Internet Archive, track the number of new defects found with each additional comparison, and then have multiple sets of eyes look for any defects that remain once that process is complete.
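The bookkeeping for the first half of that experiment could look something like this sketch. The assumptions are mine, not the post's: one OCR output is treated as the base text, each further scan's OCR output is aligned against it with Python's `difflib.SequenceMatcher`, and the hypothetical function `new_flags_per_comparison` counts how many base-text word positions are flagged for the first time by each extra comparison. A falling count is the diminishing-returns curve; the strings below are made-up illustrations, not real scan data.

```python
import difflib


def flagged_positions(base_words: list[str], other_words: list[str]) -> set[int]:
    """Word positions in the base text where another OCR output disagrees."""
    matcher = difflib.SequenceMatcher(a=base_words, b=other_words, autojunk=False)
    flagged = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            flagged.update(range(i1, i2))
        elif tag == "insert":
            # Extra words in the other output: flag the base position
            # where they would be inserted.
            flagged.add(i1)
    return flagged


def new_flags_per_comparison(base_text: str, other_texts: list[str]) -> list[int]:
    """For each additional OCR output, count base positions flagged for the first time."""
    base_words = base_text.split()
    seen: set[int] = set()
    counts = []
    for other in other_texts:
        flags = flagged_positions(base_words, other.split())
        counts.append(len(flags - seen))
        seen |= flags
    return counts


if __name__ == "__main__":
    base = "The qnick brown fox jumps ovcr the lazy dog"
    others = [
        "The quick brown fox jumps ovcr the lazy dog",  # catches "qnick" only
        "The quick brown fox jumps over the lazy dog",  # also catches "ovcr"
        "The quick brown fox jumps over the lazy dog",  # nothing new
    ]
    print(new_flags_per_comparison(base, others))
    # [1, 1, 0]
```

The counts only measure discrepancies; confirming which flags are real defects, and hunting for errors every scan shares, is still the job of the human proofreaders in the second half of the experiment.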

Steve