Thread: 903 scanned pdf files
08-24-2011, 09:00 PM   #9
rkomar
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
The results of OCR can often be wrong, especially for non-text content such as equations, tables, and embedded images with text in them. Replacing all of the text with the OCR results would produce a very bad copy in many cases. So it seems best to display the original page images and underlay them with hidden text to allow searching of the document. Bad OCR results then just mean missed search results rather than completely wrong text on the page.

I don't know why the OCR results are partly visible in your case. I have some PDF files that have the hidden text under scanned images, and they work perfectly well on my 902. The text and equations show up clearly in the scan, and I can search for words via the hidden layer. The hidden text is not shown at all in the display. Maybe they made the hidden layer visible in your document because the original scan was really poor quality. Anyway, try finding and adding the missing fonts to the document. That is probably the easiest way to fix it.
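
If you want to check whether the OCR layer in your file really is marked as hidden, here is a rough Python sketch. It relies on the PDF text rendering mode operator `Tr`, where mode 3 means the text is neither filled nor stroked (invisible). The function names are made up for illustration, and it assumes you already have a page's content stream as uncompressed bytes; in most real files the stream is compressed, so you would need a PDF tool or library to extract it first. It is also a plain pattern match, so a text string that happens to contain something like "3 Tr" could fool it.

```python
import re

# Matches "<mode> Tr", the PDF operator that sets the text rendering mode.
# Modes run from 0 to 7; mode 3 draws nothing (invisible text).
_TR = re.compile(rb"(?<![\w.])([0-7])\s+Tr(?![A-Za-z])")


def text_render_modes(content: bytes) -> list[int]:
    """Return every text rendering mode set in an uncompressed content stream."""
    return [int(m.group(1)) for m in _TR.finditer(content)]


def ocr_layer_is_hidden(content: bytes) -> bool:
    """True only if the stream sets a rendering mode and every one is 3.

    A stream with no Tr operator at all uses the default mode 0 (visible
    fill), so it is reported as not hidden.
    """
    modes = text_render_modes(content)
    return bool(modes) and all(mode == 3 for mode in modes)


# Hypothetical content stream for a page with an invisible OCR layer.
sample = b"BT /F1 12 Tf 3 Tr (hello) Tj ET"
print(text_render_modes(sample))
print(ocr_layer_is_hidden(sample))
```

If this reports that the layer is not hidden, the text is being drawn in a visible mode, which would match what you are seeing on screen.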