MobileRead Forums - View Single Post

rkomar · 08-24-2011, 09:00 PM

OCR results are often wrong, especially for non-text material like equations, tables, and embedded images with text in them. Replacing all the text with OCR results would produce a very bad copy in many cases. So it seems best to display the original images, but underlay them with hidden text to allow searching of the document. Bad OCR results then just mean missed search results rather than completely wrong text.
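For anyone curious how the hidden layer works: as far as I know, PDF does this with text rendering mode 3 (the `3 Tr` operator), which draws text with neither fill nor stroke, so it is searchable but not painted. Here is a rough Python sketch of the content-stream operators involved. The function names, the `(text, x, y, size)` tuple shape for OCR words, and the `F1` font resource name are my own assumptions for illustration, not what any particular OCR tool emits.

```python
def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def hidden_text_ops(words, font_name="F1"):
    """Build PDF content-stream operators that place invisible text.

    `words` is an iterable of (text, x, y, font_size) tuples in PDF points
    (a hypothetical shape for OCR output). `font_name` is an assumed font
    resource name that would have to exist on the page.
    """
    ops = ["BT", "3 Tr"]  # begin text object; mode 3 = no fill, no stroke
    for text, x, y, size in words:
        ops.append(f"/{font_name} {size} Tf")      # select font and size
        ops.append(f"1 0 0 1 {x} {y} Tm")          # position the word
        ops.append(f"({escape_pdf_string(text)}) Tj")  # show (invisibly)
    ops.append("ET")  # end text object
    return "\n".join(ops)


stream = hidden_text_ops([("Hello", 72, 700, 12), ("f(x)", 72, 680, 12)])
print(stream)
```

This only builds the operator text; a real file would also need the scanned image drawn on the same page and the font declared in the page resources. Note that the invisible text still has to select a font with `Tf`.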

I don't know why the OCR results are partly visible in your case. I have some PDF files that have the hidden text under scanned images, and they work perfectly well on my 902. The text and equations show up clearly in the scan, and I can search for words via the hidden layer. The hidden text is not shown at all in the display. Maybe they made the hidden layer visible in your document because the original scan was really poor quality. Anyway, try finding and adding the missing fonts to the document. That is probably the easiest way to fix it.
