MobileRead Forums - View Single Post

thibaulthalpern · 04-25-2009, 04:54 PM

Quote:

Originally Posted by pepak

Not really. The text is still a picture, but a PDF usually has an additional text-only layer that the readers can use for plaintext rendering. Unfortunately, this text layer is often just an uncorrected OCR of the image and can differ quite significantly from the "image representation".

Not entirely wrong but not entirely correct either.

It all depends on how the PDF was made. One way is to scan the pages as images and then run OCR on those scans (the scanned text is essentially an image). The recognized plain text is then stored in a layer underneath the image.

Another way is for the text to be stored as actual text rather than as an image. In this case, the fonts would be embedded in the PDF itself. This is the preferable method.
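
For anyone who wants to check which kind of PDF they have, here is a rough sketch in Python. It assumes you already have a page's content stream as decoded text (in a real file it is usually compressed, so a PDF library would have to decode it first). The function name classify_page and the three labels are made up for illustration. The heuristic relies on standard PDF operators: Tj/TJ show text, Do paints an external object such as a page image, and "3 Tr" sets the invisible text rendering mode that OCR layers commonly use. It can be fooled by unusual files, so treat it as a hint and not a verdict.

```python
import re

# PDF content-stream operators this heuristic looks for:
#   (string) Tj / [array] TJ   show text
#   /Name Do                   paint an external object (often the page scan)
#   n Tr                       set text rendering mode; mode 3 = invisible
SHOW_TEXT = re.compile(r"[\)\]>]\s*(?:Tj|TJ)\b")
PAINT_XOBJECT = re.compile(r"/\S+\s+Do\b")
RENDER_MODE = re.compile(r"\b(\d)\s+Tr\b")


def classify_page(content_stream: str) -> str:
    """Guess how a page was produced from its decoded content stream.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    has_text = bool(SHOW_TEXT.search(content_stream))
    has_xobject = bool(PAINT_XOBJECT.search(content_stream))
    modes = RENDER_MODE.findall(content_stream)
    # Treat the text as hidden only if every render mode that is set is 3.
    invisible = bool(modes) and all(mode == "3" for mode in modes)

    if has_text and invisible and has_xobject:
        return "scanned image with hidden OCR text"
    if has_text:
        return "native text"
    if has_xobject:
        return "image only"
    return "empty or unknown"


# Hand-written sample streams (assumed, not taken from a real file).
scanned = ("q 612 0 0 792 0 0 cm /Im0 Do Q "
           "BT 3 Tr /F1 10 Tf 72 700 Td (Hello world) Tj ET")
native = "BT /F1 12 Tf 72 700 Td (Hello world) Tj ET"
image_only = "q 612 0 0 792 0 0 cm /Im0 Do Q"

for name, stream in [("scanned", scanned), ("native", native),
                     ("image_only", image_only)]:
    print(name, "->", classify_page(stream))
```

The first sample corresponds to the scan-plus-OCR case above, the second to the case where the text really is text.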