04-25-2009, 03:54 PM   #7
thibaulthalpern
Evangelist
Posts: 478
Karma: 451808
Join Date: Feb 2009
Location: California, USA
Device: my two eyes, KLiiK, Sony PRS-700
Quote:
Originally Posted by pepak
Not really. The text is still a picture, but a PDF usually has an additional text-only layer that the readers can use for plaintext rendering. Unfortunately, this text layer is often just an uncorrected OCR of the image and can differ quite significantly from the "image representation".
Not entirely wrong but not entirely correct either.

It all depends on how the PDF was made. One way is to scan the pages and then run OCR on the scans (which are essentially images). The resulting PDF then has a layer of plain text underneath the image layer.

Another way is for the text to be stored as actual text rather than as an image. In this case, the fonts are usually embedded in the PDF. This is the preferable method.
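
For anyone who wants to check which kind of PDF they have, here is a rough sketch in Python. It is only a heuristic under some assumptions: the function name classify_pdf_bytes and the labels it returns are made up for illustration, and it just searches the raw file bytes for font and image markers. Newer PDFs can compress those markers inside object streams, in which case it reports "unknown". A result of "image + text" could mean a scan with an OCR layer, but it could also be a native-text PDF that happens to contain illustrations.

```python
import re


def classify_pdf_bytes(data: bytes) -> str:
    """Guess how a PDF was made from markers in its raw bytes.

    Heuristic only: markers hidden inside compressed object streams
    are not seen, and the labels below are illustrative, not standard.
    """
    # A font dictionary suggests the file contains real text.
    has_font = re.search(rb"/Type\s*/Font\b", data) is not None
    # An image XObject suggests scanned pages (or plain illustrations).
    has_image = re.search(rb"/Subtype\s*/Image\b", data) is not None

    if has_image and has_font:
        return "image + text"
    if has_image:
        return "image only"
    if has_font:
        return "text only"
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage: python check_pdf.py some_file.pdf
    with open(sys.argv[1], "rb") as f:
        print(classify_pdf_bytes(f.read()))
```

A dedicated PDF library would give a more reliable answer, since it can decompress the file and look at each page's actual contents.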