12-30-2008, 07:42 AM   #3
DDHarriman
Guru
 
Posts: 860
Karma: 4380
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
Hi harryE123

It depends on the type of OCR you are doing, so let me ask:
1 - Which version of Acrobat are you using?
2 - When you run OCR in Acrobat, which option do you choose?

In Acrobat 8, choosing OCR Text Recognition and then Recognize Text Using OCR opens a window with the default settings, which you can edit.
If you do, you get 3 settings. The important one is PDF Output Style: of the 3 choices it offers, the first 2 produce one type of PDF and the last one (Formatted Text and Graphics) produces a different type.

So:

1 - The first 2 options (beginning with Searchable…) produce a PDF with 2 layers. The first layer is an image of the page; the second sits under it, hidden from view, and contains the text, positioned exactly where the “upper” image shows that text.
The result is that you see an image when viewing the PDF, but you can still select and copy the text, search for a word or phrase in the document, and so on.
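For the curious, here is a rough sketch of what the content stream of one such two-layer page could look like. It is only an illustration of the idea, not a claim about Acrobat's exact output: the names /Im0 (the scanned image) and /F1 (a font), the page size and the word positions are all made-up assumptions. The key trick is PDF text rendering mode 3 ("3 Tr"), which draws neither fill nor stroke, so the text is invisible but still selectable and searchable. Because the text is invisible, it makes no visual difference that it is painted after the image here.

```python
# Hypothetical sketch of a "searchable image" page content stream.
# Assumptions: the scan is registered in the page resources as /Im0 and a
# font as /F1; words and coordinates below are illustrative values only.

def escape_pdf_string(text: str) -> str:
    """Escape characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def searchable_page_stream(width, height, words, font_size=12):
    """Return a content stream that paints the scan, then invisible text.

    words: list of (text, x, y) tuples in points, placed where the same
    word appears in the scanned image.
    """
    ops = [
        # Visible layer: scale the 1x1 image space to the whole page, draw it.
        "q",
        f"{width} 0 0 {height} 0 0 cm",
        "/Im0 Do",
        "Q",
        # Hidden layer: rendering mode 3 = neither fill nor stroke (invisible).
        "BT",
        "3 Tr",
        f"/F1 {font_size} Tf",
    ]
    for text, x, y in words:
        # Tm sets an absolute text position for each word.
        ops.append(f"1 0 0 1 {x} {y} Tm")
        ops.append(f"({escape_pdf_string(text)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


if __name__ == "__main__":
    print(searchable_page_stream(612, 792, [("Hello", 72, 700), ("world", 110, 700)]))
```

A text search for "Hello" works on such a page because that word exists as real text in the Tj operator, even though what you see on screen is only the image.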

2 - The last option gives just one layer with both text and images. The images are the real images in the original scan that Acrobat could identify as such (tables or photos, for example), plus every letter Acrobat had doubts about.
You can get rid of the doubtful ones by proofreading: choose OCR Text Recognition again and then Find First OCR Suspect (or Find All). A window shows the first suspect together with Acrobat's proposed text, which you can accept or correct. Once it is corrected, the image is replaced by the correct letters and Acrobat jumps to the next suspect, and so on until the end of the PDF.

Finally, there is the problem with images.
Optimizing has its biggest impact on file size through image compression. So if you have a PDF of type (1) (even one made by OCR) and you lower the resolution of its upper image layer, the file size drops dramatically, but at the cost of image quality. The same happens if you lower the image quality setting, just as when you increase the compression of a digitized photo.
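To see why lowering the resolution helps so much, here is a back-of-the-envelope sketch of the raw (uncompressed) size of one scanned page image. The page size (US Letter, 8.5 x 11 in) and 8-bit grayscale (1 byte per pixel) are my assumptions; real PDFs compress these images, so actual sizes are smaller, but the way size scales with resolution is the point.

```python
# Back-of-the-envelope sketch: raw size of a scanned page image.
# Assumptions: US Letter page, 8-bit grayscale (1 byte per pixel).
# Compression in a real PDF makes files smaller; only the scaling matters here.

def raw_image_bytes(width_in, height_in, dpi, bytes_per_pixel=1):
    """Pixels across times pixels down times bytes per pixel."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    for dpi in (600, 300, 150):
        size = raw_image_bytes(8.5, 11, dpi)
        print(f"{dpi:>3} dpi: {size / 1_000_000:.1f} MB")
```

Since pixel count grows with the square of the resolution, halving the dpi cuts the raw image data to a quarter: about 8.4 MB at 300 dpi versus about 2.1 MB at 150 dpi for this page.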