05-18-2010, 02:09 PM #4
aidren
Edge User
 
I've been looking into this some more. I tested three books:

1) The one Rakista put up (http://www.archive.org/details/originsoftotalit00aren), which is an image scan with a hidden text layer. This one rendered completely blank pages on the EE.

2) One with set text (fonts) and images, but no hidden text layer. This one rendered the text on the EE but not the images.

3) This one was the most complex. It had set text (fonts) with embedded inline images (i.e., math symbols and equations), as well as a hidden text layer and regular images. The text and the inline images rendered on the EE, but the regular images did not.

All of these PDF files were PDF v1.6 (Acrobat 7.x), each created by a different application: Internet Archive/Luradocument PDF v2.28; PDF Creator v9/AFPL Ghostscript v8.5.3; and Acrobat 7/Acrobat 7 Paper Capture.


According to Adobe, PDF v1.6 (Acrobat v7) is backward compatible with Acrobat and Adobe Reader 5.0.
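If you want to confirm which version a file actually declares before and after reprocessing, it is written at the very start of the file as `%PDF-x.y`. Here's a minimal Python sketch that reads that header and maps the Acrobat compatibility levels mentioned in this thread to PDF versions (Acrobat 4 = PDF 1.3 through Acrobat 8 = PDF 1.7). The function names are hypothetical, and it's only a quick check, not a validator. From PDF 1.4 on, a `/Version` entry in the document catalog can override the header, and this sketch doesn't look at that.

```python
import re
from typing import Optional

# Acrobat compatibility level -> PDF version that level writes.
ACROBAT_TO_PDF = {4: "1.3", 5: "1.4", 6: "1.5", 7: "1.6", 8: "1.7"}

# The header is "%PDF-" followed by the version, e.g. "%PDF-1.6".
HEADER_RE = re.compile(rb"%PDF-(\d+\.\d+)")


def header_version(data: bytes) -> Optional[str]:
    """Return the version from the %PDF-x.y header, or None if there is none.

    Only the first 1024 bytes are searched, because readers commonly tolerate
    a few stray bytes before the header. A /Version entry in the catalog
    (PDF 1.4+) can override this value; that is NOT checked here.
    """
    m = HEADER_RE.search(data[:1024])
    return m.group(1).decode("ascii") if m else None


def file_header_version(path: str) -> Optional[str]:
    # Hypothetical helper: read just enough of the file to find the header.
    with open(path, "rb") as f:
        return header_version(f.read(1024))
```

For example, `header_version(b"%PDF-1.6\n")` returns `"1.6"`, which is the level `ACROBAT_TO_PDF` lists for Acrobat 7.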


So, my procedure was this: I opened each book in Acrobat Pro 8 (Mac) and ran it through Preflight to create three versions of each, compatible with Acrobat v6, v5 and v4 (PDF 1.5, 1.4 and 1.3), and then moved them to the EE. Here are the results.


1) Acrobat v6 gave exactly the same results for all of the books as the original v7 files (blank pages, missing images).


2) Acrobat v5 worked for all three. However, with Rakista's book on the EE I could not select a single word of text; it kept selecting the whole line instead. This book was scanned in color, so the contrast isn't great, but I can't really see how that would affect text selection.


3) Acrobat v4 worked for Books 2 and 3, but for Book 1, it stripped the hidden text layer.
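One way to check whether a conversion kept the hidden text layer: OCR tools usually place the searchable text under the page image using text rendering mode 3 (invisible), which appears as `3 Tr` in a page's content stream. The Python sketch below is a rough heuristic built on that assumption, not how the EE or Acrobat decide anything. It pulls out each stream, tries to inflate it with zlib (FlateDecode), and searches for that operator. Streams using other filters are searched raw, so it can miss a text layer. The helper names are hypothetical.

```python
import re
import zlib
from typing import Iterator

# Body of every "stream ... endstream" pair; the lookbehind keeps the
# "stream" inside "endstream" from being taken as a new stream start.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)

# "3 Tr" selects text rendering mode 3: glyphs are neither filled nor stroked.
INVISIBLE_TEXT_RE = re.compile(rb"(?<![\d.])3\s+Tr\b")


def stream_bodies(data: bytes) -> Iterator[bytes]:
    """Yield each stream's contents, inflated when it is zlib/Flate data."""
    for m in STREAM_RE.finditer(data):
        raw = m.group(1)
        try:
            # decompressobj leaves the end-of-line bytes before "endstream"
            # in unused_data instead of raising an error.
            body = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not zlib data (another filter, or uncompressed): use it as-is.
            body = raw
        yield body


def has_invisible_text(data: bytes) -> bool:
    """Heuristic: True if any stream sets text rendering mode 3."""
    return any(INVISIBLE_TEXT_RE.search(body) for body in stream_bodies(data))
```

Running `has_invisible_text` on the bytes of Book 1's v5 copy and its v4 copy should show whether the v4 conversion really dropped the invisible text.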


I also looked at two other files. The first has always rendered correctly on the EE; it was created as Acrobat 7.x with Gsview/AFPL Ghostscript v8.5.4. It has a fairly complex layout, with text in various sizes and column widths plus images, so I think it must have been scanned (with OCR), layered, merged and flattened. Only two pages still had hidden layers (with basically nothing on them), which I think were simply missed when the hidden layer was being stripped.


The other was one I created myself: an image layer with a hidden text layer, but it has a font extraction/corruption issue. It works fine in Acrobat but doesn't render on the EE no matter what I do with it, so I'm assuming the font issue is what's causing the problem with this one.


So, all of this suggests that Acrobat v7.x files are NOT ALWAYS backward compatible with v5. It seems to have something to do with how the layers are handled.
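Since the v6 (PDF 1.5) copies failed and the v5 (PDF 1.4) copies worked, one rough way to look for suspects is to scan the raw file for a few features that were introduced in PDF 1.5: object streams, cross-reference streams, JPEG 2000 images, and optional content (real PDF layers). The Python sketch below is a hypothetical helper for comparing an original file with its reprocessed v5 copy. It doesn't tell you what the EE actually chokes on, and because it's a plain byte scan, anything stored inside a compressed object stream won't be found.

```python
import re
from typing import List

# A few features introduced in PDF 1.5 (Acrobat 6). A PDF 1.4 (Acrobat 5)
# reader is not required to understand them. Illustrative, not exhaustive.
PDF15_MARKERS = {
    "object streams": re.compile(rb"/Type\s*/ObjStm\b"),
    "cross-reference streams": re.compile(rb"/Type\s*/XRef\b"),
    "JPEG 2000 images (JPXDecode)": re.compile(rb"/JPXDecode\b"),
    "optional content (layers)": re.compile(rb"/OCProperties\b"),
}


def post_pdf14_features(data: bytes) -> List[str]:
    """Return the names of the PDF 1.5 markers found in the raw file bytes.

    Plain byte scan: dictionaries stored inside a compressed object stream
    (the catalog, which holds /OCProperties, can be one of them) are not seen.
    """
    return [name for name, pattern in PDF15_MARKERS.items() if pattern.search(data)]


def scan_file(path: str) -> List[str]:
    # Hypothetical convenience wrapper: read the whole file and scan it.
    with open(path, "rb") as f:
        return post_pdf14_features(f.read())


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(p, scan_file(p) or "no PDF 1.5 markers found")
```

Comparing the lists for a book's original v7 file and its v5 copy would show which of these features Preflight removed.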


At any rate, the workaround seems to be to reprocess the files as Acrobat v5 (PDF 1.4).