MobileRead Forums - View Single Post

dwig · 01-05-2005, 11:14 AM

PDF is both very good and horrid, depending on what you want to do with it. Adobe has _never_ wanted it to be anything other than a final display format; they have never wanted it to be any form of interchange format. Over the years since the beta test days (I was personally involved for a short time during the later beta cycle of the original release), they have gradually and grudgingly yielded to users' desires to use it as a data interchange format, despite the fact that PDF's core PostScript-based architecture makes this _extremely_ difficult.

If you are looking for a final output format to display complex, graphics-intensive, layout-controlled documents, then PDF is a very good choice and often the best choice. You can easily create an electronic document that looks exactly like the printed document. This is what PDF was designed for.

On the other hand, if you want to port a document to some other format, using a PDF as either source material or an intermediary format is unwise. It should be used in such a workflow only if there is no other choice and, even then, with the foreknowledge that it will be a difficult and bumpy road. The basic structural design of PDF makes automated conversion tools virtually impossible to design. The degree of artificial intelligence they must possess is beyond practical implementation.

I've worked as one of the principal software designers on several projects at Macromedia to import and export PDFs in FreeHand. FreeHand's spatial layout orientation made reading and converting PDFs much easier than it would be for a linear, beginning-to-end oriented word processing or ebook document. Still, we had great difficulty assembling the various data chunks in the PDF into FreeHand-type text blocks and graphic entities. More often than not, it was impossible to create code that could "think" about the page layout and decide which pieces should be assembled in what groupings and in what order.
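To make the grouping and ordering problem concrete, here is a rough Python sketch. It is not anything from the FreeHand importers; the `Fragment` class, the `naive_reading_order` function, the tolerance value, and the sample page are all made up for illustration. It assumes a page has already been reduced to positioned text fragments, then applies the obvious heuristic: group fragments that share a baseline into lines and read top to bottom, left to right.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One positioned run of text (hypothetical type, for illustration only)."""
    x: float   # left edge, in points
    y: float   # baseline, in points (origin at bottom-left, so larger y is higher)
    text: str


def naive_reading_order(fragments, line_tolerance=2.0):
    """Group fragments into lines by baseline, then read each line left to right."""
    lines = []  # each entry is [baseline_y, [fragments on that line]]
    # Walk the fragments from the top of the page downward.
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0] - frag.y) <= line_tolerance:
            lines[-1][1].append(frag)      # same baseline: same line
        else:
            lines.append([frag.y, [frag]])  # new baseline: start a new line
    return [
        " ".join(f.text for f in sorted(frags, key=lambda f: f.x))
        for _, frags in lines
    ]


# Assumed sample data: a two-column page, fragments in arbitrary drawing order.
page = [
    Fragment(320, 700, "Column two, line one."),
    Fragment(72, 700, "Column one, line one."),
    Fragment(72, 686, "Column one, line two."),
    Fragment(320, 686, "Column two, line two."),
]

result = naive_reading_order(page)
for line in result:
    print(line)
```

On this sample the heuristic glues the two columns together line by line, so the second column's text ends up interleaved with the first. Telling columns apart from sidebars, captions, tables, and pull quotes is exactly the "thinking" about layout that simple geometric rules cannot do reliably.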

The newer "tagged" PDF attributes help such importers keep text in flowable text blocks, but some source documents don't lend themselves to good automated construction of tagged PDFs during export, and many PDF exporters don't have the option to generate tagged PDFs at all. As a result, most PDFs found "in the wild" can't be reliably converted to any linear flowing format without extensive human interaction. Also, Adobe designed this tagging only as a tool for its Reader to use when it needs to reflow a document for display on devices with limited display real estate. The tags were not designed to ease conversion to other formats and, as a result, their design is not optimized for such use.
