Quote:
Originally Posted by drogo
I was wondering why there's such a preoccupation with trying to get everything into PDF as opposed to TXT or HTML?
|
TXT is clearly impossible as soon as you have characters that can't be represented in that format. Unicode seems to be the only reasonable format, but even that fails in certain cases. (One of which is chess: I can't represent a chess diagram well in a Unicode-encoded text. Math, music and labanotation are others.)
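To make the chess case concrete, here is a minimal sketch of the best plain Unicode text can offer. It assumes the position is supplied as the piece-placement field of a FEN string; the function name `render_fen_board` and the dot used for empty squares are my own choices for illustration. Unicode does have code points for the twelve pieces (U+2654 to U+265F), but none for a piece standing on a shaded square, and whether the columns line up is left entirely to the reader's font.

```python
# Unicode chess pieces occupy U+2654..U+265F (white king .. black pawn).
PIECES = {
    "K": "\u2654", "Q": "\u2655", "R": "\u2656",
    "B": "\u2657", "N": "\u2658", "P": "\u2659",
    "k": "\u265A", "q": "\u265B", "r": "\u265C",
    "b": "\u265D", "n": "\u265E", "p": "\u265F",
}


def render_fen_board(placement: str) -> str:
    """Render the piece-placement field of a FEN string as eight text rows."""
    rows = []
    for rank in placement.split("/"):
        cells = []
        for ch in rank:
            if ch.isdigit():
                # A digit means that many empty squares. Plain text has no
                # way to shade them, so light and dark squares look alike.
                cells.extend("." * int(ch))
            else:
                cells.append(PIECES[ch])
        rows.append(" ".join(cells))
    return "\n".join(rows)


# Position after 1. e4, piece-placement field only.
START = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR"
print(render_fen_board(START))
```

The result is legible at best: no board frame, no square colours, and no guarantee that the piece glyphs are as wide as the dots.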
HTML has similar problems -- once the markup doesn't cover what you want to do. Again, if you rely on typefaces, HTML doesn't promise anything -- it's the reader that makes those decisions. And if you allow add-on markup, you have to ensure that the reader has the same add-on, or at least one that uses the same semantics as yours.
Both formats are far more flexible when it comes to selecting alternative typefaces, page sizes, etc., but they are completely useless when it comes to creating a well-set page. PDF, on the other hand, allows that at the expense of flexibility. Take headings in all caps: they have to be spaced more widely than ordinary text in order to be easily readable. HTML doesn't do that -- although it (or any similar format that decided to allow that particular rendition to be specified) could do it.
PDF makes it far more likely that 'what-you-see-is-what-I-want-you-to-see' -- as any page-description language should do.
Quote:
It's my understanding that PDF won't reflow the text, at least, I've never seen a document that would, while most of the other ebook formats will.
|
Again, PDF is a page description format, and therefore final -- you don't reflow those. Adobe's attempt to allow reflowing to make it possible to read PDF on handhelds was, I think, somewhat brain-damaged. They probably caved in to market pressure -- so see it as an attempt to educate the market that You Do Not Reflow PDF. As far as Adobe is concerned, I suspect you might as well steal sheep.
Adobe Reader does reflow ... but it doesn't do it without reason. If the page already fits your screen, it doesn't even try (though I'm not certain about the exact conditions for reflowing). I'm more successful in forcing a reflow when I use Adobe Reader on a small-screen handheld. But as it removes carefully selected line spacing and packs the lines tightly, the result is not acceptable.
Quote:
It's also difficult to extract anything from it. I know that every time I get a technical doc/book in PDF, I immediately begin trying to convert it to something else, usually HTML, but sometimes plain old TXT. PDFs just seem to be so much trouble.
|
This is more a question of what version of PDF you are using, and how competent the creator was. Modern PDF versions allow tagging, and if the creator enables it, you may be able to save text (at least) straight off. Without such tagging, there isn't enough information to do it well -- and particularly not if special typefaces with unknown code tables have been used. Tagging is partly useful for reflowing, partly for increasing accessibility (it allows screen reading -- but even that has to be designed by the creator as soon as there's any degree of complexity in the document). Don't blame this on the format -- it's the creator who hasn't done his work. (Well, you may be using a PDF reader that just cannot do it, of course.)
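The code-table problem can be shown with a toy sketch. Everything in it is hypothetical: the `FONT_TABLE` layout is invented for illustration (real chess fonts each use their own), and `extract` only stands in for what a text extractor can recover. It is not how any particular PDF reader works.

```python
# Hypothetical code table of a chess font: stored character code -> the
# piece the glyph actually shows. The layout is invented for illustration.
FONT_TABLE = {
    "t": "\u265C",  # black rook
    "m": "\u265E",  # black knight
    "v": "\u265D",  # black bishop
    "w": "\u265B",  # black queen
    "l": "\u265A",  # black king
}


def extract(codes, table=None):
    """Return what a text extractor can recover from the stored codes."""
    if table is None:
        # No mapping shipped with the document: the raw codes are all
        # there is, and they mean nothing outside the original font.
        return codes
    # With a mapping, each code resolves to a real character; codes the
    # table does not cover become U+FFFD (the replacement character).
    return "".join(table.get(c, "\uFFFD") for c in codes)


stored = "tmvwlvmt"  # black's back rank as stored in the page content
print(extract(stored))              # mapping unknown: letters, not pieces
print(extract(stored, FONT_TABLE))  # mapping known: the actual pieces
```

On the page both cases look identical, because the embedded font draws the right glyphs either way. The difference only shows up when you try to get the text out.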
Quote:
I'm not trying to start a war or anything, just genuinely curious as to why so many people are heading to PDF, rather than from it. Maybe I'm missing some advantage of PDF?
|
My own decision was made when I discovered that there were no chess problem books on-line. Problem no. 1, of course, was how to ensure that the chess diagrams included would be readable. TXT doesn't do it, and HTML doesn't do it (unless I require the reader to buy the same typeface I used, or use some free and sub-standard face, which I won't). DjVu would do it, but (at that time at least) it required a rather expensive license. TeX (i.e. DVI) again requires the reader to have suitable chess fonts. PDF was my answer.
(MS Word also allows for font embedding, but it has a nasty habit of breaking pages according to what printer you have -- which means that my page 345 may not be your page 345. Not acceptable -- so Word was not an option for me.)
Another factor that influenced the choice was that I am able to decide page layout: I can ensure high-quality hyphenation and other typographical tweaks -- such as ensuring that footnotes are on the same page as the text that refers to them. No non-page-description-format reader I know does that -- which to my mind is an indication that they don't really care about the reader, or shy away from technically difficult issues. (There are some signs that this may improve, though.)