Why the fascination with PDF?
drogo 10-18-2006, 03:39 PM I was wondering why there's such a preoccupation with trying to get everything into PDF as opposed to TXT or HTML?
It's my understanding that PDF won't reflow the text, at least, I've never seen a document that would, while most of the other ebook formats will. It's also difficult to extract anything from it. I know that every time I get a technical doc/book in PDF, I immediately begin trying to convert it to something else, usually HTML, but sometimes plain old TXT. PDFs just seem to be so much trouble.
I'm not trying to start a war or anything, just genuinely curious as to why so many people are heading to PDF, rather than from it. Maybe I'm missing some advantage of PDF?
NatCh 10-18-2006, 03:43 PM I know that every time I get a technical doc/book in PDF, I immediately begin trying to convert it to something else, usually HTML, but sometimes plain old TXT. PDFs just seem to be so much trouble.
:shrug: I always assumed that what you just said was the issue: that a lot of stuff comes in PDF already.
Well, that, and the fact that it is easy to pass a PDF to someone else and know that they'll be able to read it.
Personally, I'm in the preferrin' RTF camp myself. :beam:
But a lot of things can't be gotten in that format. :(
Bob Russell 10-18-2006, 03:59 PM I think PDF also gives you more control of the way the book looks on the screen in terms of layout. If you want to go the extra mile (and know what you are doing), you can probably really make the ebook look amazing if it's customized for that device.
I have less sophisticated taste, so I like RTF also. :)
astfgl 10-18-2006, 04:07 PM PDF is the only working format on the Iliad, and even it's not working too well yet.
99% of my documents and ebooks are HTML, but I can't use them as the HTML renderer in the Iliad is broken.
The two main issues are in-document navigation and rendering. The rendering fault is the worst, and means that the top and bottom lines are unreadable much of the time, with no method of scrolling the page to make them fully visible.
The renderer should test if the page's start and end line are fully visible, and if the top one isn't, then move the page down slightly until it is, and if the last line is not visible, don't render it at all and make that line the top line of the next page.
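The rule proposed above (never draw a line that does not fully fit, and start the next page with it instead) can be sketched in a few lines. This is only an illustration, not the iLiad's actual renderer: the function name `paginate` and the pixel values are hypothetical, and it assumes the renderer already knows the height of every line.

```python
def paginate(line_heights, page_height):
    """Group line indices into pages so that every line is fully visible.

    line_heights: height of each text line in pixels (hypothetical input).
    page_height:  usable height of the screen in pixels (hypothetical input).
    """
    pages = []
    current = []
    used = 0
    for index, height in enumerate(line_heights):
        if height > page_height:
            raise ValueError("a single line is taller than the page")
        # If this line would be cut off, close the page and
        # make the line the top line of the next page.
        if current and used + height > page_height:
            pages.append(current)
            current = []
            used = 0
        current.append(index)
        used += height
    if current:
        pages.append(current)
    return pages


if __name__ == "__main__":
    # Seven lines of 30 px on a 100 px page: three full lines per page.
    print(paginate([30] * 7, 100))
```

The cost is a little unused space at the bottom of some pages, which seems preferable to a half-drawn line.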
drogo 10-18-2006, 04:30 PM Ahh, ok. I wasn't aware that PDF was all that worked. :) That's a good reason for everyone converting to it.
I do find it surprising that even simple TXT doesn't work yet, though. Odd. Ah well.
CommanderROR 10-18-2006, 04:59 PM astfgl got it right...I would love to use HTML, but since it doesn't work properly yet, I have to convert everything to PDF...^^
flumbo 10-18-2006, 06:55 PM I agree with the original poster. PDF is made for, and should mostly be used only for documents to be printed or where it is crucial to maintain the original layout. It just wasn't meant for convenient on-screen reading.
PDF is the only working format on the Iliad, and even it's not working too well yet.
99% of my documents and ebooks are HTML, but I can't use them as the HTML renderer in the Iliad is broken.
The two main issues are in-document navigation and rendering. The rendering fault is the worst, and means that the top and bottom lines are unreadable much of the time, with no method of scrolling the page to make them fully visible.
The renderer should test if the page's start and end line are fully visible, and if the top one isn't, then move the page down slightly until it is, and if the last line is not visible, don't render it at all and make that line the top line of the next page.
Do you have an html page that demonstrates this...? I've converted a number of chm files to html for the iLiad, and so far I haven't noticed this problem... Yes, there are known issues with it but I wouldn't say it's "broken"... And if there are known issues then let iRex know about them so they can be fixed...
I'm also surprised by everyone's fascination with pdf. Yep, it's handy as many documents come as pdfs, but as an on-screen format it's very static, and it's a very dumb format. (Or am I missing some kind of PDF scripting language...?) :blink:
I was wondering why there's such a preoccupation with trying to get everything into PDF as opposed to TXT or HTML?
TXT is clearly impossible as soon as you have characters that can't be represented in that format. Unicode seems to be the only reasonable format, but even that fails in certain cases. (One of which is chess: I can't represent a chess diagram well in a Unicode-encoded text. Math, music and labanotation are others.)
HTML has similar problems -- once the markup doesn't cover what you want to do. Again, if you rely on typefaces, HTML doesn't promise anything -- it's the reader that makes those decisions. And if you allow add-on markup, you have to ensure that the reader has the same add-on, or at least one that uses the same semantic as yours.
Both formats are far more flexible when it comes to selecting alternative typefaces, page sizes, etc, but they are completely useless when it comes to creating a well-set page. PDF, on the other hand, allows that at the expense of flexibility. Take headings in all caps: they have to be spaced more than usual text in order to be easily readable. HTML doesn't do that -- although it (or any similar format that decided to allow that particular rendition to be specified) could do it.
PDF makes it far more likely that 'what-you-see-is-what-I-want-you-to-see' -- as any page-description language should do.
It's my understanding that PDF won't reflow the text, at least, I've never seen a document that would, while most of the other ebook formats will.
Again, PDF is a page description format, and therefore final -- you don't reflow those. Adobe's attempt to allow reflowing to make it possible to read PDF on handhelds was, I think, somewhat brain-damaged. They probably caved in to market pressure -- so see it as an attempt to educate the market that You Do Not Reflow PDF. As far as Adobe is concerned, I suspect you might as well steal sheep.
Adobe Reader does reflow ... but it doesn't do it without reason. If the page already fits your screen, it doesn't even try (though I'm not certain about exact conditions for reflowing). I'm more successful in forcing a reflow when I use Adobe Reader for a small-screen handheld. But as it removes carefully selected line spacing, and packs lines tight, the result is not acceptable.
It's also difficult to extract anything from it. I know that every time I get a technical doc/book in PDF, I immediately begin trying to convert it to something else, usually HTML, but sometimes plain old TXT. PDFs just seem to be so much trouble.
This is more a question of what version of PDF you are using, and how competent the creator was. Modern PDF versions allow tagging, and if the creator enables it, you may be able to save text (at least) straight off. Without such tagging, there's not information enough to do it well -- and particularly not if special typefaces with unknown code tables have been used. Tagging is partly useful for reflowing, partly for increasing accessibility (it allows screen reading -- but even that has to be designed by the creator as soon as there's any degree of complexity in the document.) Don't blame this on the format -- it's the creator who hasn't done his work. (Well, you may be using a PDF reader that just cannot do it, of course.)
I'm not trying to start a war or anything, just genuinely curious as to why so many people are heading to PDF, rather than from it. Maybe I'm missing some advantage of PDF?
My own decision was made when I discovered that there were no chess problem books on-line. Problem nr 1, of course, was how to ensure that the chess diagrams included would be readable. TXT doesn't do it, HTML doesn't do it (unless I require the reader to buy the same typeface I used, or use some free and sub-standard face, which I won't). DejaVu would do it, but (at that time at least) it required a rather expensive license. TeX (i.e. DVI) again requires the reader to have suitable chess fonts. PDF was my answer.
(MS Word also allows for font embedding, but it has a nasty habit of breaking pages according to what printer you have -- which means that my page 345 may not be your page 345. Not acceptable -- so Word was not an option to me.)
Another factor that influenced the choice was that I am able to decide page layout: I can ensure high-quality hyphenation and other typographical tweaks -- such as ensuring that footnotes are on the same page as the text that refers to them. No non-page-description-format reader I know does that -- which to my mind is an indication that they don't really care about the reader, or shy away from technically difficult issues. (There are some signs that this may improve, though.)
PDF makes it far more likely that 'what-you-see-is-what-I-want-you-to-see' -- as any page-description language should do.
So... Is the iLiad a printed page, or is it a screen...? :D At the moment I see the iLiad as a device capable of displaying a page, which is why I favour screen based solutions, ie HTML... HTML is going to be much more flexible mainly because it is display agnostic, so it may not be as good as a custom document but will be better for more devices...
scotty1024 10-19-2006, 08:08 AM On the iLiad HTML is essentially broken. But even if it were working I find most HTML files to be overly fascinated with linking to be comfortable for reading as a "book". I prefer the Baen LIT files over the Baen HTML files for that reason.
For purchased content my favorite format is still Microsoft Reader.
For personal use I have no use for re-flow. I pick my font, I pick my size and I lay out the content. Right now the only iLiad format that lets me pick my font and my size is PDF.
On the iLiad HTML is essentially broken.
How so...? Do you have any specific examples...? So far I seem to be doing quite well with the html browser included...
tribble 10-19-2006, 08:49 AM How so...? Do you have any specific examples...? So far I seem to be doing quite well with the html browser included...
Have you tried reading a novel in HTML?
- Last line gets cut off, no overlap on page flip, so you basically have to guess a line every pageturn.
- depending on how long your file is, you can't stop reading in between, unless you keep the iLiad running and leave the html application open.
that already disqualifies it totally as a book reading program.
For reference files and short html pages you can easily read in one session, the html viewer does fine.
kusmi 10-19-2006, 08:50 AM I bought the iLiad exclusively to read PDFs, you can make pdfs much better looking (layout, pictures, etc) than HTML (ok, you could do it in HTML too: by just putting a full-page image on it :-) )
If I were going for HTML/TXT, then I would have bought a Sony, because for those formats, the screen-size does not really matter, you just have to flip pages more frequently.
Furthermore, I started to read those free daily newspapers on my iLiad, and they are only available as PDFs...
Also on my Mac, I can "print" to pdfs from any application and have full control over the layout and even the font type-face (as pdfs embed fonts, which are non-standard)
yokos 10-19-2006, 09:08 AM The real problems start in an academic natural science environment. Your documents are full of mathematical equations.
# txt isn't useful in any form [Please have a look at usenet, then you will see how complicated it is to make clear to the reader what the equation looks like. :happy2: ]
# In html you can embed equations only as images [mostly gif] or with additional plugins.
# so pdf is the best format for these documents. [LaTeX->pdf]
Edit: Can somebody verify that iLiad doesn't open html files which are large in size [let me say 3 MB or so]?
- Last line gets cut off, no overlap on page flip, so you basically have to guess a line every pageturn.
Will double check that as last night I was reading a book converted from chm... Pretty sure I would've noticed that...!
- depending on how long your file is, you can't stop reading in between, unless you keep the iLiad running and leave the html application open.
Yes, that would be annoying... I've tended to split large pages up to avoid this...! :D
scotty1024 10-19-2006, 09:59 AM Have you tried reading a novel in HTML?
- Last line gets cut off, no overlap on page flip, so you basically have to guess a line every pageturn.
- depending on how long your file is, you can't stop reading in between, unless you keep the iLiad running and leave the html application open.
that already disqualifies it totally as a book reading program.
For reference files and short html pages you can easily read in one session, the html viewer does fine.
Unless you can't get the font to scale properly. I've pretty much given up on RFC's in HTML on the iLiad, especially after 2.7 shook up Minimo's font system. I'm re-writing the back end of my RFC parser to emit PDF's.
I've also gotten trapped a couple times. Some HTML files assume you can always hit the "back" button or otherwise work yourself out of a page. If you close Minimo while on one of those pages you find yourself unable to navigate out of there except by manually editing the manifest.xml to reset the last location. :(
Minimo also does a poor job of re-flowing some HTML pages. The page flows off the right hand side of the display and you have no way to view it.
tribble 10-19-2006, 10:08 AM for making the iLiad really useful when reading reference material, you would need to have several "books" open at the same time and easily and fast switch between them.
for making the iLiad really useful when reading reference material, you would need to have several "books" open at the same time and easily and fast switch between them.
I think iRex's message is clear... Get an iLiad for each reference book...! Then switching between them is *easy*...! :D
tribble 10-19-2006, 12:28 PM I think iRex's message is clear... Get an iLiad for each reference book...! Then switching between them is *easy*...! :D
They just want us to get that star trek feeling :)
scotty1024 10-19-2006, 05:36 PM Actually some time around Star Date 3274 they are planning on letting us re-assign the short cut buttons so we could have 3 quick switch book buttons.
So... Is the iLiad a printed page, or is it a screen...? :D At the moment I see the iLiad as a device capable of displaying a page, which is why I favour screen based solutions, ie HTML... HTML is going to be much more flexible mainly because it is display agnostic, so it may not be as good as a custom document but will be better for more devices...
I'm not sure I understand your distinction. When it comes to PDF, the best result is obtained when you regard the iLiad as the printer -- it accepts 'paper' of a particular size, closely related to its screen size. Using other output formats will need scaling or pan-and-scan solutions, just as when you try to print a legal-sized document on a letter-sized page -- You just don't do that -- if you must have legal-sized documents, you get a legal-sized printer: common sense, really. Anything else will be a compromise, useful only for particular situations.
HTML is a useful alternative, of course, even if it's not completely display agnostic. (A competent HTML coder is able to stay away from those areas, though.) Once you need to display music, chess, math, you're reduced to bitmaps ... which need to be created outside HTML, and also make certain assumptions about available display width. "Illustrated books" in HTML, of which there are some on the web, cannot be shown on a very small screen -- the illustrations are too large.
You select the solution that fits your quality targets -- simple enough.
Once you need to display music, chess, math, you're reduced to bitmaps...
Music, chess and maths have glyphs in unicode... And music and math notation have xml schemas to support them. Just need to have them implemented on the iLiad... :D
Unless you can't get the font to scale properly. I've pretty much given up on RFC's in HTML on the iLiad, especially after 2.7 shook up Minimo's font system. I'm re-writing the back end of my RFC parser to emit PDF's.
Isn't that because you are using PRE tags...? I get the same problem in Firefox if I make the window small enough.
I've also gotten trapped a couple times. Some HTML files assume you can always hit the "back" button or otherwise work yourself out of a page. If you close Minimo while on one of those pages you find yourself unable to navigate out of there except by manually editing the manifest.xml to reset the last location. :(
Isn't it a rule of web design to never rely on a "back" button...?
Minimo also does a poor job of re-flowing some HTML pages. The page flows off the right hand side of the display and you have no way to view it.
Seen this bug... When I have a sec I'll make sure iRex knows about it...
scotty1024 10-20-2006, 09:37 AM Things have taken a turn for the worse in 2.7. <pre> just means fixed space font and respect CR/LF.
Rules in web development?
Thanks, I needed a good hard laugh this morning. :)
Music, chess and maths have glyphs in unicode... And music and math notation have xml schemas to support them. Just need to have them implemented on the iLiad... :D
Music? Are you referring to the characters in the U+2600-267F range? How can I use them to make a saxophone part in a big band score? I can't.
Chess? Same place -- and I can use those that are there for so-called figurative notation, but how do I set a chess diagram? I can't get by without those. No way to do that.
Math? Much the same thing -- show me how I, in HTML, set up, say, an integral or a summation over a specified domain. It needs more markup than HTML has.
XML schemas ... is probably the way to do it, but going that way a) is tantamount to an admission that the material could not be done in HTML (which is the point from which my response was made), and b) introduces the problem that the recipient must have a reading program that understands the markup. It's not enough to parse it syntactically: the semantics of the markup must be retained as well.
And there's always the problem of locating the relevant schema, along with suitable documentation of it. I know of none for chess -- those I know are crippled or undocumented -- though I can't say about math and music.
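For anyone who wants to see what the U+2600-267F block actually offers for chess, Python's standard `unicodedata` module can list it. This is just a quick check, not part of the original argument; the helper name `named_symbols` is made up for the example.

```python
import unicodedata


def named_symbols(first, last, keyword):
    """Return (code point, name) pairs in [first, last] whose name contains keyword."""
    found = []
    for cp in range(first, last + 1):
        # Unassigned code points have no name; fall back to an empty string.
        name = unicodedata.name(chr(cp), "")
        if keyword in name:
            found.append((f"U+{cp:04X}", name))
    return found


if __name__ == "__main__":
    for code, name in named_symbols(0x2600, 0x267F, "CHESS"):
        print(code, name)
```

The block holds the twelve piece glyphs (king through pawn, white and black) and nothing for the board itself, which is the gap described above: enough for figurine notation, not for setting a diagram.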
astfgl 10-24-2006, 06:06 AM Do you have a html page that demonstrates this...?
Ask and ye shall receive ... eventually. These pics are with 2.6, but I updated to 2.7 tonight and the problem is identical.
The problem is obviously caused by a dumb renderer and a font (height+linespace) which does not divide evenly into the screen height. The offset starts at the bottom of page one and gets worse till half the line is cut, then appears to get better until the page and line height coincide again, then repeats. These pics are from page 5 of the document.
If by chance your document has a font size and formatting which multiplies evenly to the content area height, you won't notice any problem. Extra vertical space for super/subscript, different font sizes, etc would give a more "random" look to the problem.
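The drift can be reproduced with a little modular arithmetic. This is a sketch under assumptions: it supposes the renderer advances exactly one screen height per page turn with uniform line height, and the 100 px page and 30 px line are made-up numbers, not the iLiad's real dimensions.

```python
from math import gcd


def visible_part_of_cut_line(page_number, page_height, line_height):
    """Pixels of the last, partially drawn line visible at the bottom of a page.

    Zero means the page ends exactly on a line boundary (nothing is cut).
    page_number is 1-based.
    """
    return (page_number * page_height) % line_height


def pages_until_realigned(page_height, line_height):
    """Number of pages after which page and line boundaries coincide again."""
    return line_height // gcd(page_height, line_height)


if __name__ == "__main__":
    for page in range(1, 7):
        print(page, visible_part_of_cut_line(page, 100, 30))
    print("pattern repeats every", pages_until_realigned(100, 30), "pages")
```

With those numbers the cut line shows 10 px, then 20 px, then lines up cleanly, and the cycle starts over, which matches the "gets worse, gets better, repeats" behaviour in the pics.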
BTW, yes, I realise my font size is a bit small, I gave up on the format (and the iLiad) before finishing my document styles. My iLiad is sitting on a shelf until it becomes usable.
Ask and ye shall receive ... eventually. These pics are with 2.6, but I updated to 2.7 tonight and the problem is identical.
Thanks for this... Will make sure iRex sees your sleuthing...!
arivero 10-24-2006, 07:09 AM Math? Much the same thing -- show me how I, in HTML, set up, say, an integral or a summation over a specified domain. It needs more markup than HTML has.
Here he is probably referring to MathML, a W3C-approved extension of html that actually is implemented in every mozilla browser (and a plug-in exists for Explorer). Problem is, the fonts are very dependent on the CSS style and sometimes they get missed when printing the page. But the positional format, integrals, summations etc, does work. Although, the few users of this notation prefer to use itex2MathML translators.
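To give an idea of what the markup looks like, here is a small sketch that builds presentation MathML for a summation over a specified domain (the sum of x_i for i from 1 to n) using Python's standard `xml.etree.ElementTree`. The helper names are invented for the example, and whether a given reader renders the result properly is a separate question.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def token(parent, tag, text):
    """Append a MathML token element (mi, mo, mn) with the given text."""
    element = ET.SubElement(parent, tag)
    element.text = text
    return element


def summation_mathml():
    """Return MathML markup for the sum of x_i with i running from 1 to n."""
    math = ET.Element("math", {"xmlns": MATHML_NS})
    row = ET.SubElement(math, "mrow")

    # munderover takes three children: base, lower limit, upper limit.
    sigma = ET.SubElement(row, "munderover")
    token(sigma, "mo", "\u2211")          # N-ARY SUMMATION sign
    lower = ET.SubElement(sigma, "mrow")  # lower limit: i = 1
    token(lower, "mi", "i")
    token(lower, "mo", "=")
    token(lower, "mn", "1")
    token(sigma, "mi", "n")               # upper limit: n

    # The summand x with subscript i.
    term = ET.SubElement(row, "msub")
    token(term, "mi", "x")
    token(term, "mi", "i")

    return ET.tostring(math, encoding="unicode")


if __name__ == "__main__":
    print(summation_mathml())
```

The positional structure (limits below and above the sign, subscript on the summand) lives entirely in the markup, so the fonts and the styling are the parts that still depend on the reader.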
Here he is probably referring to MathML, a W3C-approved extension of html that actually is implemented in every mozilla browser (and a plug-in exists for Explorer). Problem is, the fonts are very dependent on the CSS style and sometimes they get missed when printing the page. But the positional format, integrals, summations etc, does work. Although, the few users of this notation prefer to use itex2MathML translators.
Yep, I was going to reply to his post on the weekend but never got around to it... Basically think of xml as the containment/transport method and css as the display method... Of course you need the correct schema and method of display, but the same goes for html + pdf...