Quote:
Originally Posted by HarryT
The important thing is not to try to convert PDF to other formats. PDF is not a "book" format - it doesn't contain paragraphs, sentences, or even words, and hence is extremely difficult to convert to other formats. You really need to use an OCR program and save your book as some "text" format, such as HTML or Word.
Harry, I usually know better than to mess with a Dalek, but your statement is only correct if you are referring to "image-only" PDFs. Most commercially produced PDFs do have text, but because it's a format that is intended to be printer-friendly, one needs to contend with headers, footers, and page numbers when converting to other formats.
As PRS-600 users might notice, some PDFs allow text re-sizing and re-flow, and others can only be zoomed. In the latter case we are dealing with image-based PDFs.
There are good reasons for doing such conversions. For the most part, the PRS-600 does a really nice job rendering and re-flowing the PDFs I need to read; that's one of the main reasons I chose the Sony Reader, as I understood it excels at this. But I've recently encountered PDFs that were made in Linux (I'm not saying the weirdness was because of Linux) that look fine when printed or viewed full-screen on a computer, but suffer from weird mid-word line breaks on the PRS-600. I've found that converting such books into HTML using MobiPocket Creator, stripping out the header and footer code using regex in a text editor, and then converting to EPUB or LRF in Calibre makes these former PDFs more comfortable to read. But I wouldn't go through all that trouble unless the PDFs were misbehaving in the first place.
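For anyone who'd rather script the regex step than do it by hand in an editor, here's a minimal sketch in Python. It assumes (and this is only an assumption; check your own converted HTML) that each running header and each page number ends up as its own `<p>` element. The book title in the header pattern is a made-up placeholder, and `strip_headers_footers` is just a hypothetical helper name, not part of MobiPocket Creator or Calibre.

```python
import re

# Hypothetical patterns: a running header that repeats the book title, and a
# footer that is a bare page number. Each is assumed to be a standalone <p>.
# Replace these with whatever actually repeats in your converted HTML.
HEADER_FOOTER_PATTERNS = [
    r"<p\b[^>]*>\s*A Grammar of Classical Greek\s*</p>\s*",  # placeholder running header
    r"<p\b[^>]*>\s*\d{1,4}\s*</p>\s*",                        # paragraph holding only a page number
]


def strip_headers_footers(html: str, patterns=HEADER_FOOTER_PATTERNS) -> str:
    """Delete every paragraph that matches one of the header/footer patterns."""
    for pattern in patterns:
        # re.IGNORECASE so <P> and <p> are both caught.
        html = re.sub(pattern, "", html, flags=re.IGNORECASE)
    return html


if __name__ == "__main__":
    sample = (
        "<p>The aorist is a past tense.</p>\n"
        "<p>42</p>\n"
        "<p>A Grammar of Classical Greek</p>\n"
        "<p>Chapter 12 covers the middle voice.</p>\n"
    )
    print(strip_headers_footers(sample))
```

Paragraphs that merely contain a number (like "Chapter 12 ...") are left alone, because the page-number pattern only matches a paragraph made up entirely of digits. The cleaned HTML can then go into Calibre as before.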
I've been delighted to learn about some tools that can help make image-only PDFs more convenient to view on Sony Readers, as well as do quick PDF-to-PDF conversions to minimize whitespace. While I'm willing to OCR an image-only PDF to extract the text, sometimes it's more trouble than it's worth. And if you're dealing with non-Roman characters (a book on Classical Greek grammar, for instance), it is even more difficult.
There are a lot of different approaches to these problems, and what might be ideal for me could be unacceptable to someone else. I think it's best to know what all the options are, in any case; converting PDF to other formats is but one of them.
Cheers!