How to Do Everything with PDF Files


TadW
01-01-2009, 10:10 AM
The following article gives a good overview of what you can do with PDF files (without using the expensive Adobe Acrobat):

http://www.labnol.org/software/adobe-pdf-guide-tutorial/6296/

Nate the great
01-01-2009, 10:24 AM
good find

xianfox
01-01-2009, 10:30 AM
Thanks, some of that will come in handy at work.

JSWolf
01-01-2009, 10:45 AM
But I notice no good way to convert from PDF.

smithno
01-02-2009, 06:10 PM
But I notice no good way to convert from PDF.

PDF was designed as an output format. It will probably never be easy to manipulate.

RWood
01-02-2009, 07:57 PM
But I notice no good way to convert from PDF.
"Good" is a matter of conjecture, Jon. The article suggests that "You can upload the PDF document to Zamzar and convert it any formats like doc, html, png, txt or rtf (rich text format). Alternatively, you can convert PDF to HTML using Gmail."

I have used ABBYY PDF Transformer 2.0, ABC Amber PDF Converter, Paperport, and several other packages over the years. There is not one solution for all cases, and the correct choice depends on the specific PDF in question, the tools on your computer, what tools are currently available for free, what tools you can get in a functioning trial copy, and how much money you are willing to spend on new tools.

While I am not the biggest fan of PDF for ebooks, PDFs have their place and I have created PDF files for the Sony Reader where I felt they were the best option.

alexxx
01-03-2009, 03:57 AM
Too many of the options proposed in the article involve uploading your document to some server.
Call me paranoid, but I don't like this kind of "service" at all - I want my documents to stay on <my> server.
Apart from that, under Linux (which is not mentioned at all in the article) software exists to do practically any kind of conversion you need.



alessandro

Flinx
01-03-2009, 05:52 AM
Apart from that, under Linux (which is not mentioned at all in the article) software exists to do practically any kind of conversion you need.
alessandro

Really? I searched for one and found no Linux program at all that tries to convert from PDF to floating text with attributes and with paragraph recognition. The only program I could find that generates useful output is PdfGrabber, but I am still interested in a better solution.

bookbinder
01-03-2009, 05:24 PM
I have a few scanned Google Books in PDF that I'm having a hard time converting to text, even following advice from the article. Has anyone done this successfully? I've tried:
-Zamzar (returns an unopenable doc file)
-Google mail (doesn't display pdf as html)
-Pdf2Word program

labnol
01-04-2009, 02:46 AM
I have a few scanned Google Books in PDF that I'm having a hard time converting to text, even following advice from the article. Has anyone done this successfully?

You can upload the scanned PDF files to a public web server, link to those files from a web page, and then wait for Google's bots to index those PDFs. See complete instructions (http://www.labnol.org/software/convert-scanned-pdf-images-to-text-with-google-ocr/5158/).

Flinx
01-04-2009, 07:52 AM
...wait for Google's bots to index those PDFs.

The linked example shows why this way is essentially useless. The resulting text has line breaks on each line. A good converter for books has to try to set a line break only at the end of a paragraph.

tompe
01-04-2009, 09:59 AM
The linked example shows why this way is essentially useless. The resulting text has line breaks on each line. A good converter for books has to try to set a line break only at the end of a paragraph.

Really not true at all. You can also use the convention that two line breaks in a row indicate a new paragraph, as TeX and LaTeX do. It is trivial to convert between the two conventions using a simple program or a one-line script.
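
To illustrate the kind of simple program meant here, a minimal Python sketch follows. The function names `unwrap_paragraphs` and `wrap_paragraphs` are made up for this example, and it assumes the input really does mark paragraphs with a blank line; it makes no attempt to repair words hyphenated across lines.

```python
import re
import textwrap


def unwrap_paragraphs(text: str) -> str:
    """Blank-line-separated paragraphs -> one line per paragraph."""
    # A run of newlines with only whitespace in between ends a paragraph.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse every remaining line break inside a paragraph to a space.
    return "\n".join(" ".join(p.split()) for p in paragraphs)


def wrap_paragraphs(text: str, width: int = 72) -> str:
    """One line per paragraph -> wrapped lines, blank line between paragraphs."""
    lines = [line for line in text.splitlines() if line.strip()]
    return "\n\n".join(textwrap.fill(line, width=width) for line in lines)
```

Going from wrapped text to one line per paragraph throws away the original line breaks; going the other way just picks new ones at the chosen width.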

Flinx
01-04-2009, 02:24 PM
Really not true at all. You can also use the convention that two line breaks in a row indicate a new paragraph

No, that is not really useful for most standard PDFs. The text object in a PDF file does not contain a real line break. It contains the position on the page where it has to be drawn and a number of characters. The result is a line of text.
The program that makes the conversion has to estimate, from the positions of the text objects, the order in which the lines come. Simple converters, like most of those available (including Acrobat), take one text object, convert it to text and set a line break at the end, resulting in one line of output text. The better converters can try to join the separate text objects if their horizontal start position is identical and the line is long enough. But this is a difficult job, and I have not yet found a program that works well enough for me.
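
A rough Python sketch of that joining heuristic, to make the idea concrete. Everything here is an assumption for illustration: the `TextObject` class, the `join_lines` function, and the two thresholds are invented, real converters work on much messier data, and character count is used as a crude stand-in for line width. It also assumes a single column and a vertical coordinate that grows upward.

```python
from dataclasses import dataclass


@dataclass
class TextObject:
    x: float   # horizontal start position on the page
    y: float   # vertical position, assumed to grow upward
    text: str  # the characters drawn at that position


def join_lines(objects, min_fill=0.8, x_tolerance=1.0):
    """Group positioned lines into paragraphs (hypothetical heuristic)."""
    if not objects:
        return []
    # Reading order: highest line on the page first.
    lines = sorted(objects, key=lambda o: -o.y)
    left_margin = min(o.x for o in lines)
    longest = max(len(o.text) for o in lines)
    paragraphs = [lines[0].text]
    for prev, cur in zip(lines, lines[1:]):
        # Continue the paragraph only if this line starts at the left
        # margin and the previous line was "long enough" to be full.
        at_margin = abs(cur.x - left_margin) <= x_tolerance
        prev_is_full = len(prev.text) >= min_fill * longest
        if at_margin and prev_is_full:
            paragraphs[-1] += " " + cur.text
        else:
            paragraphs.append(cur.text)
    return paragraphs
```

Even this toy version shows where it gets hard: headings, short full lines, multiple columns, and justified text with varying character widths would all break the two simple tests used here.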

tompe
01-04-2009, 02:51 PM
No, that is not really useful for most standard PDFs. The text object in a PDF file does not contain a real line break. It contains the position on the page where it has to be drawn and a number of characters. The result is a line of text.
The program that makes the conversion has to estimate, from the positions of the text objects, the order in which the lines come. Simple converters, like most of those available (including Acrobat), take one text object, convert it to text and set a line break at the end, resulting in one line of output text. The better converters can try to join the separate text objects if their horizontal start position is identical and the line is long enough. But this is a difficult job, and I have not yet found a program that works well enough for me.

That might be the case, but there is no functional difference between encoding paragraphs with two line breaks or one. What you are talking about is how good a converter is at detecting a paragraph break, but that has no necessary connection to how the encoding is done. You can argue that you lose information if you do not keep the line breaks in a paragraph, since they are impossible to recreate, but it is trivial to take a paragraph specified by using double line breaks and convert it to one line.

stonehat
01-05-2009, 05:28 AM
From TFA:
"Most mobile phones can read PDF files."

I stopped reading after that.

millerjpmd
01-07-2009, 06:11 PM
Thanks for the find. I started a thread concerning a similar issue with PDFs. This is what I found related to converting from a PDF.

Programs that allow you to manipulate and extract info from PDF:
File Juicer ($17, http://echoone.com/filejuicer/)
deskUNPDF ($100, http://www.docudesk.com/deskUNPDF_product_home.shtml)
PDFpen and PDFpenPro ($50-100, http://www.smileonmymac.com/index.html)

Program that allows you to join multiple PDFs into a single file with a Table of Contents:
PDF Lab (free, http://www.iconus.ch/fabien/products/pleng/pleng.html)

With regard to just getting the PDF into a PRS-505, calibre, for the most part, worked as well as any of these programs.

Hope this helps.

jpm

BlackVoid
04-16-2009, 08:11 AM
When converting a PDF with pictures for an ebook device, I found a good method with minimal fuss. It is a bit time consuming and you need a 3rd party product.

Use ABBYY FineReader to convert to LIT format, then convert the LIT to the ebook format of your choice. Pictures will be preserved. ABBYY FineReader takes a while to convert to its own format, but it will also handle scanned books. I have not tried two-column PDFs, but an average PDF with pictures is OK.

I then use BookDesigner to convert from LIT to LRF and the result is quite good.

namiamy
05-31-2009, 04:53 AM
good find. thx.
i got more knowledge about adobe...

stranjer
07-25-2009, 04:59 PM
thanks for the trick BlackVoid, I'm gonna try this myself...

sEventoRii
04-08-2010, 07:52 PM
very useful!
thx for sharing.

kirby10
05-03-2010, 06:20 AM
I use the Calibre ebook management system. It has a reasonably good converter, and converts PDF to EPUB, among many other formats. The only problem I have is that I can't seem to enlarge the fonts, even after manually adjusting the conversion settings, and books converted in this manner do not respond to the font scaling commands on my e-reader (Kobo).
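
For anyone who prefers to script the conversion, Calibre also ships a command-line converter, `ebook-convert`. The Python sketch below only builds and runs that command; the helper names and the 12 pt default are made up for the example, and there is no claim that `--base-font-size` cures the font scaling problem described above, since PDF input often carries its own fixed sizes.

```python
import subprocess


def ebook_convert_command(src, dst, base_font_size=None):
    """Build an ebook-convert command line (hypothetical helper)."""
    cmd = ["ebook-convert", src, dst]
    if base_font_size is not None:
        # Base font size in points; one of Calibre's look-and-feel options.
        cmd += ["--base-font-size", str(base_font_size)]
    return cmd


def convert(src, dst, base_font_size=12):
    # Requires Calibre to be installed and ebook-convert to be on the PATH.
    subprocess.run(ebook_convert_command(src, dst, base_font_size), check=True)
```

Calling `convert("book.pdf", "book.epub")` would run the same conversion the GUI does, with the base font size set explicitly.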

Freeshadow
05-13-2010, 07:53 PM
(a squeak from a long-term silent reader)
how about: http://www.accesspdf.com/pdftk/
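
pdftk is a command-line tool, so it scripts easily. As a small sketch of the PDF-joining case mentioned earlier in the thread, the Python below wraps pdftk's `cat` operation; the function names are invented for the example, and it assumes pdftk is installed and on the PATH. Unlike PDF Lab, this plain join does not build a table of contents.

```python
import subprocess


def pdftk_join_command(inputs, output):
    """Build the pdftk command that concatenates PDFs (hypothetical helper)."""
    if not inputs:
        raise ValueError("need at least one input PDF")
    # Equivalent shell form: pdftk a.pdf b.pdf cat output combined.pdf
    return ["pdftk", *inputs, "cat", "output", output]


def join_pdfs(inputs, output):
    # Runs pdftk; raises CalledProcessError if pdftk reports a failure.
    subprocess.run(pdftk_join_command(inputs, output), check=True)
```

For example, `join_pdfs(["part1.pdf", "part2.pdf"], "book.pdf")` would write the two parts, in that order, to `book.pdf`.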

k0077
08-14-2010, 06:49 AM
I find that when I convert to PDF using Calibre and put the file onto my Sony Reader PRS-300, not all the pages show up. It's blank for half of the book, but all the pages are there when I open it on my computer. Does anyone know how to fix this? Most of my ebooks are in PDF format.
Also, does anyone know of a free PDF to Word .doc converter that converts the whole book?

I really like PDF, but it's so annoying when the whole book doesn't show.

Or I don't know if it's my e-reader, because all other formats seem to work.

help please

sam3168
01-18-2011, 04:49 PM
Thank you for the link. It's very useful.