Reading PDF files


profnachos
11-18-2007, 09:59 PM
I am sure this has been asked a million times, so please bear with me.

What viable and inexpensive way is there to convert PDF to either HTML or DOC?

I have an eBookwise reader, and have tried:

- PDFReader: This converts to image files, which does not work for me. The resulting file is large and unreadable, and text-related features do not work on images.

- pdftohtml: This converts to HTML, but it does not differentiate between paragraphs and line breaks, and the resulting text in the HTML is all jumbled up. Other conversion tools I have tried do the same.

Vesper
11-18-2007, 11:37 PM
I'm not sure what inexpensive means to you. The best so far, though not very cheap, is ABBYY's PDF Transformer. If you are not scared off by the 89 euro price tag, you can take a look...

DrS

kovidgoyal
11-19-2007, 12:14 AM
The output of pdftohtml can easily be processed further by machine; see, for example, pdf2lrf.

rlauzon
11-19-2007, 04:07 AM
What viable and inexpensive way is there to convert PDF to either HTML or DOC?

Short answer: No.

Longer answer: All PDF to anything tools will require a great deal of manual editing on your part. This is because PDF simply doesn't store the information for certain things - paragraph breaks, for example.

tompe
11-19-2007, 06:22 AM
Put your PDF document on the net. Wait for some time. Search for your document in Google and use "View as HTML".

profnachos
11-20-2007, 12:13 AM
The output of pdftohtml can easily be processed further by machine; see, for example, pdf2lrf.

The problem is, I have an eBookwise, not a Sony.

kovidgoyal
11-20-2007, 12:40 AM
I just meant to give you an idea how to do it. Basically pdftohtml preserves line breaks using <br> elements. These need to be removed intelligently (based on line length) and two consecutive <br> elements become a new paragraph.
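
A minimal Python sketch of that idea follows. It is not taken from pdf2lrf; the function name br_to_paragraphs, the short_ratio parameter, and the "a line much shorter than the longest line ends a paragraph" rule are all assumptions made for illustration. It splits the text at every <br>, treats two consecutive <br> elements as a paragraph break, and joins the remaining lines back together.

```python
import re

# Matches <br>, <br/>, <br /> in any letter case.
BR = re.compile(r"<br\s*/?>", re.IGNORECASE)


def br_to_paragraphs(fragment, short_ratio=0.6):
    """Turn <br>-separated lines into <p> paragraphs.

    Assumed heuristic (not from any particular tool): a line shorter than
    short_ratio times the longest line is taken to end a paragraph.
    """
    lines = [piece.strip() for piece in BR.split(fragment)]
    longest = max((len(line) for line in lines), default=0)

    paragraphs = []
    current = []
    for line in lines:
        if not line:
            # An empty piece means two <br> in a row: close the paragraph.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
        if len(line) < short_ratio * longest:
            # Short line: probably the last line of a paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))

    return "".join(f"<p>{p}</p>\n" for p in paragraphs)


if __name__ == "__main__":
    sample = (
        "The quick brown fox jumps over the<br>"
        "lazy dog and keeps running until<br>"
        "it is tired.<br>"
        "A second paragraph starts here and<br>"
        "continues on this line.<br><br>"
        "Third one."
    )
    print(br_to_paragraphs(sample))
```

The ratio will need tuning per document, and headings, hyphenated line ends, and tables would need extra handling that this sketch leaves out.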

DaleDe
11-20-2007, 01:11 AM
Actually, for the eBookwise the double <br> should be replaced with </p><p> and the rest removed. Then maybe clean up the first and last paragraphs manually.

Dale

jpathomas
11-24-2007, 11:52 AM
Has anyone tried reformatting PDF files to fit the Sony Reader using Acrobat? I have a number of technical documents (mostly certification materials) that I would like to convert for my own use on the Sony reader, but everything I've tried short of Acrobat produces unacceptable content. I've been considering buying a full version of Acrobat, but it's not inexpensive.

What I want to do is recreate the PDF documents I have so that the images are displayed in their correct position with regard to the text. This will require resizing the pages to fit the Sony's screen size, and possibly resizing the images also. At this point I don't know that this is possible.

DaleDe
11-24-2007, 12:04 PM
The Acrobat Reader cannot reformat text. Generally, the text is already formatted for the paper size.

The approach for PDFs is to print the document to a custom paper size that matches what the Sony needs. You use a PDF creation program as the printer device and set it up to use the paper size you want. There are plenty of these PDF creation tools out there, and some are free. This works with documents like Word files and text that is essentially reflowable and can be conformed to the chosen page size. It does not work for documents that are designed for fixed page sizes, which is typically the case for PDFs you encounter at work. If you can get the source files, then you can do what you want.

Dale

vivaldirules
11-24-2007, 12:18 PM
What about using Acrobat (i.e., not Acrobat Reader)?

DaleDe
11-24-2007, 12:58 PM
The full version of Acrobat can do this and almost anything else, depending on the protection assigned to the files. It is a source editor.

Dale