Best PDF conversion tool.


Dark123
03-18-2010, 11:22 PM
I have a PDF and I want to either convert it to HTML or DOC. What is the best tool that can do this?

frabjous
03-19-2010, 12:39 AM
Are they scanned PDFs or text-based?

For the former, search the forums for recommendations on OCR software.

For the latter, I don't know of any software that works particularly well, but I'd still probably stick with calibre (http://calibre-ebook.com/) before paying for anything else. (Calibre will convert to rtf, IIRC, which you can open in a word processor and save as DOC or HTML.)

Patricia
03-19-2010, 12:48 AM
Book Designer will also work for text-based PDFs; you can then save the result as HTML. But it balks at image-based PDFs.
I tend to use Mobipocket Creator, which generally works rather well for text-based PDFs. After you load the file it makes an html version and stores it in "my publications."

For image-based PDFs I've invested in ABBYY FineReader 10. It's not cheap but does the job better than the competition, in my opinion.

Dark123
03-19-2010, 05:28 AM
It's just a normal book with a cover. So it has an image and it's mainly text.

Patricia
03-19-2010, 07:12 AM
Ah, but if someone has made a scan from a paper copy, then saved to PDF, it will be an image scan. On the other hand, if someone has converted text and a cover-image to PDF then you have the (relatively) easy-to-convert sort.
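The distinction above can be checked roughly before choosing a tool. The following is only a minimal sketch: it assumes you have already pulled the text of each page out with some extractor (that step is not shown), and the function name and thresholds are made up for illustration. A scan yields almost no text per page, while a text-based book with an image cover yields one empty page and plenty of text afterwards.

```python
def classify_pdf(page_texts, min_chars_per_page=50, text_page_ratio=0.5):
    """Guess whether a PDF is text-based or an image scan.

    page_texts: one string per page, as returned by whatever text
    extractor you use (hypothetical input; extraction is not done here).
    The thresholds are arbitrary starting points, not tuned values.
    """
    if not page_texts:
        return "unknown"
    # Count pages that yielded a meaningful amount of non-whitespace text.
    pages_with_text = sum(
        1 for text in page_texts
        if len("".join(text.split())) >= min_chars_per_page
    )
    ratio = pages_with_text / len(page_texts)
    return "text-based" if ratio >= text_page_ratio else "image-based"
```

Note that a scan which has already been run through OCR carries a hidden text layer and would be reported as text-based by this check.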

HarryT
03-20-2010, 01:39 PM
BookDesigner generally does a much better job of PDF conversion than does Calibre, in my experience. Plus, you will ALWAYS need to edit the result to clean it up, which you can of course do in BD, but not in Calibre.

Ralph Sir Edward
03-20-2010, 02:28 PM
If you don't want to spring for the full ABBYY FineReader, you can buy the PDF conversion part separately as PDF Transformer. It works fairly well...

chainring
03-20-2010, 03:40 PM
If you don't want to spring for the full ABBYY FineReader, you can buy the PDF conversion part separately as PDF Transformer. It works fairly well...

Along the same line, I have to wonder why ABBYY doesn't give any indication as to the differences/similarities between FineReader and PDF Transformer. It seems, although just from a cursory glance, that FineReader may do a better job with PDF to different format conversion. Any words on that, Ralph Sir Edward?

greenapple
03-20-2010, 07:45 PM
If price isn't an issue, then go for Adobe Acrobat. It does a good job of conversion from PDF to HTML, without the hassle of editing out headers, footers, and page numbers from the output.

Fat Abe
04-09-2010, 12:02 AM
Along the same line, I have to wonder why ABBYY doesn't give any indication as to the differences/similarities between FineReader and PDF Transformer.

The user can download trial versions of both programs. Since individual users have different requirements, just be sure you have some tough test files to run the program(s) against. The trial is very short lived (15 days, max of 50 pages, one page at a time). I feel certain I can eliminate the more expensive program with just a few pages. How does this look for conversion:

E(A, D) = DT [(Yi Yi)I YiY? D = DTM(A)D

The source is page 2 of

http://www.geometrictools.com/Documentation/LeastSquaresFitting.pdf

It's not the subscripts and superscripts that upset me. FineReader got most of them correct. It is the non-recognition or omission of the math fonts. This makes sampling Transformer, the next program in line, moot. Which leaves me with Acrobat.

For non-math/science uses, FR is not too bad. It flows the paragraphs correctly, recognizes italicized text, and identifies headings. Since I won't be evaluating Transformer, I'll leave that to another board member.

plim
04-09-2010, 07:05 AM
Of those I have tried so far, the best is ABBYY PDF Transformer, especially for PDFs with images and graphics.

archerwoo
04-09-2010, 08:37 AM
Can I convert TEXT to PDF?

frabjous
04-09-2010, 08:59 AM
Can I convert TEXT to PDF?

You can do that with calibre, or with just about any word processor, or about a million other applications. But I think it would help to know a bit more about how you'd like these PDF files formatted, if at all.

greenapple
04-09-2010, 07:53 PM
Of those I have tried so far, the best is ABBYY PDF Transformer, especially for PDFs with images and graphics.

IMO, ABBYY is terrific as an OCR tool for converting text within images. But as a PDF (non-image) to HTML or text converter, I didn't get any good results with it at all. It will convert the headers and footers of the PDF as part of the text, and it's a chore to crop the headers and footers within the ABBYY program, because you have to do it manually, one page at a time.
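If the headers and footers do end up in the converted text, they can often be removed afterwards instead of being cropped page by page. What follows is only a rough sketch, not a feature of any of the programs mentioned: it assumes the converted text has already been split into pages and lines (the `pages` structure and the function names are hypothetical), and it drops a first or last line when a near-identical line shows up in that position on most pages.

```python
import re
from collections import Counter


def _normalize(line):
    # Replace digit runs so "Page 12" and "Page 13" compare as equal.
    return re.sub(r"\d+", "#", line.strip().lower())


def strip_running_lines(pages, min_share=0.6):
    """Drop first/last lines that repeat on most pages.

    pages: list of pages, each a list of text lines (hypothetical input
    produced by an earlier conversion step).
    min_share: fraction of pages a line must appear on to count as a
    running header or footer (an arbitrary default).
    """
    def repeated(position):
        counts = Counter(
            _normalize(page[position]) for page in pages if page
        )
        threshold = min_share * len(pages)
        return {key for key, n in counts.items() if key and n >= threshold}

    headers = repeated(0)
    footers = repeated(-1)
    cleaned = []
    for page in pages:
        lines = list(page)
        if lines and _normalize(lines[0]) in headers:
            lines = lines[1:]
        if lines and _normalize(lines[-1]) in footers:
            lines = lines[:-1]
        cleaned.append(lines)
    return cleaned
```

This only looks at the very first and very last line of each page, so multi-line headers or footers that alternate between left and right pages would need a lower `min_share` or a second pass.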

I would rank the best 3 converters in terms of output to be used in an ebook reader, or to be used for conversion through Calibre, in this order:
1) Acrobat
2) Nuance
3) Nitro

I liked the Nuance converter so much I purchased it. The price point is much more palatable than Adobe's product.

Fat Abe
04-10-2010, 12:14 AM
Nuance had some pretty bad reviews here:

http://download.cnet.com/Nuance-PDF-Converter-Professional/3000-10743_4-10973909.html

When I see the word "crash", I shudder. Bad customer support is also scary.

Adobe Acrobat is a bear to use, but there are several books you can buy to master the program. Adobe will be around for a long time. Who knows how long Nuance will be in existence?

frabjous
04-10-2010, 09:36 AM
Nuance had some pretty bad reviews here:

Adobe will be around for a long time.

Not if Apple gets its way. Sigh.

Has anyone tried the PDF Import Extension (http://wiki.services.openoffice.org/wiki/Pdf_Import_Extension) for Open Office? I haven't, so I can't vouch for it, but I think it's always best to try the free tools first, before moving on to the expensive ones.

buecherhans
04-20-2010, 11:10 AM
I just got an email from Nuance offering a free PDF reader with a free online conversion service. It converts to Word, Excel, RTF and WordPerfect. I have downloaded the reader and am currently testing the online conversion.

Pretty surprising result. I will write a short test report. Should be ready by the weekend. Have to earn some bucks in the meantime.

buecherhans
04-20-2010, 03:14 PM
Nuance FREE PDF Reader with free online PDF converter


Test Report (short version)



Software: - Nuance PDF Reader
Company: - Nuance Communications, Inc.
Operating System: - Microsoft Windows (XP, Vista, 7)
see Tech Specs at Nuance.com
Browser needed for online conversion
Price: - free
Download: - http://www.nuance.com/imaging/products/pdf-reader.asp


First of all, please excuse my English, I am not a native speaker.
I have a lot of PDF files on my hard drive that I always wanted to read, but I found it too inconvenient to sit for hours in front of a screen. Now that I have an eBook reader, I need to convert the PDFs into a resizable format. For that reason I need to extract the text into an HTML file or some format that can be converted to HTML. Until now I have used Mobipocket Creator, which I consider the best free conversion tool. I read that the Nuance converter got some bad reviews, and it costs just under EUR 100. Acrobat is too expensive for me.
Installation: Simple. You download an exe file and run it; the installation is automatic (Windows-like). Then you run the software.
Conversion: The first file was a book in PDF format (German text, about 65 pages), very simple and plain: headlines, body text, footer with page numbers, standard fonts. You open the file in the Nuance PDF Reader, where there is a button for online conversion in the task bar. After you hit the button, your browser opens a Nuance site with a form; you choose the conversion format, enter an email address and hit send. After a few minutes the converted file is in your mailbox. If you choose Word you will receive a docx file. Nuance did a pretty good job of making the converted text look like the original (which I do not care about, since I just want the content for the eBook reader). The footer, with two long horizontal lines and the page number in between, was converted to a text box, which is inconvenient, since I will have to delete those manually for some 60 pages. Nuance recognized the page size, paragraphs, bold and italic text, and put hard returns at the page breaks, which also have to be removed manually. So far so good. The result was very good: I got an editable Word document that looked almost identical to the original.
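The hard returns at the page breaks do not necessarily have to be removed by hand. Here is a minimal sketch of one way to do it, assuming the converted text has been saved as plain text and split into one string per page (the function name and the `pages` input are made up for this example, and nothing here is part of the Nuance service). It keeps a paragraph break only when the previous page ended with sentence-closing punctuation.

```python
def join_pages(pages):
    """Merge page texts, removing the hard return at each page break
    when the sentence appears to continue on the next page.

    pages: list of strings, one per page (hypothetical input).
    """
    # Characters that usually close a sentence (the last is a curly quote).
    sentence_end = (".", "!", "?", ":", '"', "\u201d")
    result = ""
    for page in pages:
        text = page.strip()
        if not text:
            continue  # skip empty pages
        if not result:
            result = text
        elif result.endswith("-") and text[0].islower():
            # Word hyphenated across the page break: rejoin it.
            result = result[:-1] + text
        elif result.endswith(sentence_end):
            # Sentence finished on the previous page: keep a paragraph break.
            result += "\n\n" + text
        else:
            # Sentence continues: replace the hard return with a space.
            result += " " + text
    return result
```

The hyphen rule is a guess: it will also close up a genuine hyphenated compound that happens to fall on a page break, so the result still deserves a quick read-through.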

The second document (English text, 200+ pages) was a little more complicated: header and footer, graphics, some pages with 2 columns, forms, and some artistic fonts in the chapter headlines and bullet lists. Same procedure: I opened the document in the Nuance PDF Reader, where it looked perfect and I could select and copy any passage of text. So I sent it off to Nuance for conversion, waited a few minutes, received the converted file in my mailbox, and opened the docx with Word. Big surprise.

Text passage - Result
2-column text - perfect
Text in box - perfect, with the box frame
Web and email addresses (no link in the original) - highlighted blue as a link
Header, footer - as regular text or Word text field (inconvenient to remove)
Bullet list - indented, but no bullets
Graphics - in text like the original
Forms - look very similar to the original
Artistic fonts - Big surprise: missing in the converted text or with typical OCR errors. (????)

Throughout the text there were a lot of OCR errors like I = 1, m = rn, b <=> h. This really surprised me! Did Nuance use OCR to convert a PDF document? I could just copy and paste the problematic passages from the PDF and got the right text. Well, this was quite disappointing, but the converted text, while needing some more editing, was still usable. Mobipocket Creator did not have any problems with the artistic fonts but also had problems with this text.
On the other hand, the Nuance result encouraged me to try another test. Since it appeared that Nuance used OCR, I scanned 20 pages of a book, with two book pages on one landscape A4 scan. This resulted in a 5 MB PDF file, but with the text as a bitmap. I sent the file via the Nuance PDF Reader for online conversion and an editable Word file was returned. A file with all the typical scanning errors, but editable in Word or OpenOffice.
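Errors of this kind can at least be located automatically. The sketch below is only an illustration under stated assumptions: the confusion pairs are the ones named above, while the function name and the `vocabulary` word list are hypothetical and have to be supplied by you. It flags a word only when it is unknown and a single substitution turns it into a known word.

```python
import re

# Confusion pairs reported in the thread: I/1, m/rn, b/h (both directions).
CONFUSIONS = [
    ("rn", "m"), ("m", "rn"),
    ("1", "I"), ("1", "l"),
    ("h", "b"), ("b", "h"),
]


def suggest_fixes(text, vocabulary):
    """Return {suspect_word: suggestion} for words that are not in
    `vocabulary` but become a known word after one substitution.

    vocabulary: set of lowercase known words (hypothetical; supply
    your own word list for the language of the book).
    """
    fixes = {}
    for word in re.findall(r"[A-Za-z0-9]+", text):
        if word.lower() in vocabulary or word in fixes:
            continue
        for wrong, right in CONFUSIONS:
            # Try the substitution at every position where it applies.
            start = word.find(wrong)
            while start != -1:
                candidate = word[:start] + right + word[start + len(wrong):]
                if candidate.lower() in vocabulary:
                    fixes.setdefault(word, candidate)
                start = word.find(wrong, start + 1)
    return fixes
```

For example, with a vocabulary containing "document", "is" and "better", the text "the docurnent 1s hetter" yields suggestions for all three misread words. The suggestions are candidates to review, not safe automatic replacements.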

Final result: Conversion by Nuance produces good results, and you get a free Optical Character Recognition program online. Perfect for all those who do not have access to OmniPage or FineReader: now you have a free OCR online.

More results with pictures later!

greenapple
04-20-2010, 11:05 PM
Excellent review, buecherhans.

The footer with two long horizontal lines and the page number in-between was converted to a text box, which is inconvenient, since I will have to delete those manually for some 60 pages.

With the dedicated Nuance PDF converter you could save a bit of time by cropping the areas you want (ie sans headers/footers) before converting. This will produce a readable text without the bothersome headers, footers, lines etc. I'm not sure if the ability to crop is available in the free Reader program. :)

buecherhans
04-21-2010, 02:52 AM
That's a good idea to crop the text areas to be converted before sending off for conversion. I guess my underlying intention was a comparison with Mobipocket Creator, which strips header and footer (most of the time). You still have to keep in mind that the Nuance conversion is for a totally different purpose than the Mobipocket Creator.

I checked the Nuance Free PDF Reader this morning again and it appears that there is no cropping function in the free version. But that can be achieved with other software.

For me personally, the best result is having a free OCR online. That is good news for everyone who does not want to spend a few hundred Euros (Dollars) on dedicated OCR software.

The next question is whether it is possible to send a PDF file to Nuance without running their free PDF reader. Since this is done through the web browser, it should be possible. That would also give everybody not using Windows a chance to use the online OCR. Since I defected to Cupertino, it's always a hassle to run irreplaceable Windows software like OmniPage. "You can't always get what you want".

I'll give it a try today.