best converter from pdf to html?


NASCARaddicted
02-04-2010, 09:39 AM
Hello

I have a few PDF files here that I want to turn into EPUB. But I want to (and have to) edit them first, so I need the content in HTML.

In the past, I used Mobipocket Creator to convert the files, but I thought there had to be something better, because the HTML files that Mobipocket Creator produces sometimes have some odd errors...

For example, I opened the HTML file in my editor. Most of the lines looked OK, but some lines had just one word each.

So
some
sentences
looked
like
this.

This always starts after about 100-200 lines of HTML. Even stranger, sometimes the words are out of order:

looked
like
this.
So
some
sentences
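If the stray line breaks fall inside a single paragraph, HTML collapses them into ordinary spaces when the page is rendered, so the main damage is that hand-editing becomes tedious. Below is a minimal Python sketch that rejoins runs of one-word lines. It assumes a "one-word line" is any line holding exactly one whitespace-separated token with no `<` in it, so lines containing tags are left untouched; the function name `join_one_word_lines` is hypothetical, and it cannot fix text that has been reordered.

```python
def join_one_word_lines(text: str) -> str:
    """Merge each run of consecutive one-word lines into a single line.

    Hypothetical heuristic: a "one-word line" holds exactly one
    whitespace-separated token and no '<', so lines with HTML tags are kept.
    """
    out: list[str] = []
    run: list[str] = []  # words collected from the current run

    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) == 1 and "<" not in tokens[0]:
            run.append(tokens[0])
            continue
        if run:  # a longer line (or a blank line) ends the run
            out.append(" ".join(run))
            run = []
        out.append(line)

    if run:  # flush a run that reaches the end of the text
        out.append(" ".join(run))
    return "\n".join(out)


if __name__ == "__main__":
    sample = "<p>\nSo\nsome\nsentences\nlooked\nlike\nthis.\n</p>"
    print(join_one_word_lines(sample))
```

Run it on a copy of the file and check the result: adjacent bare single-word lines are always merged, even if they were meant to stay separate.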

And just yesterday I noticed a line of text in the HTML file that made no sense. When I looked at the PDF file, that line was much longer, so some of the text was missing from the HTML. I searched for the missing text and found it about 600 lines further down.

So which converter is the best? Ideally a free one. I would be willing to pay a little for it, but there is so much great free software on the Internet, like Calibre or Notepad++...

NASCARaddicted
02-11-2010, 06:28 AM
Can no one recommend another converter? If not, I guess I will stick with Mobipocket Creator.

poshm
02-11-2010, 06:50 AM
I've only used Mobipocket Creator and have had good results.

There are a ton of PDF-to-HTML programs if you search on Google, so it's a bit overwhelming. It might be an idea to try a site like CNET, which reviews software:

http://download.cnet.com

jackie_w
02-11-2010, 10:47 AM

Calibre itself produces HTML as an intermediate step when converting your PDF to EPUB and other formats.

Switch on the Debug option in the Calibre conversion dialog ([Convert] - [Debug]) and specify a directory to receive the debug files.
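The same debug output can also be produced from the command line with Calibre's `ebook-convert` tool, whose `--debug-pipeline` option saves the intermediate stages to a directory. Below is a minimal Python sketch that builds and runs that command. It assumes `ebook-convert` is on your PATH; the helper names and file names are hypothetical, and it is worth checking `ebook-convert --help` on your Calibre version, since options can change between releases.

```python
import subprocess
from pathlib import Path


def build_debug_command(pdf: Path, epub: Path, debug_dir: Path) -> list[str]:
    """Return an ebook-convert command that also dumps the pipeline stages."""
    # --debug-pipeline: save the output of each conversion stage to debug_dir
    return ["ebook-convert", str(pdf), str(epub), "--debug-pipeline", str(debug_dir)]


def convert_with_debug(pdf: Path, epub: Path, debug_dir: Path) -> None:
    """Run the conversion; assumes ebook-convert (from Calibre) is on the PATH."""
    # check=True raises CalledProcessError if ebook-convert reports a failure
    subprocess.run(build_debug_command(pdf, epub, debug_dir), check=True)
```

For example, `convert_with_debug(Path("book.pdf"), Path("book.epub"), Path("debug"))` would write the EPUB plus the stage folders under `debug`.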

Once the conversion is finished, ignore the EPUB and look in the debug directory.

There are 4 subdirectories (Input, Parsed, Structure, Processed), each holding HTML at a different stage of the conversion. Look at each one and decide which suits your purpose best. Then edit to your heart's content before reimporting the cleaned-up HTML into Calibre for the "proper" conversion.
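To compare the stages quickly, a small script can list the HTML files each stage folder holds. This is a minimal Python sketch, assuming the four folders sit directly inside the debug directory; their exact capitalisation may differ between Calibre versions, so they are matched case-insensitively. The function name `html_files_by_stage` is hypothetical.

```python
from pathlib import Path

# The four stage folders named in the post, in pipeline order.
# Matched case-insensitively because capitalisation may vary (assumption).
STAGES = ("input", "parsed", "structure", "processed")
HTML_SUFFIXES = {".html", ".htm", ".xhtml"}


def html_files_by_stage(debug_dir: Path) -> dict[str, list[Path]]:
    """Map each debug stage to the (X)HTML files found inside it, recursively."""
    subdirs = {p.name.lower(): p for p in debug_dir.iterdir() if p.is_dir()}
    result: dict[str, list[Path]] = {}
    for stage in STAGES:
        stage_dir = subdirs.get(stage)
        if stage_dir is None:  # stage folder missing: report it as empty
            result[stage] = []
            continue
        result[stage] = sorted(
            p for p in stage_dir.rglob("*")
            if p.is_file() and p.suffix.lower() in HTML_SUFFIXES
        )
    return result
```

For example, looping over `html_files_by_stage(Path("debug")).items()` and printing each stage with `len(files)` shows where the HTML lives, so you can open the matching files side by side and pick the cleanest stage to edit.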