saurabh Morankar
11-20-2009, 02:53 AM
Hi! I am a graphic designer. I was recently handed a project to convert some PDF files to EPUB format.

I bought PDF-to-EPUB conversion software, but I am running into a lot of problems when a file is converted to EPUB format.

I use Adobe Digital Editions to view the EPUB file.

The problems are:
(1) The alignment breaks and multiple pages are displayed in a single window.
(2) Many characters are replaced by junk characters.

My question is:
(1) Are there any specific settings that need to be applied to the PDF before converting the file to EPUB format?

11-20-2009, 04:08 AM
I have not found a decent PDF->ePub converter. But a couple of points:

(1) You may have to do considerable hand-editing to get the format straight. Sigil can be very useful for this.

(2) Some of the "junk" characters may not be actual junk but instead special characters. E.g., ligatures such as "fi" may appear in a text editor as bizarre symbols, but will render just fine in an ePub viewer. Of course if they also appear as junk in an ePub viewer, you have a different problem...
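To tell ligatures apart from real junk, something like the following may help. It is only a sketch: the function names and the sample string are made up for illustration, and NFKC normalization also rewrites other compatibility characters (superscripts, non-breaking spaces and so on), so check the result before applying it to a whole book.

```python
import unicodedata


def expand_ligatures(text):
    """Replace compatibility characters such as the 'fi' ligature (U+FB01)
    with their plain-letter equivalents using NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


def find_suspects(text):
    """Return (position, character, Unicode name) for every non-ASCII
    character, so real special characters can be told apart from junk."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Hypothetical sample containing the 'fi' and 'fl' ligatures.
sample = "The \ufb01rst of\ufb01ce \ufb02oor"
print(expand_ligatures(sample))
for position, char, name in find_suspects(sample):
    print(position, repr(char), name)
```

If a character shows up with a sensible Unicode name (a ligature, a curly quote, a dash), it is probably a legitimate special character. If the names look unrelated to the text, the converter most likely mangled the font encoding.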

11-20-2009, 07:40 AM
Yes: create the .pdf so that it embeds the original document. Then, when it's time to process the .pdf to make an .epub, extract the original document and work from that.


11-22-2009, 05:38 PM
If at all possible, try like hell to get hold of the original files that made the PDF. Converting from PDF is a long, hard, laborious process.

The best way to convert is to use Adobe Acrobat Professional to convert the document into HTML. Then a/b check the PDF vs. the HTML. Then once you have finally cleaned up the HTML, you can convert it to ePub from there.

11-24-2009, 02:44 AM

(1) Can you suggest a good HTML to EPUB Converter?
(2) If we want to link the footers and create indexes, how do we do that?
(3) How can I create custom chapter headings and link them to their respective pages so that the browser (Digital Editions) will jump to that specific page?

11-25-2009, 08:40 AM
Calibre works well for converting HTML to ePub.
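For batch work, Calibre's command-line tool `ebook-convert` can be driven from a script. A minimal sketch, assuming Calibre is installed and `ebook-convert` is on the PATH; the file names and the helper functions are hypothetical:

```python
import shutil
import subprocess


def build_convert_command(html_path, epub_path):
    """Build the ebook-convert call: input file first, output file second.
    The output format is inferred from the .epub extension."""
    return ["ebook-convert", html_path, epub_path]


def convert(html_path, epub_path):
    """Run the conversion, failing early if Calibre's CLI is not available."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed and on PATH?")
    subprocess.run(build_convert_command(html_path, epub_path), check=True)


if __name__ == "__main__":
    # Only prints the command; call convert("book.html", "book.epub") to run it.
    print(" ".join(build_convert_command("book.html", "book.epub")))
```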

11-26-2009, 04:41 AM
Would converting the PDF with OCR to HTML or RTF give better results? This is only for the first step, of course. For HTML to ePub there are various good ways.

11-27-2009, 02:51 PM
Whatever you do, don't use RTF as an intermediate format. HTML is the best intermediate format before going to ePub.

11-29-2009, 06:09 AM
I agree, HTML is the best intermediate format. I will run some tests with OCR; from what I remember, it handles paragraphs rather well.

12-04-2009, 05:10 PM
Get from PDF to HTML as fast as you can. Some PDF books are hell to convert. The easy ones are books with large blocks of text, like philosophical texts. Computer books are hell: every little change in spacing gets a style="...." in the HTML. Stripping some of it may cut the file to 50% of its size.
I use UltraEdit to edit the HTML.
I suspect it may sometimes be faster to start with an empty HTML file and cut and paste in the blocks of text from the PDF.
A 500-page philosophical text took me nearly two weeks (it being a first try for me)...
ABBYY FineReader is a good OCR program, but still: a lot of editing work afterwards.
It can OCR PDF files to a text file.
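The style="...." stripping mentioned above can also be scripted instead of done by hand in an editor. A rough sketch, assuming the inline styles are safe to drop wholesale; the function name and sample markup are made up, and a regex like this will also hit text that merely looks like a style attribute, so review the output:

```python
import re

# Matches a style attribute with a double- or single-quoted value,
# including the whitespace in front of it.
STYLE_ATTR = re.compile(r"""\s+style\s*=\s*(?:"[^"]*"|'[^']*')""", re.IGNORECASE)


def strip_inline_styles(html):
    """Remove every inline style="..." attribute from the markup."""
    return STYLE_ATTR.sub("", html)


# Hypothetical fragment of the kind a PDF export tends to produce.
html = (
    '<p style="margin-left: 3.2pt; letter-spacing: 0.01em">Some '
    "<span style='font-size:9pt'>code</span></p>"
)
cleaned = strip_inline_styles(html)
print(cleaned)
print(f"{len(cleaned) / len(html):.0%} of original size")
```

Layout that matters (indented code listings, for example) is better moved into a few CSS classes before stripping, otherwise it is lost along with the noise.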