MobileRead Forums - View Single Post - PDF to Kindle: The unobtainable Holy Grail of ebooks

JSWolf · 10-18-2011, 07:42 PM

Quote:

Originally Posted by DiapDealer

No, I assume it just uses the OCR text layer, but I could be wrong. I use Acrobat Pro a lot too, but it's always been a bit of a toss-up between it and other programs for me. I like that Acrobat will retain a lot of the styles when exporting, but if the page numbers and such (headers and footers) are not true Adobe headers and footers (as is usually the case)... I still have to rely on external programs to strip them. And even then they're not truly "removed" from the PDF, only hidden from view (and conversion programs will add them right back into the mobi or epub).

So I usually have to decide between HTML with italics, but with pesky headers and footers to track down and remove (Acrobat), or really nice, clean HTML with no pesky headers and footers, but no italics (PDFMasher). Both need regex cleanup for paragraph fragments.
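For anyone wondering what the regex pass for paragraph fragments in the quote might look like, here is a minimal sketch in Python. The heuristic is an assumption of mine, not something either tool does: a paragraph break is treated as spurious when the text before it does not end in sentence punctuation and the text after it starts with a lowercase letter. The names `FRAGMENT_BREAK` and `join_fragments` are hypothetical.

```python
import re

# Assumed heuristic: a </p><p> boundary is a false break when the last
# character before it is not sentence-ending punctuation and the next
# paragraph starts with a lowercase letter.
FRAGMENT_BREAK = re.compile(r'([^\s.!?:"])\s*</p>\s*<p[^>]*>\s*(?=[a-z])')

def join_fragments(html: str) -> str:
    """Merge paragraph fragments that were split mid-sentence."""
    # Keep the captured last character and replace the break with a space.
    return FRAGMENT_BREAK.sub(r"\1 ", html)

sample = (
    "<p>The quick brown fox jumped</p>\n"
    "<p>over the lazy dog.</p>\n"
    "<p>A new paragraph.</p>"
)
print(join_fragments(sample))
# <p>The quick brown fox jumped over the lazy dog.</p>
# <p>A new paragraph.</p>
```

A heuristic like this will miss some breaks and wrongly join others (a paragraph that really does start lowercase, for instance), so it is worth reviewing the result rather than trusting it blindly.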

Acrobat Pro can handle the headers/footers just fine. All you need to do is crop the pages so the headers/footers don't exist and then convert. That gets rid of them very well. Better than any other method.
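To make the cropping idea concrete, here is a rough sketch of the arithmetic behind a crop that cuts off a header band and a footer band. It is not Acrobat's API, only an illustration. It assumes the usual PDF convention of coordinates in points (1/72 inch) with the origin at the lower-left corner of the page; the function name `crop_box` and the band heights in the example are made up.

```python
def crop_box(page_width: float, page_height: float,
             header: float, footer: float) -> tuple:
    """Return (left, bottom, right, top) of a crop rectangle that excludes
    a header band at the top and a footer band at the bottom.

    All values are in points, origin at the lower-left corner (assumed).
    """
    if header < 0 or footer < 0:
        raise ValueError("header and footer heights must be non-negative")
    if header + footer >= page_height:
        raise ValueError("header and footer leave no page content")
    # Raise the bottom edge past the footer, lower the top edge below the header.
    return (0, footer, page_width, page_height - header)

# US Letter page (612 x 792 pt) with a hypothetical 0.75 in header
# and 0.5 in footer.
print(crop_box(612, 792, header=54, footer=36))
# (0, 36, 612, 738)
```

The band heights have to be measured from the actual book, and they need to be large enough to clear the running heads and page numbers on every page without clipping body text.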