10-18-2011, 07:42 PM   #34
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by DiapDealer
No, I assume it just uses the OCR text layer, but I could be wrong. I use Acrobat Pro a lot too, but it's always been a bit of a toss-up between it and other programs for me. I like that Acrobat retains a lot of the styles when exporting, but if the page numbers and such (headers and footers) are not true Adobe headers and footers (as is usually the case), I still have to rely on external programs to strip them. And even then they're not truly "removed" from the PDF, only hidden from view (and conversion programs will add them right back into the mobi or epub).

So I usually have to decide between HTML with italics, but with pesky headers and footers to track down and remove (Acrobat), or really nice, clean HTML with no pesky headers and footers, but no italics (PDFMasher). Both need regex cleanup for paragraph fragments.
Acrobat Pro can handle the headers/footers just fine. All you need to do is crop the pages so the headers/footers are no longer on the page, and then convert. That gets rid of them very well, better than any other method.
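
For anyone who wants to work out the crop numbers before opening the crop dialog, here is a rough sketch of the arithmetic. It is not Acrobat's own method, just the geometry: PDF page boxes are given as (left, bottom, right, top) in points (1/72 inch) with the origin at the lower-left corner. The names Box and crop_box are made up for this example, and the 54-point header/footer height is an assumed value you would measure on your own pages.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A PDF-style page box in points: (left, bottom, right, top)."""
    left: float
    bottom: float
    right: float
    top: float


def crop_box(media: Box, header: float, footer: float) -> Box:
    """Shrink a page box by `header` points at the top and `footer` points at the bottom.

    Hypothetical helper for illustration only; it just does the arithmetic.
    """
    if header < 0 or footer < 0:
        raise ValueError("header and footer heights must be non-negative")
    new_bottom = media.bottom + footer
    new_top = media.top - header
    if new_bottom >= new_top:
        raise ValueError("header and footer together cover the whole page")
    return Box(media.left, new_bottom, media.right, new_top)


# US Letter page (8.5 x 11 in = 612 x 792 pt), assuming 0.75 in (54 pt)
# of header and 0.75 in of footer to cut away.
letter = Box(0, 0, 612, 792)
print(crop_box(letter, header=54, footer=54))
```

The same top and bottom amounts can then be applied to every page in the crop tool, provided the headers and footers sit at a consistent height throughout the book.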