Quote:
Originally Posted by rraod
Though Acrobat Professional is expensive, it has some very good conversion features.
Acrobat Professional will let you save a good PDF in HTM, DOC or RTF format, as well as TXT and JPG, using the Save As command.
I have tried saving a few large PDFs with formatted text and images to HTM format (HTML 4.01 with CSS 1.0), and it gave me an almost exact replica of the PDF. Using Sigil, I could correct the HTM file and create an epub file.
|
You must have been extraordinarily fortunate, or you don't mind spending a LOT of time cleaning up HTML. I wouldn't use Acrobat Pro's "export to ANYTHING" feature for anything. The HTML it outputs is filthy, and the Word files are just as bad. We have the entire Adobe suite, everything from InDesign to Acrobat Pro, and nothing in Acrobat exports to HTML, Word, etc., worth a damn, in my fairly experienced opinion.
Quote:
PDF-to-text conversion utilities are useless because they lose the images and page formatting. The best option is to convert the PDF to HTML, which retains the formatting and the images. Search Google for free PDF-to-HTML utilities and experiment.
|
Again, if someone is very experienced with regex, this can work, but a TON of cleanup is required.
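To give a sense of what that regex cleanup involves, here is a minimal Python sketch, not taken from any real workflow. The rules (dropping inline `style` attributes, unwrapping bare `<span>` tags, converting `&nbsp;`) are assumptions about typical converter output, and `clean_exported_html` is a hypothetical helper. Real exports need rules tuned to their actual markup, which is exactly where the time goes.

```python
import re


def clean_exported_html(html: str) -> str:
    """Apply a few illustrative regex passes to converter-exported HTML.

    Hypothetical rules only; real exports need patterns tuned to their markup.
    """
    # 1. Drop inline style="..." attributes (assumes styling moves to a stylesheet).
    html = re.sub(r'\s+style="[^"]*"', '', html)
    # 2. Unwrap attribute-less <span> elements that contain only text,
    #    repeating so nested spans are peeled from the inside out.
    while True:
        unwrapped = re.sub(r'<span>([^<]*)</span>', r'\1', html)
        if unwrapped == html:
            break
        html = unwrapped
    # 3. Turn non-breaking space entities into ordinary spaces.
    html = html.replace('&nbsp;', ' ')
    # 4. Collapse runs of spaces and tabs into a single space.
    html = re.sub(r'[ \t]{2,}', ' ', html)
    return html


if __name__ == "__main__":
    sample = '<p style="margin:0"><span><span>Hello</span></span>&nbsp;world</p>'
    print(clean_exported_html(sample))  # -> <p>Hello world</p>
```

Each pass is a blunt instrument: spans that carry attributes or contain other tags are left alone, so anything beyond this needs checking by eye.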
Quote:
One word of caution when trying out these free utilities: they come bundled with unnecessary programs. Choose the custom install, read each installer screen carefully, and opt out of any extra programs the installer tries to put on your system by unchecking the boxes. Don't just keep pressing the Next button.
Good Luck!
|
I have yet to see any "PDF-->Word" or "PDF-->Anything" converter on the web, whether a tool or a website, that works better than Abbyy FineReader. We do this for a living, and if there were ANYTHING out there that captured text and everything else better than Abbyy, regardless of price, we'd use it. The fact that the OP doesn't think Abbyy does a good enough job tells me that either a) they expect some kind of perfect export from the PDF, which is literally impossible (because the image layer and the text layer are absolutely, positively, ALWAYS different), or b) they haven't worked with Abbyy very much.
For anyone who thinks that even cutting and pasting works: take a nice big page of a high-quality PDF, and make sure the selection includes some question marks, quotation marks, etc. Then paste it, NOT into Word, but into Word's "SEARCH FOR" box, and look at what you get. That's what's really being pasted, or exported by the "Save as Word" or "Save as RTF" options. It's garbage. Can it be cleaned up with a lot of time, by hand and eye? Yes. But it's not "exact" by ANY means.
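The same kind of check can be done outside Word. Here is a minimal Python sketch, assuming the pasted text is already in a string: the hypothetical `show_suspect_chars` helper lists every non-ASCII or control character with its Unicode name, so characters that can hide in extracted PDF text, such as curly quotes, ligatures like "fi", or soft hyphens, become visible. Which characters actually turn up depends on the PDF.

```python
import unicodedata


def show_suspect_chars(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for each non-ASCII or control character.

    Ordinary newlines and tabs are not reported.
    """
    report = []
    for ch in text:
        is_control = unicodedata.category(ch).startswith("C") and ch not in "\n\t"
        if ord(ch) > 127 or is_control:
            report.append((f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return report


if __name__ == "__main__":
    # Hypothetical pasted snippet: curly quotes, an "fi" ligature, a soft hyphen.
    pasted = "\u201cWhat?\u201d \ufb01le\u00ad"
    for code, name in show_suspect_chars(pasted):
        print(code, name)
```

Every line it prints is something that would have to be found and fixed by hand before the text is clean.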
Abbyy, in my experience, is still the best solution, and the worse the PDFs get, the better a solution it is.
(OP: you may safely rely on anything Texanns tells you about scanning, OCR and clean-up; he's a steely-eyed ePUB pilot. Ditto anything Tox tells you about his tools; they are excellent.)
Just my $.02. Take it for what it's worth--but we've done well over a thousand PDF-->ePUB & MOBI conversions.
Hitch