I crop with Adobe Acrobat, export as HTML, and then regex the piss out of it. I've also had decent luck with PDFMasher, but then you need to add the images and a lot of the formatting back in (and it's still pretty experimental). I think converting PDFs is always going to be a fairly hands-on affair.
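
A regex cleanup pass over exported HTML might look something like the rough Python sketch below. The function name `clean_exported_html` is made up for illustration, and the patterns are assumptions about the kind of cruft an export tends to leave behind (inline styles, wrapper spans, words hyphenated across lines, empty paragraphs), not a description of what Acrobat actually emits. Expect to tune them per document.

```python
import re


def clean_exported_html(html: str) -> str:
    """Strip common export cruft from an HTML string (hypothetical helper)."""
    # Drop inline style and class attributes (assumes double-quoted values).
    html = re.sub(r'\s+(?:style|class)="[^"]*"', "", html, flags=re.IGNORECASE)
    # Unwrap <span> and <font> tags but keep the text inside them.
    html = re.sub(r"</?(?:span|font)\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Rejoin words that were hyphenated across a line break.
    # Caveat: this also merges genuine compounds that happen to break at the hyphen.
    html = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", html)
    # Remove paragraphs that contain only whitespace or &nbsp;.
    html = re.sub(r"<p>(?:\s|&nbsp;)*</p>", "", html, flags=re.IGNORECASE)
    # Collapse runs of spaces and tabs into a single space.
    html = re.sub(r"[ \t]{2,}", " ", html)
    return html.strip()


if __name__ == "__main__":
    sample = (
        '<p class="s1" style="font-size:9pt"><span class="x">Conver-\n'
        "sion</span>  is   messy.</p>\n<p>&nbsp;</p>"
    )
    print(clean_exported_html(sample))
```

The order matters: attributes go first so the empty-paragraph pattern can match a bare `<p>`. Regexes are fine for this sort of bulk scrubbing, but anything that depends on nesting is usually safer with a real HTML parser.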