Quote:
Originally Posted by BlackVoid
Is this any good?
http://www.prs-500formatter.com/paydotcom.html
The problems listed here are typical. Although the libprs500 conversion works a bit better than the methods mentioned at the above link, it still has a lot of issues with paragraphs and page breaks.
The root cause of all these issues is the almost complete lack of semantic information in a PDF file. A PDF (most PDFs, at least) knows nothing about paragraphs, lines, or even words. All it contains is instructions at the level of "draw a letter 'A' in a 10pt Courier bold font at such-and-such an offset from the corner of the page".
Very often the best thing you can do with a PDF is feed it to an OCR application and let it try to make sense of it.
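To give a feel for what any converter is up against, here is a rough sketch of the guesswork involved. It is not how libprs500 or any other tool actually does it. It assumes the PDF has already been reduced to a list of positioned glyphs, and the Glyph class, the rebuild_lines function, and both thresholds are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One 'draw this letter here' instruction (hypothetical structure)."""
    char: str
    x: float      # left edge, in points
    y: float      # baseline, in points (PDF y grows upward)
    width: float  # advance width, in points


def rebuild_lines(glyphs, line_tol=2.0, gap_ratio=0.3):
    """Guess lines and words from glyph positions alone.

    line_tol and gap_ratio are arbitrary illustrative thresholds;
    real documents need per-font, per-page tuning.
    """
    # Top of the page first, then left to right.
    ordered = sorted(glyphs, key=lambda g: (-g.y, g.x))

    # Glyphs whose baselines nearly match are assumed to share a line.
    lines = []
    for g in ordered:
        if lines and abs(lines[-1][0].y - g.y) <= line_tol:
            lines[-1].append(g)
        else:
            lines.append([g])

    text = []
    for line in lines:
        line.sort(key=lambda g: g.x)
        out = line[0].char
        for prev, cur in zip(line, line[1:]):
            # A horizontal gap wider than a fraction of the previous
            # glyph is assumed to be a word break.
            gap = cur.x - (prev.x + prev.width)
            if gap > gap_ratio * prev.width:
                out += " "
            out += cur.char
        text.append(out)
    return text
```

Even this toy version has to guess at two thresholds, and it says nothing about paragraphs, columns, or hyphenation, which is where conversions usually fall apart.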