04-03-2008, 07:54 AM   #12
HarryT
Quote:
Originally Posted by BlackVoid
Is this any good?
http://www.prs-500formatter.com/paydotcom.html

The problems listed here are typical. Although the libprs500 conversion works a bit better than the methods mentioned at the above link, it still has a lot of issues with paragraphs and page breaks.
The root cause of all these issues is the almost complete lack of semantic information in a PDF file. A PDF (most PDFs, at least) knows nothing about paragraphs, lines, or even words. All it contains is drawing instructions at the level of "draw a letter 'A' in a 10pt Courier bold font at such-and-such an offset from the corner of the page".
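
To make that concrete, here is a minimal Python sketch of the kind of guesswork a converter is forced into. It assumes the glyphs have already been pulled out of the PDF as (character, x, y, width) records. The `Glyph` class, the `rebuild_lines` function and the tolerance values are all made up for illustration; this is not how libprs500 or any other particular tool works.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge, in points (hypothetical field)
    y: float      # baseline, in points; PDF y grows upward
    width: float  # advance width, in points


def rebuild_lines(glyphs, line_tol=2.0, word_gap=1.5):
    """Guess lines and words from glyph positions alone.

    line_tol and word_gap are illustrative thresholds, not standard values.
    """
    # Step 1: glyphs with nearly the same baseline are assumed to share a line.
    rows = []
    for g in sorted(glyphs, key=lambda g: -g.y):  # top of page first
        if rows and abs(rows[-1][0].y - g.y) <= line_tol:
            rows[-1].append(g)
        else:
            rows.append([g])

    # Step 2: within a line, a horizontal gap wider than word_gap is
    # assumed to be a space between words.
    lines = []
    for row in rows:
        row.sort(key=lambda g: g.x)
        text = row[0].char
        for prev, cur in zip(row, row[1:]):
            if cur.x - (prev.x + prev.width) > word_gap:
                text += " "
            text += cur.char
        lines.append(text)
    return lines


# Glyphs can arrive in any drawing order; nothing marks words or lines.
page = [
    Glyph("k", 6, 688, 6), Glyph("a", 16, 700, 6), Glyph("H", 0, 700, 6),
    Glyph("l", 28, 700, 6), Glyph("o", 0, 688, 6), Glyph("i", 6, 700, 6),
    Glyph("l", 22, 700, 6),
]
print(rebuild_lines(page))
```

Note that even this toy version has to pick thresholds out of thin air, and it says nothing about where a paragraph ends or whether a line break is a real break or just the edge of the page. That is exactly where the paragraph and page break problems come from.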

Very often the best thing you can do with a PDF is feed it to an OCR application and let it try to make sense of it.