PDF is a tough format to parse: there are hard line breaks everywhere, headers and footers are mixed into the body text, and so on. I haven't seen anything good come of messing with it, either through Calibre conversion or through KOReader's reflow (yes, I know KOReader is still deep in alpha).
Besides, there are reasons why some things are in PDFs. Some are just pictures, scanned but not OCRed. Others, like screenplays, have very specific formatting that gets lost in conversion (I guess it's technically possible to make an ePub screenplay that looks like one, so the hardest part would probably be parsing the PDF, once again).