I had a similar issue with PDF->EPUB conversion. The PDF was very nicely formatted, with real spacing between paragraphs instead of just indentation. Calibre, however, turned every line of the PDF into its own paragraph, regardless of which paragraph or spacing-removal options I used.
I ended up converting it to properly formatted HTML myself using pdftohtml (the same tool Calibre uses), sed and tidy. The result was 99% perfect because I found a reliable indicator for paragraph breaks in the raw HTML produced by pdftohtml, so I could just use sed to replace those indicators with </p><p> and get proper paragraphs.
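For anyone who prefers a script over a sed one-liner, here is a rough sketch of the same idea in Python. It is not the exact rule I used: the function name `rebuild_paragraphs` and the default indicator (two consecutive `<br/>` tags) are assumptions for illustration only. You have to inspect the raw HTML from your own PDF and pass whatever pattern actually marks a paragraph break there.

```python
import re

# Hypothetical default: assume a paragraph break shows up as two <br/> tags
# in a row. This is an assumption; check your own pdftohtml output.
DEFAULT_INDICATOR = r"<br\s*/?>\s*<br\s*/?>"


def rebuild_paragraphs(raw_html: str, indicator: str = DEFAULT_INDICATOR) -> str:
    """Split raw HTML text on a paragraph indicator and wrap each piece in <p>."""
    # Split the text wherever the paragraph indicator occurs.
    chunks = re.split(indicator, raw_html, flags=re.IGNORECASE)
    paragraphs = []
    for chunk in chunks:
        # Any remaining single <br/> is treated as a plain line wrap.
        text = re.sub(r"<br\s*/?>", " ", chunk, flags=re.IGNORECASE)
        # Collapse runs of whitespace left over from the line wraps.
        text = re.sub(r"\s+", " ", text).strip()
        if text:
            paragraphs.append(f"<p>{text}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "Line one<br/>\nline two<br/>\n<br/>\nNext para<br/>\n"
    print(rebuild_paragraphs(sample))
```

You would run this on the body text of the pdftohtml output and then pass the result through tidy to clean up whatever is left.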
A different PDF needs a different rule, though; every PDF is built differently, so it's hard to come up with a routine that works for all of them. If Calibre doesn't work right out of the box, you might just have to do it manually.