01-02-2012, 07:48 AM   #18
frostschutz
I had a similar issue with PDF -> EPUB conversion. The PDF was very nicely formatted, with real spacing between paragraphs instead of just indentation. Calibre, however, turned it into one paragraph per line of the PDF, regardless of which paragraph or spacing removal options I used.

I ended up converting it to properly formatted HTML myself using pdftohtml (the same tool Calibre uses), sed and tidy. The result was 99% perfect because I found a reliable indicator for paragraph boundaries in the raw HTML produced by pdftohtml, so I could simply use sed to replace those indicators with </p><p> and get proper paragraphs.
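
To give a rough idea of that replace step, here is a minimal sketch in Python rather than sed. The marker is purely an assumption for illustration: it pretends a paragraph gap shows up as two or more consecutive <br/> tags, while a single <br/> is just a line wrap. The names PARAGRAPH_MARKER, LINE_BREAK and rebuild_paragraphs are made up for this example; look at your own raw HTML to find whatever indicator your PDF really produces.

```python
import re

# ASSUMPTION (hypothetical marker): a paragraph gap appears as two or more
# consecutive <br/> tags. Your PDF may produce a completely different
# indicator, so adjust this pattern after inspecting the raw HTML.
PARAGRAPH_MARKER = re.compile(r"(?:<br\s*/?>\s*){2,}", re.IGNORECASE)

# A single remaining <br/> is treated as a plain line wrap from the PDF.
LINE_BREAK = re.compile(r"<br\s*/?>\s*", re.IGNORECASE)


def rebuild_paragraphs(raw_body: str) -> str:
    """Rebuild <p> elements from text separated by the assumed marker."""
    # Replace every paragraph indicator with a close/open tag pair.
    marked = PARAGRAPH_MARKER.sub("</p><p>", raw_body.strip())
    # Join the wrapped lines inside each paragraph with a single space.
    joined = LINE_BREAK.sub(" ", marked)
    # Wrap the result so the first and last paragraphs are tagged as well.
    return "<p>" + joined.strip() + "</p>"


if __name__ == "__main__":
    sample = (
        "First line<br/>\nsecond line.<br/>\n<br/>\n"
        "Next paragraph<br/>\ncontinues."
    )
    print(rebuild_paragraphs(sample))
```

The output would still want a pass through tidy afterwards, and words hyphenated across line wraps are not rejoined here.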

A different PDF needs a different rule, though; every PDF is built differently, so it's hard to come up with a routine that works for all of them. If Calibre doesn't work right out of the box, you might just have to do it manually.