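Alternatively, you can play with Calibre's line unwrapping factor (the PDF input option --unwrap-factor in ebook-convert, which defaults to 0.45) until you find a value that works for your specific input. I've found 0.50 works with many of the PDFs I've converted, but not all. As of right now there's no way to pick the value automatically, so trial and error is really your only option.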
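Since it comes down to trial and error, it can help to script the sweep so that one PDF gets converted once per candidate factor and you can compare the results side by side. This is a minimal sketch, assuming Calibre's ebook-convert is on your PATH and that EPUB is your target format. The helper names, the book.pdf input and the unwrap-test output folder are all hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def unwrap_sweep_commands(pdf, factors, out_dir="unwrap-test"):
    """Build one ebook-convert command per candidate unwrap factor (hypothetical helper)."""
    pdf = Path(pdf)
    commands = []
    for factor in factors:
        # One output file per factor, e.g. unwrap-test/book-0.50.epub (EPUB is an assumption)
        out = Path(out_dir) / f"{pdf.stem}-{factor:.2f}.epub"
        commands.append(
            ["ebook-convert", str(pdf), str(out), "--unwrap-factor", f"{factor:.2f}"]
        )
    return commands


def run_sweep(pdf, factors, out_dir="unwrap-test"):
    """Run every conversion in the sweep; stops at the first failed conversion."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert (installed with Calibre) is not on PATH")
    for cmd in unwrap_sweep_commands(pdf, factors, out_dir):
        Path(cmd[2]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

For example, run_sweep("book.pdf", [0.40, 0.45, 0.50, 0.55]) would leave four EPUBs in unwrap-test/ to flip through. Keeping command building separate from execution makes it easy to print or tweak the commands before spending time on the actual conversions.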
Or, as yet another alternative, turn on Calibre's debug mode when you do the conversion. This saves every intermediate stage of the conversion to the folder you choose (PDF to raw HTML with <br /> line breaks, then raw HTML to cleaned-up HTML after Calibre attempts to unwrap lines and replace the <br />s with proper <p> paragraphs, etc.). You can then clean up the HTML by hand and reconvert starting from the HTML rather than the PDF. It's also useful for experimenting with the header/footer regex generation if the default isn't working on your input.
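From the command line, debug mode corresponds to ebook-convert's --debug-pipeline option, which takes the folder to save the stages in. The sketch below only builds the two commands for that workflow. The exact subfolder and filename of the HTML you end up editing depend on your Calibre version, so debug/cleaned/index.html is a placeholder, and the helper names are hypothetical.

```python
from pathlib import Path


def debug_command(pdf, debug_dir, output):
    """First pass: convert normally, but save every intermediate stage into debug_dir."""
    return ["ebook-convert", str(pdf), str(output), "--debug-pipeline", str(debug_dir)]


def reconvert_command(cleaned_html, output):
    """Second pass: start from the hand-edited HTML, so the PDF input stage
    (and its line unwrapping) is not run again."""
    return ["ebook-convert", str(cleaned_html), str(output)]


# Placeholder paths: substitute the HTML file you actually cleaned up
first = debug_command("book.pdf", "debug", "book.epub")
second = reconvert_command(Path("debug") / "cleaned" / "index.html", "book.epub")
```

Either list can be passed to subprocess.run, or joined with spaces and pasted into a terminal.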