Converting a document from PDF is the WORST case scenario. It will take a lot of elbow grease to fix the document after conversion.
I personally use these two Regexes to help combine broken paragraphs:
Search #1:
Code:
-</p>\s+<p>
Replace #1: (empty)
Search #2:
Code:
([^>”\?\!\.])</p>\s+<p>
Replace #2 (note the space after the 1):
Code:
\1 
Search #1 finds a paragraph that ends with a hyphen, removes the hyphen, and joins the paragraph to the next one. Some hyphens belong in the word, so you may or may not want to keep them; I replace one match at a time to make sure the hyphen is not needed.
Search #2 looks for a paragraph that does NOT end with one of the characters inside the square brackets (>, ”, ?, ! or .) and combines it with the next paragraph.
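If you would rather script the cleanup than run it in an editor, the two searches can be sketched in Python roughly as below. This is only an illustration, not part of my own workflow: the function name join_broken_paragraphs and the sample text are made up, and it replaces every match at once instead of one at a time.

```python
import re

# Search #1: a paragraph that ends with a hyphen, followed by the next paragraph.
HYPHEN_BREAK = re.compile(r'-</p>\s+<p>')
# Search #2: a paragraph whose last character is NOT one of > ” ? ! .
OPEN_BREAK = re.compile(r'([^>”\?\!\.])</p>\s+<p>')


def join_broken_paragraphs(html):
    """Hypothetical helper: apply Search #1, then Search #2, to the whole text."""
    # Replace #1 is empty: the hyphen and the paragraph break are both removed.
    html = HYPHEN_BREAK.sub('', html)
    # Replace #2 is "\1 " : keep the captured character and add one space.
    html = OPEN_BREAK.sub(r'\1 ', html)
    return html


sample = (
    '<p>The conver-</p>\n'
    '<p>sion broke this sentence</p>\n'
    '<p>in two places.</p>\n'
    '<p>New paragraph.</p>'
)
print(join_broken_paragraphs(sample))
# <p>The conversion broke this sentence in two places.</p>
# <p>New paragraph.</p>
```

The order matters: a hyphen is not in the excluded character set, so if Search #2 ran first it would also match the hyphenated break and leave "conver- sion" behind.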
When cleaning up calibre's output directly, the <p> tags carry a class attribute (such as class="calibre1"), so you may need to use these Regexes for the searches instead:
Search #1:
Code:
-</p>\s+<p class="calibre[0-9]+">
Search #2:
Code:
([^>”\?\!\.])</p>\s+<p class="calibre[0-9]+">
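Since the hyphen search is the risky one, here is a rough Python sketch that lists the word halves each hyphen match would join, so you can check them before replacing anything. The names preview_hyphen_joins and join_calibre_paragraphs are hypothetical, and the preview pattern is my calibre Search #1 with a (\w+) group added on each side purely for illustration.

```python
import re

# calibre variants of Search #1 and Search #2.
CALIBRE_HYPHEN = re.compile(r'-</p>\s+<p class="calibre[0-9]+">')
CALIBRE_OPEN = re.compile(r'([^>”\?\!\.])</p>\s+<p class="calibre[0-9]+">')
# Assumption: Search #1 extended with word groups, used only for previewing.
HYPHEN_PREVIEW = re.compile(r'(\w+)-</p>\s+<p class="calibre[0-9]+">(\w+)')


def preview_hyphen_joins(html):
    """Return (left, right) word halves for every hyphen break; html is unchanged."""
    return HYPHEN_PREVIEW.findall(html)


def join_calibre_paragraphs(html):
    """Apply the calibre Search #1 (empty replace), then Search #2 ("\\1 " replace)."""
    html = CALIBRE_HYPHEN.sub('', html)
    return CALIBRE_OPEN.sub(r'\1 ', html)


sample_calibre = (
    '<p class="calibre1">A hyphen-</p>\n'
    '<p class="calibre1">ated word and a broken</p>\n'
    '<p class="calibre12">sentence.</p>'
)
print(preview_hyphen_joins(sample_calibre))
print(join_calibre_paragraphs(sample_calibre))
```

If a previewed pair looks like a word that should keep its hyphen, fix that spot by hand before running the join.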