PDF to EPUB - spurious paragraph breaks
When converting from PDF to EPUB, I have noticed that calibre inserts a paragraph break whenever a line in the PDF document ends with a dash or an apostrophe/single quote that is flush with the right margin of the document. This behavior is repeatable.
As for the dash, I cannot think of any case where one would end a paragraph. In the cases I see, dashes set off parenthetical remarks and appear in the middle of a sentence. If such a dash happens to land flush against the right margin, calibre inserts a paragraph break.
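To make the dash case concrete, here is a minimal sketch of the check I have in mind, in Python since that is what calibre is written in. This is not calibre's actual code: the function name and the `flush_right` flag are hypothetical, and I am assuming the converter already knows whether a line reaches the right margin.

```python
# Hypothetical sketch, not calibre's code: decide whether a flush-right line
# ending in a dash should be joined to the next line instead of ending a paragraph.
DASHES = ("-", "\u2013", "\u2014")  # hyphen-minus, en dash, em dash


def should_join_after_dash(line: str, flush_right: bool) -> bool:
    """Return True if no paragraph break should follow this line."""
    # Only flush-right lines are affected by the problem described above.
    return flush_right and line.rstrip().endswith(DASHES)
```

A line that ends in a dash but stops short of the margin is left to whatever heuristics the converter already uses.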
The case of the apostrophe/single quote is more difficult, because a single quote can legitimately end a paragraph. However, I have seen calibre insert a paragraph break where one does not belong. A paragraph break should not be generated if the character before the single quote/apostrophe is a comma or a lowercase letter, or if the following line begins with a lowercase letter.
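The rule above could be sketched like this. Again, this is only an illustration under assumptions: the function name and `flush_right` flag are hypothetical, and "lowercase" is taken to mean Python's `str.islower()`.

```python
# Hypothetical sketch, not calibre's code: the apostrophe/single-quote rule.
QUOTES = ("'", "\u2019")  # ASCII apostrophe and right single quotation mark


def should_join_after_quote(line: str, next_line: str, flush_right: bool) -> bool:
    """Return True if no paragraph break should follow a flush-right line
    ending in an apostrophe/single quote."""
    text = line.rstrip()
    if not (flush_right and text.endswith(QUOTES)):
        return False  # rule does not apply; leave the decision to other heuristics
    # Character just before the quote: a comma or lowercase letter means "join".
    before = text[-2] if len(text) >= 2 else ""
    if before == "," or before.islower():
        return True
    # Otherwise join only if the next line starts with a lowercase letter.
    following = next_line.lstrip()
    return bool(following) and following[0].islower()
```

For example, `should_join_after_quote("the boys'", "toys were", True)` joins the lines, while a line ending in `'STOP'` followed by a line starting with a capital letter is still allowed to end the paragraph.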
Again, let me stress that this only occurs if the dash/single-quote/apostrophe is flush with the right margin of the PDF document.
I don't know if the PDF structure-detection can be fine-tuned to detect these cases, but if someone is willing to try, I have a single-page PDF document, extracted from a larger book, that shows both cases.