09-08-2010, 09:28 AM   #1
RichieTheK
Enthusiast
Posts: 36
Karma: 480532
Join Date: Mar 2010
Location: Chapel Hill, North Carolina, USA
Device: Nexus 7 (2012), Samsung Galaxy Pro 8.4
PDF to EPUB - spurious paragraph breaks

When converting from PDF to EPUB, I have noticed that calibre inserts a paragraph break whenever a line in the PDF ends with a dash or an apostrophe/single quote that is flush with the right margin of the page. This behavior is repeatable.

I cannot think of a case where a dash would end a paragraph. In the cases I see, dashes set off parenthetical remarks and appear in the middle of a sentence. If such a dash happens to land flush against the right margin, calibre inserts a paragraph break anyway.

The apostrophe/single-quote case is harder, because a single quote can legitimately end a paragraph. Even so, I have seen calibre insert paragraph breaks where they do not belong. A paragraph break should not be generated if the single quote/apostrophe is preceded by a comma or a lowercase letter, or if the first character of the following line is a lowercase letter.

Again, let me stress that this only occurs if the dash/single-quote/apostrophe is flush with the right margin of the PDF document.
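To make the proposed rules concrete, here is a rough Python sketch of the decision I am describing. This is NOT calibre's code or API; the function name `is_paragraph_break`, the `DASHES`/`QUOTES` sets, and the `flush_right` flag are all hypothetical. It assumes some earlier layout step has already determined whether a line touches the right margin, and that a line which stops short of the margin ends its paragraph.

```python
# Hypothetical sketch only: not calibre's implementation or API.
# Decides whether the break after `line` should stay a paragraph break,
# following the rules proposed in this post.

DASHES = {"-", "\u2013", "\u2014"}   # hyphen-minus, en dash, em dash
QUOTES = {"'", "\u2018", "\u2019"}   # ASCII apostrophe, left/right single quotes


def is_paragraph_break(line: str, next_line: str, flush_right: bool) -> bool:
    """Return True if the break after `line` should remain a paragraph break.

    `flush_right` is an assumed input from layout analysis: True when the
    last character of `line` sits against the right margin.
    """
    text = line.rstrip()
    nxt = next_line.lstrip()
    if not text or not nxt:
        return True   # a blank line is treated as a real break
    if not flush_right:
        return True   # assumption: a line stopping short of the margin ends its paragraph

    last = text[-1]
    if last in DASHES:
        return False  # per this post, a trailing dash never ends a paragraph
    if last in QUOTES:
        before = text[-2] if len(text) > 1 else ""
        # Join if the quote follows a comma or lowercase letter,
        # or if the next line starts with a lowercase letter.
        if before == "," or before.islower() or nxt[0].islower():
            return False
        return True   # a closing quote can legitimately end a paragraph
    return False      # ordinary flush-right line: the sentence continues


if __name__ == "__main__":
    print(is_paragraph_break("a remark \u2014", "which goes on", True))
    print(is_paragraph_break("He left.'", "The next day", True))
```

The quote rule is deliberately conservative: when none of the "join" conditions apply, it keeps the break, since a closing quote after a period may really be the end of a paragraph.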

I don't know whether calibre's PDF structure detection can be tuned to catch these cases, but if someone is willing to try, I have a single-page PDF, extracted from a larger book, that shows both cases.