09-18-2011, 08:07 PM   #1
MacEvansCB
Device: Nook Color
Unwanted UnWrapping

I do a lot of conversion from PDF to editable text, and there is one thing that drives me up the wall. Any time there is punctuation (or a number or a capital letter) at the right margin, Calibre ALWAYS inserts a hard line break. It doesn't matter what I'm converting to... I've tried EPUB, HTMLZ, RTF, TXT and others. The result is always the same.

Now I've gone thru piles of posts on this forum...
I've read the sticky about paragraphs being broken up...
I've gone thru the manual for unwrapping text...
I've turned on Heuristic Processing, enabled only Unwrap Lines, and tried piles of values between 0.00 and 1.00. While other paragraph breaks come and go, the ones I'm testing for NEVER get unwrapped as they should. And I hate wasting so much time scrubbing thru documents cleaning up these extra hard breaks.
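
To make the question concrete, here is a rough sketch of what a length-based unwrap heuristic could look like. This is an assumption for illustration only, NOT Calibre's actual code: the function name `unwrap_lines`, the `factor` parameter, and the "factor times median line length" threshold are all made up here to mirror the idea of an unwrap value between 0.00 and 1.00. The point is that a line reaching the right margin gets joined to the next one no matter what character it ends with.

```python
from statistics import median


def unwrap_lines(text, factor=0.4):
    """Join lines that look soft-wrapped (hypothetical helper, not Calibre code).

    A line counts as "wrapped at the margin" when its length is at least
    factor * median line length. Trailing punctuation, digits, or capital
    letters are deliberately ignored when making that decision.
    """
    lines = [ln.rstrip() for ln in text.splitlines()]
    lengths = [len(ln) for ln in lines if ln]
    if not lengths:
        return text

    threshold = factor * median(lengths)
    paragraphs = []
    current = []
    for ln in lines:
        if not ln:
            # A blank line always ends the current paragraph.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln.strip())
        if len(ln) < threshold:
            # A short line is treated as the last line of a paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

A sketch like this has an obvious blind spot: if the last line of a paragraph happens to run close to the margin, it gets merged into the next paragraph, so the factor is always a trade-off.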

This has me really, really lost. Obviously every PDF reader app I've used, including Acrobat and Apple Preview, knows where these hard line breaks should and should NOT be. Yet everybody says there is no such thing as a paragraph in a PDF.

Are there secret hidden characters or what?
How the heck does a PDF reader app handle hard and soft breaks correctly?
And why can't the Calibre PDF converter do the same thing?