07-02-2011, 04:48 PM   #8
greenlees
Junior Member
Posts: 8
Karma: 10
Join Date: Jun 2011
Device: kindle touch
Thanks,

That's definitely helped; I'm now down to only 57 instances of mid-sentence splits. And thanks also for the explanation.

Another related question...

The book was originally a PDF which has been cropped to remove the page numbering, and then converted to epub.

I check the 'remove spaces between paragraphs' option and enable heuristics (now with line unwrapping set to 1), but I still end up with a lot of whitespace where I don't want it.

Even if I put something as simple as

<p class="whitespace"> </p>

in my regex, Calibre just will not replace it. It's always still there after the conversion.

I tried using the scenebreak replace option to see which of the whitespace was being treated as scene breaks, and I then got

379 occurrences of <p class="whitespace"> </p>

and 76 occurrences of <p class="scenebreak">∗ ∗ ∗</p>

I could then remove the 76 scenebreaks by using regex and turning scenebreak detection off.

But I still cannot find a way to get rid of the remaining pesky whitespaces.

I can live with them, as the book formatting is much improved, but if there is an explanation, or a way to remove them during calibre conversion, I'd love to hear it.
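
One thing that may be worth checking, sketched below in Python since calibre's search and replace uses Python-style regular expressions: the character between the tags might not be a plain space. This is only an assumption, not something confirmed in this thread. If the paragraph actually holds a non-breaking space (U+00A0, or the entity &nbsp; / &#160;), or the tag carries extra attributes, a pattern typed with a literal space would never match. The function name and sample markup here are hypothetical and only illustrate a more tolerant pattern.

```python
import re

# Hypothetical, tolerant pattern (an assumption, not calibre's own rule):
# allows extra attributes after the class and any whitespace-like content,
# including U+00A0 and the entities &nbsp; / &#160;.
WHITESPACE_PARA = re.compile(
    r'<p\s+class="whitespace"[^>]*>(?:\s|&nbsp;|&#160;)*</p>\s*'
)

def strip_whitespace_paragraphs(html: str) -> str:
    """Remove every empty 'whitespace' paragraph from the given markup."""
    return WHITESPACE_PARA.sub("", html)

# Made-up sample markup for illustration only.
sample = (
    '<p>First paragraph.</p>\n'
    '<p class="whitespace">\u00a0</p>\n'
    '<p class="whitespace"> </p>\n'
    '<p class="whitespace" style="margin:0">&#160;</p>\n'
    '<p>Second paragraph.</p>\n'
)

# A pattern with a literal space does not match the non-breaking-space variant.
literal = re.compile(r'<p class="whitespace"> </p>')
missed = literal.search('<p class="whitespace">\u00a0</p>') is None

cleaned = strip_whitespace_paragraphs(sample)
print(missed)
print(cleaned)
```

If a pattern like this works on the converted file but still has no effect during conversion, another possibility (again unconfirmed) is that the whitespace paragraphs are only added after the search and replace step has already run.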