Thanks,
that's definitely helped. I've now only got 57 instances of mid-sentence splits. And thanks also for the explanation.
Another related question...
The book was originally a PDF which has been cropped to remove the page numbering, and then converted to EPUB.
I checked 'remove spaces between paragraphs' and enabled heuristics (now with line unwrapping set to 1), but I still end up with a lot of whitespace where I don't want it.
Even if I put something as simple as
<p class="whitespace"> </p>
in my regex
Calibre just will not replace it. It's always still there after the conversion.
I tried using the scenebreak replace to see which whitespace was scenebreak and I then got
379 occurrences of <p class="whitespace"> </p>
and 76 occurrences of <p class="scenebreak">∗ ∗ ∗</p>
I could then remove the 76 scenebreaks by using regex and turning scenebreak detection off.
But I still cannot find a way to get rid of the remaining pesky whitespaces.
I can live with them, the book formatting is much improved, but if there is an explanation, or a way to remove them during calibre conversion, I'd love to hear it.
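For anyone wanting to reproduce this, here is a minimal sketch in Python (calibre's search and replace takes Python-style regular expressions) of the kind of pattern I mean. It rests on a guess I have not confirmed: that the character between the tags is not always a plain space but sometimes a non-breaking space or an entity such as &nbsp;. The names WHITESPACE_P and strip_whitespace_paragraphs, and the sample text, are made up for illustration and are not anything from calibre itself.

```python
import re

# Hypothetical pattern: matches a "whitespace" paragraph whose filler is any
# mix of ordinary whitespace, a non-breaking space (U+00A0), or the
# &nbsp; / &#160; entities. Trailing whitespace after </p> is consumed too.
WHITESPACE_P = re.compile(
    r'<p class="whitespace">(?:\s|\u00a0|&nbsp;|&#160;)*</p>\s*'
)


def strip_whitespace_paragraphs(html: str) -> str:
    """Return html with every matching whitespace paragraph removed."""
    return WHITESPACE_P.sub("", html)


# Made-up sample input, not taken from the actual book.
sample = (
    '<p>First paragraph.</p>\n'
    '<p class="whitespace">\u00a0</p>\n'
    '<p class="whitespace"> </p>\n'
    '<p class="whitespace">&nbsp;</p>\n'
    '<p>Second paragraph.</p>\n'
)

cleaned = strip_whitespace_paragraphs(sample)
print(cleaned)
```

If the guess is right, the same expression could go into the search field with an empty replacement. If it is wrong, this only shows what I would expect a working match to look like outside of calibre.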