Oh, dear. Right.
OK, I extracted a sample epub, and it seems every line has been turned into its own paragraph. A sample of the raw text:
Code:
<p class="MsoPlainText">He stared at the warm blackness, half closing his eye, then opening it again, </p>
<p class="MsoPlainText">wide. Over on his left, in front, was a narrow smear of murky light in the air, </p>
<p class="MsoPlainText">which at first he could make no sense of. The light danced, a flickering glow. </p>
<p class="MsoPlainText">Then gradually he began to sort out details of the room.</p>
<p class="MsoPlainText">Or half room. It was big, high ceilinged. There was no furniture, but the floor </p>
<p class="MsoPlainText">was carpeted. Across the room, from wall to wall, hung some kind of thick </p>
<p class="MsoPlainText">curtain. Two curtains, actually, pulled together. Hence that chink of light in </p>
<p class="MsoPlainText">the center where the inner folds of the two draperies didn't quite meet.</p>
This makes things much more difficult. Still, removing every paragraph break except where the </p> comes immediately after a " or a . would do the job. It's not perfect: there would still be a few paragraph breaks where there shouldn't be, but at least conversation would be split up nicely and there wouldn't be paragraph breaks mid-sentence.
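A rough sketch of that rule in Python, in case it helps frame the question. It assumes every line is wrapped exactly as in the sample (`<p class="MsoPlainText">…</p>`), and the names `merge_paragraphs` and `endings` are made up for illustration. One thing the sample suggests: wrapped lines seem to keep a trailing space before the `</p>` ("glow. </p>"), while real paragraph ends do not ("room.</p>"), so the check is done on the unstripped text. That may not hold across the whole book.

```python
import re

# Matches one line-level paragraph as it appears in the sample.
# The class name comes from the sample and may differ in other files.
P_TAG = re.compile(r'<p class="MsoPlainText">(.*?)</p>', re.DOTALL)


def merge_paragraphs(html, endings=('.', '"')):
    """Join consecutive <p> lines, keeping a paragraph break only where
    the text directly before </p> ends with one of `endings`."""
    merged = []
    buffer = []
    for match in P_TAG.finditer(html):
        text = match.group(1)
        buffer.append(text.strip())
        # Test the unstripped text on purpose: "glow. </p>" is treated as a
        # wrapped line, "room.</p>" as a real paragraph end.
        if text.endswith(endings):
            merged.append(' '.join(buffer))
            buffer = []
    if buffer:
        # Flush whatever is left if the last line did not end a paragraph.
        merged.append(' '.join(buffer))
    return '\n'.join('<p class="MsoPlainText">%s</p>' % p for p in merged)
```

Passing something like `endings=('.', '"', '?', '!')` would also keep breaks after questions and exclamations, at the cost of a few more false breaks.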
Any advice?