08-19-2010, 03:34 PM   #3
Wintersdark

Oh, dear. Right.

OK, I extracted a sample EPUB, and it seems the conversion decided that each line is a paragraph. A sample of the raw markup:

Code:
      <p class="MsoPlainText">He stared at the warm blackness, half closing his eye, then opening it again, </p>
      <p class="MsoPlainText">wide. Over on his left, in front, was a narrow smear of murky light in the air, </p>
      <p class="MsoPlainText">which at first he could make no sense of. The light danced, a flickering glow. </p>
      <p class="MsoPlainText">Then gradually he began to sort out details of the room.</p>
      <p class="MsoPlainText">Or half room. It was big, high ceilinged. There was no furniture, but the floor </p>
      <p class="MsoPlainText">was carpeted. Across the room, from wall to wall, hung some kind of thick </p>
      <p class="MsoPlainText">curtain. Two curtains, actually, pulled together. Hence that chink of light in </p>
      <p class="MsoPlainText">the center where the inner folds of the two draperies didn't quite meet.</p>
This makes things much more difficult. Still, removing all the paragraph breaks except where the </p> occurs immediately after a " or a . would do the job. It's not perfect: there'd still be a few paragraph breaks where there shouldn't be, but at least conversation would be split up nicely and there wouldn't be paragraph breaks mid-sentence.
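
For what it's worth, here is a rough Python sketch of that idea, not a tested fix for the whole book. The function name `merge_paragraphs` and the regex are just my own illustration, and it assumes every paragraph looks exactly like the sample above (a plain `<p class="MsoPlainText">` with no nested tags). It keeps a break only where the `</p>` comes immediately after a `.` or a `"`, so a line ending in `. </p>` (with a trailing space, like the wrapped lines in the sample) still gets merged into the next one.

```python
import re

# Assumption: every paragraph uses exactly this opening tag, as in the sample.
P_RE = re.compile(r'<p class="MsoPlainText">(.*?)</p>', re.DOTALL)


def merge_paragraphs(html):
    """Join consecutive <p> elements unless the text ends in . or " right before </p>."""
    merged = []
    buffer = ""
    for match in P_RE.finditer(html):
        text = match.group(1)
        # Avoid gluing two words together if a wrapped line has no trailing space.
        if buffer and not buffer[-1].isspace():
            buffer += " "
        buffer += text
        # Trailing whitespace is deliberately NOT stripped: the rule is that
        # </p> must come immediately after the . or " to count as a real break.
        # If the quotes are stored as entities (e.g. &quot;), this tuple would
        # need extending.
        if text.endswith((".", '"')):
            merged.append(buffer)
            buffer = ""
    if buffer:
        # Leftover text that never reached a sentence end.
        merged.append(buffer)
    return "\n".join('<p class="MsoPlainText">{}</p>'.format(p) for p in merged)
```

Run over the eight lines above, this should leave two paragraphs, one ending at "details of the room." and one ending at "didn't quite meet."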

Any advice?