09-14-2009, 11:45 AM   #4
Sparrow
Wizard
Posts: 4,395
Karma: 1358132
Join Date: Nov 2007
Location: UK
Device: Palm TX, CyBook Gen3
Quote:
Originally Posted by ahi
The most straightforward way to detect whether or not paragraphs are line-broken is to simply count what percentage of non-empty lines begin with a character that is not an opening quote, a dash/en-dash/em-dash, an opening parenthesis, or a capital letter.
Like Neko, I look at the character that precedes the line break. In MS Word the line break generally shows up as the ^p (paragraph mark) character.
I replace every space+^p occurrence with ^p, repeating until no such spaces remain.

Then, wherever a ^p follows a letter, a number, or any punctuation other than a full stop or a quote, I replace that ^p with a space.

Hyphens get replaced individually, since some may need to be retained.

This is quick and dirty - it will retain full-stop+^p when they should be full-stop+space - but the process is normally just prep for proof-reading. (Or I can just opt to live with those inaccuracies.)
Also, verses need to be edited manually.
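
For anyone who would rather script this than use Word's Find and Replace, the steps above could be sketched roughly as follows in Python. This is only an approximation under a few assumptions of mine: the text uses "\n" where Word shows ^p, the function name join_broken_lines and the KEEP_BREAK_AFTER set are made up for illustration, curly quotes are treated like straight quotes, and a break that is followed by a blank line is left alone so paragraph gaps survive. Hyphens are simply skipped here, since they still need checking by hand.

```python
import re

# Characters after which the line break is kept (my assumption of a
# reasonable set): full stop, straight and curly quotes, and the hyphen,
# which the post says is handled one at a time by hand.
KEEP_BREAK_AFTER = ".\"'\u201c\u201d\u2018\u2019-"


def join_broken_lines(text: str) -> str:
    """Quick-and-dirty unwrapping of hard line breaks (hypothetical helper)."""
    # Step 1: remove spaces sitting directly before a line break.
    # One regex pass covers what repeated space+^p replacement does in Word.
    text = re.sub(r" +\n", "\n", text)

    # Step 2: after a letter, a number, or punctuation that is not in
    # KEEP_BREAK_AFTER, turn the line break into a single space.
    # \s in the excluded set means an empty line's own break is never touched;
    # (?!\n) leaves a break alone when a blank line follows it.
    pattern = r"([^\s" + re.escape(KEEP_BREAK_AFTER) + r"])\n(?!\n)"
    return re.sub(pattern, r"\1 ", text)


if __name__ == "__main__":
    sample = "The quick brown \nfox jumped,\nthen stopped.\nA new para-\ngraph"
    print(join_broken_lines(sample))
```

As with the Word version, a full stop followed by a break is kept even when it falls mid-paragraph, so the result still needs proof-reading, and verse will be mangled unless it is excluded first.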