Quote:
Originally Posted by ahi
The most straightforward way to detect whether or not paragraphs are line-broken is to simply count what percentage of non-empty lines begin with a character that is not an opening quote, a dash/en-dash/em-dash, an opening parenthesis, or a capital letter.
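For anyone who wants to try the quoted check outside a word processor, here is a rough Python sketch of it. The function name and the exact set of "opener" characters are my own assumptions, not something ahi specified, and what percentage counts as "line-broken" is left as a judgment call.

```python
# Characters that can plausibly start a real paragraph besides a capital letter:
# opening quotes, dashes and an opening parenthesis (this set is an assumption).
OPENERS = set("\"'“‘«(-–—")

def continuation_ratio(text):
    """Return the fraction of non-empty lines that look like mid-sentence continuations."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    # A line counts as a continuation if it starts with neither a capital nor an opener.
    suspicious = sum(
        1 for ln in lines if not (ln[0].isupper() or ln[0] in OPENERS)
    )
    return suspicious / len(lines)
```

A high ratio suggests the paragraphs were hard-wrapped; a ratio near zero suggests each line is already a full paragraph.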
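Like Neko, I look for the character that precedes the line break. In MS Word the break at the end of each line is usually a paragraph mark, which Find and Replace matches with ^p (a manual line break, if the file has those instead, is ^l).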
I replace every space+^p with ^p, repeating until no such spaces are left.
Then, wherever a ^p follows a letter, a number, or any punctuation other than a full stop or a quote mark, I replace that ^p with a space.
Hyphens get replaced individually, since some may need to be retained.
This is quick and dirty - it will keep full-stop+^p even where it should be full-stop+space - but the process is normally just prep for proof-reading. (Or I can just opt to live with those inaccuracies.)
Also, verses need to be edited manually.
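If the text is in a plain file rather than in Word, the same steps can be roughed out in Python. This is only a sketch under a few assumptions of mine: "\n" stands in for ^p, the function name is made up, and a break that is followed by a blank line is left alone so paragraph gaps survive (that last rule is not part of the Word procedure above).

```python
import re

# A break is kept after a full stop or a quote mark, as in the Word steps.
# Hyphens are also skipped here so they can be checked one by one.
KEEP_BREAK_AFTER = ".\"'“”‘’-"

def join_broken_lines(text):
    # Step 1: drop spaces or tabs sitting just before a line break.
    # The "+" removes a whole run at once, so no repeated passes are needed.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Step 2: turn a break into a space when the character before it is not
    # whitespace and not in KEEP_BREAK_AFTER, and the next line has content.
    pattern = r"(?<=[^\s" + re.escape(KEEP_BREAK_AFTER) + r"])\n(?=\S)"
    return re.sub(pattern, " ", text)
```

Like the manual version, this keeps every full-stop break even where the sentence carries on within the same paragraph, and it will wreck verse, so those parts still need a pass by hand.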