Old 07-20-2011, 05:47 PM   #85
Ortep
Hi, I'm not sure if this is helpful to you, but when I use Word, for example, to find 'extra' paragraph breaks, I look for a ^p that does not have one of the following characters in front of it:

Quote:
. ? ! "
This is because the end of a paragraph is always at the end of a sentence, and these are the characters you will find at the end of a sentence. Well, at least in 99.9% of the cases.

Of course, you can't search for characters that are not there, so I turn it around. I look for a ^p with one of those characters in front of it and change it to that character followed by <<PAR>>, a string you probably won't find in a text. This marks the 'real' paragraphs.

I'm not sure if you can do that in a one-step regex, but you can always do it in four separate ones.
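
For what it's worth, outside Word the marking step can probably be done in one pass. Here is a minimal sketch in Python, assuming the text uses a plain newline ("\n") where Word shows ^p; the function name mark_real_paragraphs is just made up for illustration. A character class matches any one of the four end-of-sentence characters, and the captured character is put back in front of the marker.

```python
import re

# Placeholder string from the post; assumed not to occur in the text itself.
PAR_MARK = "<<PAR>>"

def mark_real_paragraphs(text: str) -> str:
    """Replace each newline that directly follows . ? ! or " with PAR_MARK.

    Assumption: "\n" stands in for Word's ^p paragraph mark.
    """
    # ([.?!"]) captures the end-of-sentence character so it can be kept (\1).
    return re.sub(r'([.?!"])\n', r'\1' + PAR_MARK, text)
```

Newlines that are not preceded by one of those characters are left untouched, so they can be dealt with in a later step.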


Then I change all the ^p that are left to a space. Those are the ones that aren't at the end of a sentence. In the next step I replace all the <<PAR>> with ^p.

This process effectively removes all paragraph breaks that are not at the end of a sentence and leaves the ones that are.

You probably want to first replace all ^p that have a space in front of them with a single ^p, because sometimes there is a space between the end-of-sentence character and the ^p.
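
Put together, the whole process could look roughly like this in Python. This is only a sketch under a few assumptions: "\n" plays the role of ^p, the name remove_extra_breaks is hypothetical, and the clean-up step strips one or more trailing spaces rather than exactly one.

```python
import re

# Placeholder string from the post; assumed not to occur in the text itself.
PAR_MARK = "<<PAR>>"

def remove_extra_breaks(text: str) -> str:
    """Join lines that were broken mid-sentence, keeping 'real' paragraph breaks.

    Assumption: "\n" stands in for Word's ^p paragraph mark.
    """
    # Clean-up first: drop spaces sitting between the last character and the break.
    text = re.sub(r' +\n', '\n', text)
    # Mark breaks that follow an end-of-sentence character (. ? ! ").
    text = re.sub(r'([.?!"])\n', r'\1' + PAR_MARK, text)
    # Every break still left is in the middle of a sentence: turn it into a space.
    text = text.replace('\n', ' ')
    # Restore the marked breaks.
    return text.replace(PAR_MARK, '\n')

sample = "This is a line \nthat was broken.\nNew paragraph here\nand more.\n"
cleaned = remove_extra_breaks(sample)
```

The order matters: the trailing-space clean-up has to run before the marking step, otherwise a break after ". " would be treated as a mid-sentence break.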


It is not a perfect process, but it will catch at least 95% of your problems.

Last edited by Ortep; 07-20-2011 at 06:00 PM.