08-19-2010, 04:05 AM (post #7)
kacir
Quote (originally posted by bear4hunter):
I would appreciate, if someone can tell me the RegEx I can use with Notepad++ to find those paragraphs that do not end with .?!...
I have been using TextPad for many, many years, and I still use it when I need to demonstrate Regular Expressions to casual users. I do not want to scare them away with Vim ;-)
I have downloaded Notepad++. Its regular expressions are practically undocumented, and they behave in a really weird way: for example, it doesn't recognize \n as an end of line. Notepad++ can also do only a very limited range of operations on bookmarked lines.

I suggest downloading TextPad for this operation (or finding out why Notepad++ does not recognize \n as an "end of line" metacharacter).

Open the document.
Go to the menu Search -> Replace...
To find all lines ending with a literal dot, write the search expression [.]$
To find all lines ending with a literal "?", search for [?]$
[] is a "set": it matches exactly one character out of all the characters listed inside it, so [abc] would find a, b or c. Inside a set, . and ? lose their special meaning and stand for themselves, which is why [.] matches a literal dot. The previous two searches can therefore be combined as [.?]$.
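To try these set expressions outside an editor, here is a minimal sketch using Python's re module (my choice of tool for illustration, not TextPad). The sample text is made up. Note that re.MULTILINE is needed so that $ matches at the end of every line rather than only at the end of the whole text.

```python
import re

# Made-up sample text: four lines with different endings.
text = "First line.\nIs this a question?\nNo punctuation here\nLast line!"

# [.] is a set containing only a dot, so it matches a literal "." rather than
# "any character". re.MULTILINE makes $ match at the end of every line.
ends_with_dot = re.findall(r"^.*[.]$", text, flags=re.MULTILINE)

# [.?] matches exactly one character that is either "." or "?".
ends_with_dot_or_q = re.findall(r"^.*[.?]$", text, flags=re.MULTILINE)

print(ends_with_dot)       # ['First line.']
print(ends_with_dot_or_q)  # ['First line.', 'Is this a question?']
```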
To find lines whose last character is NOT ., ? or !, use the negation operator ^ as the first character inside the set:
[^.?!]$ finds all lines ending with a character that is NOT ., ? or ! (outside a set, ^ means "start of line" instead).
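The same negated set, sketched in Python on made-up text. One porting detail to be aware of: in Python a negated set like [^.?!] also matches the newline character itself, so this sketch adds \n to the set to keep each match within a single line.

```python
import re

# Made-up sample: "A broken" is the only line without closing punctuation.
text = "A complete sentence.\nA broken\nsentence that wraps.\nReally?\nYes!\n"

# [^.?!\n] matches any single character except ".", "?", "!" and newline.
# Anchored with $, it selects lines whose final character is not . ? or !
unterminated = re.findall(r"^.*[^.?!\n]$", text, flags=re.MULTILINE)

print(unterminated)  # ['A broken']
```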

Now you want to remember the last character found. You do that with \( and \), the grouping operator. In the replacement string you then refer to the text matched by the first \( \) group as \1, by the second as \2, and so on up to \9 for the ninth.
Please note that different implementations of regular expressions use either \( and \) or plain ( and ) as grouping operators. TextPad can use both, depending on its preferences (the "use POSIX Regular Expressions" setting).
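For comparison, Python's re module is one of the implementations that uses plain ( and ) for groups, while the replacement string still uses \1, \2 and so on. A minimal sketch with a made-up input string:

```python
import re

# Python's re uses plain ( and ) for groups; in the replacement string,
# \1 and \2 refer to the text captured by the first and second group.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

print(swapped)  # world hello
```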

Let us put that together.
Look for \([^.?!]\)\n
replace with "\1 "
There is a space after \1 (the quotes only show where the replacement ends and are not typed), so the last word of one line and the first word of the next line do not run together.
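The complete search-and-replace, sketched in Python on made-up text. As in the earlier sketch, \n is excluded from the set; this is a Python-specific precaution so that a blank line between paragraphs is not itself treated as "a line not ending in . ? or !".

```python
import re

# Made-up sample: one sentence wrapped over three lines, then a one-line sentence.
text = "This sentence was\nwrapped across\nthree lines.\nThis one fits.\n"

# Group 1 captures the last character of a line if it is not . ? or !
# The match (that character plus the newline) is replaced by the character
# and a single space, which joins the line with the next one.
joined = re.sub(r"([^.?!\n])\n", r"\1 ", text)

print(joined, end="")
# This sentence was wrapped across three lines.
# This one fits.
```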

You might now end up with two spaces between words if there *was* a space at the end of the line.
To get rid of these, simply replace two spaces with one space.
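A small Python sketch of why the double space appears and how the second pass removes it. The sample text is made up; its first line ends with a trailing space, which is itself "not . ? or !" and so gets captured by \1.

```python
import re

# Made-up sample: the first line ends with a trailing space.
text = "ends with a space \nnext line.\n"

# The trailing space is captured by \1, and the replacement adds a second
# space after it, so two spaces end up between the words.
joined = re.sub(r"([^.?!\n])\n", r"\1 ", text)
print(repr(joined))   # 'ends with a space  next line.\n'

# Second pass, as described above: replace two spaces with one.
cleaned = joined.replace("  ", " ")
print(repr(cleaned))  # 'ends with a space next line.\n'
```

str.replace makes a single left-to-right pass, so a run of four spaces would become two, much like running the editor's replace once; re.sub(r" {2,}", " ", joined) collapses any run of spaces in one go.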


In the Vim text editor I would simply issue the command
:global/[^.?!]$/ join
or, using the short versions of the commands,
:g/[^.?!]$/ j
It means: find all lines not ending with ., ? or ! and join each of them with the next line. The join command inserts a space in place of the end of line, unless the joined line already ended with a space. It also removes the leading spaces if the next line was indented.
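A rough Python approximation of the joining rules just described, for readers who want to experiment without Vim. join_unterminated is a hypothetical helper written for illustration only; it is not how Vim implements :join, and treating blank lines as paragraph breaks is an assumption of this sketch.

```python
import re


def join_unterminated(lines: list[str]) -> list[str]:
    """Hypothetical helper approximating the join rules described above.

    A line whose last character is not . ? or ! is joined with the next line.
    Blank lines are treated as paragraph breaks and never joined (assumption).
    """
    result: list[str] = []
    for line in lines:
        prev = result[-1] if result else ""
        # re.search on an empty prev fails, so the first line is always appended.
        if line.strip() and re.search(r"[^.?!]$", prev):
            # Drop the next line's indentation and insert one space,
            # unless the previous line already ends in whitespace.
            sep = "" if prev[-1].isspace() else " "
            result[-1] = prev + sep + line.lstrip()
        else:
            result.append(line)
    return result


print(join_unterminated(["This was", "    wrapped over", "three lines.", "", "Done."]))
# ['This was wrapped over three lines.', '', 'Done.']
```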

Vim is difficult to learn, but it is one of THE most powerful text editors, and also one of THE most completely documented ones. Just check its online manual, including the section on regular expressions:
http://vimdoc.sourceforge.net/htmldoc/usr_toc.html
http://vimdoc.sourceforge.net/htmldo...n.html#pattern
(I am using the diplomatic "one of" because I do not want to pick a fight with our resident Emacs users ;-) )

Last edited by kacir; 08-19-2010 at 04:22 AM.