Quote:
Originally Posted by michaelbr
I tried this regex , but it's not working, can someone please tell me what's the best way to search for this string?
The . is a special symbol in Regex: it matches any single character. If you want to look for an actual period, put a \ before it:
. = any character
\. = a period
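To see the difference in practice, here is a minimal sketch using Python's built-in re module. The sample string is made up for illustration and is not from the original question.

```python
import re

# Hypothetical sample text, chosen only to show the difference.
text = "v1.0 and v1x0"

# Unescaped dot: matches any single character, so both strings match.
any_char = re.findall(r"v1.0", text)

# Escaped dot: matches only a literal period.
literal_period = re.findall(r"v1\.0", text)

print(any_char)        # ['v1.0', 'v1x0']
print(literal_period)  # ['v1.0']
```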
Quote:
Originally Posted by michaelbr
I have a text file with several paragraphs, I'd like to search for paragraphs ending with *[a-zA-Z]</p>, [...]
Can you try to explain, in words, what's the issue you're trying to solve? And give a few more examples of before/after?
From what I can tell, I think you're trying to find paragraphs without a closing punctuation mark. (aka, paragraphs that end in a letter.)
Like if you're taking an OCRed book, and trying to combine broken lines together:
Code:
<p>This is a copied and</p>
<p>pasted paragraph from the</p>
<p>book.</p>
<p>And true paragraph 2.</p>
After:
Code:
<p>This is a copied and pasted paragraph from the book.</p>
<p>And true paragraph 2.</p>
* * *
Here are the 3 sets of Regex I personally use:
Note: DO NOT do a "Replace All". Replace most of these on a case-by-case basis. Also, make sure to save a backup copy of your file.
Regex #1 (Hyphens)
This searches for a hyphen at the end of a paragraph:
Search: -</p>\s+<p>
Replace: (LEAVE THIS COMPLETELY BLANK)
OR alternate, if the hyphen should stay (giving "pre-split" instead of "presplit"):
Search: -</p>\s+<p>
Replace: -
Example:
Code:
<p>This example is where the pre-</p>
<p>split occurs.</p>
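If you want to preview what both replacements do before touching the real file, something like this rough Python sketch should work. The function name and the keep_hyphen parameter are made up for illustration. Note that re.sub replaces every match at once, so this is only for testing on a sample, not a substitute for checking each hit by hand.

```python
import re

# Regex #1: a hyphen right before the paragraph break.
HYPHEN_BREAK = re.compile(r"-</p>\s+<p>")

def join_hyphen_breaks(html, keep_hyphen=False):
    """Join paragraphs split at a hyphen (hypothetical helper).

    keep_hyphen=False mirrors the blank replacement,
    keep_hyphen=True mirrors the alternate "-" replacement.
    """
    return HYPHEN_BREAK.sub("-" if keep_hyphen else "", html)

sample = "<p>This example is where the pre-</p>\n<p>split occurs.</p>"

print(join_hyphen_breaks(sample))
# <p>This example is where the presplit occurs.</p>
print(join_hyphen_breaks(sample, keep_hyphen=True))
# <p>This example is where the pre-split occurs.</p>
```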
Regex #2 (Not Closing Punctuation)
This searches for paragraphs whose last character is NOT closing punctuation (a period, exclamation point, question mark, closing quote, etc.):
Search: ([^>”\?\!\.])</p>\s+<p>
Replace: \1 (followed by a single space, so the joined words don't run together)
Example:
Code:
<p>This is an example</p>
<p>sentence where the person,</p>
<p>places, and things occur.</p>
Note: You can easily add different "valid" punctuation endings inside the brackets as needed. For example, a colon may or may not count as a valid paragraph ending:
In Fiction, colons likely occur within sentences.
In Non-Fiction, colons likely occur at the end of paragraphs.
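Here is a small Python sketch of Regex #2 run against the example above, plus one real paragraph to show it gets left alone. The function name is hypothetical, and the replacement is assumed to be \1 plus one space so that the joined words stay separated. As with any "replace all", treat this as a way to test the pattern on a sample, not something to run blindly on a whole book.

```python
import re

# Regex #2: any character that is NOT >, ”, ?, ! or . right before the break.
# To treat a colon as a valid ending too, add ":" inside the brackets.
NO_END_PUNCT = re.compile(r"([^>”\?\!\.])</p>\s+<p>")

def join_broken_paragraphs(html):
    """Join paragraphs that do not end in closing punctuation (hypothetical helper)."""
    # Assumption: \1 is followed by a space to keep the words apart.
    return NO_END_PUNCT.sub(r"\1 ", html)

sample = (
    "<p>This is an example</p>\n"
    "<p>sentence where the person,</p>\n"
    "<p>places, and things occur.</p>\n"
    "<p>And true paragraph 2.</p>"
)

joined = join_broken_paragraphs(sample)
print(joined)
```

The three broken lines come out as one paragraph, and the paragraph after the period is not merged.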
Regex #3 (Lowercase Start)
This searches for a lowercase letter at the very beginning of the paragraph:
Search: <p>[a-z]
I make sure to run this after #1 and #2 to catch any strays, then decide these on a case-by-case basis.
Example:
Code:
<p>The fishy “car dealership”</p>
<p>was called Mr. X’s Emporium.</p>
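Since #3 is meant for case-by-case review, a sketch for it can just list the hits instead of replacing anything. This assumes one paragraph per line, and the function name is made up for illustration.

```python
import re

# Regex #3: a paragraph that starts with a lowercase letter.
LOWERCASE_START = re.compile(r"<p>[a-z]")

def find_lowercase_starts(html):
    """Return (line_number, line) pairs for review (hypothetical helper)."""
    return [
        (number, line)
        for number, line in enumerate(html.splitlines(), start=1)
        if LOWERCASE_START.search(line)
    ]

sample = (
    "<p>The fishy “car dealership”</p>\n"
    "<p>was called Mr. X’s Emporium.</p>"
)

hits = find_lowercase_starts(sample)
for number, line in hits:
    print(number, line)
# 2 <p>was called Mr. X’s Emporium.</p>
```

The first paragraph ends in ”, which Regex #2 treats as a valid ending, so only this check flags the pair.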