regex newbie search end of string char problem
I have a text file with several paragraphs. I'd like to search for paragraphs ending with *[a-zA-Z]</p>. Here is an example:
paragraph 1: ..... Code:
.’</p>Code:
.</p>Code:
</p>Code:
([^.]|[^.’])<\/p>$
I prefer to do my joins individually by type. I also only use Replace All for these 2 (I have a number of others for special instances that I step through, skipping false positives).
(The code was snipped from my saved Search file, so things shown are 'escaped'. They also take into consideration valid punctuation marks.) Code:
74\Name=Cleanup/Joins/Join to upper
. = any character
\. = a period
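To see that difference in action, here is a small sketch using Python's re module. Python is my assumption here (the thread is about an editor's search box), and the three sample strings are made up for illustration.

```python
import re

# Made-up sample paragraph endings (not from the original post).
samples = ["end.</p>", "end!</p>", "end</p>"]

# Unescaped dot: matches ANY single character before </p>.
any_char = re.compile(r".</p>")
# Escaped dot: matches only a literal period before </p>.
literal_period = re.compile(r"\.</p>")

any_hits = [s for s in samples if any_char.search(s)]
period_hits = [s for s in samples if literal_period.search(s)]

print(any_hits)     # every sample matches
print(period_hits)  # only the sample ending in a period matches
```

One related detail: inside a character class such as [^.], the dot is already literal, so it does not need the backslash there.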
From what I can tell, I think you're trying to find paragraphs without a closing punctuation mark (that is, paragraphs that end in a letter), like when you're taking an OCRed book and trying to combine broken lines together: Code:
<p>This is a copied and</p>
Code:
<p>This is a copied and pasted paragraph from the book.</p>
Here are the 3 sets of Regex I personally use:
Note: DO NOT do a "Replace All". Replace most of these on a case-by-case basis. Also, make sure to save a backup copy of your file.
Regex #1 (Hyphens)
This searches for a hyphen at the end of a paragraph:
Search: -</p>\s+<p>
Replace: (LEAVE THIS COMPLETELY BLANK)
OR alternate:
Search: -</p>\s+<p>
Replace: -
Example: Code:
<p>This example is where the pre-</p>
Regex #2
This searches for everything that's NOT a period, exclamation point, question mark, etc.:
Search: ([^>”\?\!\.])</p>\s+<p>
Replace: \1 (followed by a space)
Example: Code:
<p>This is an example</p>
In Fiction, colons likely occur within sentences. In Non-Fiction, colons likely occur at the end of paragraphs.
Regex #3 (Lowercase Start)
This searches for a lowercase letter at the very beginning of the paragraph:
Search: <p>[a-z]
I make sure to run this after #1 and #2 to catch any strays, then decide these on a case-by-case basis.
Example: Code:
<p>The fishy “car dealership”</p>
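For anyone who wants to try the three searches outside an editor, here is a rough sketch in Python's re module. Python and the sample text are my assumptions, not something from the posts above. Note that re.sub replaces every match at once, which is exactly the "Replace All" the note above warns against on a real book, so the sketch lists the candidates first and only replaces on a tiny sample where every match is a genuine broken line.

```python
import re

# Hypothetical OCR-style sample, invented for this sketch.
html = (
    "<p>This example is where the pre-</p>\n"
    "<p>vious line was hyphenated</p>\n"
    "<p>and this one ends without punctuation</p>\n"
    "<p>so it continues here.</p>\n"
    "<p>A finished paragraph.</p>\n"
)

HYPHEN_JOIN = re.compile(r"-</p>\s+<p>")                # Regex #1
NO_PUNCT_JOIN = re.compile(r"([^>”\?\!\.])</p>\s+<p>")  # Regex #2
LOWERCASE_START = re.compile(r"<p>[a-z]")               # Regex #3

# Lowercase paragraph starts before any cleanup.
strays_before = LOWERCASE_START.findall(html)

# Regex #1: drop the hyphen and join the two halves of the word.
step1 = HYPHEN_JOIN.sub("", html)

# Review step: list what Regex #2 would touch before replacing anything.
candidates = [m.group(0) for m in NO_PUNCT_JOIN.finditer(step1)]

# Regex #2: keep the captured character and add a space between the lines.
step2 = NO_PUNCT_JOIN.sub(r"\1 ", step1)

# Regex #3: anything still starting with a lowercase letter needs a manual look.
strays_after = LOWERCASE_START.findall(step2)

print(step2)
print(strays_before, strays_after)
```

On a real file, stepping through the finditer matches one at a time is closer to the case-by-case approach described above.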
Search: ([a-z])</p>\s+<p>
Replace: \1 <---- Make sure you put a space after.
Code:
<p>This is an example</p>
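A quick way to compare this narrower search with the broader "anything but punctuation" one is to run both over the same text. The sketch below uses Python's re module and an invented sample (both are my assumptions). The broader pattern is the one quoted earlier in the thread.

```python
import re

# Invented sample: one broken line plus a heading that ends in a digit.
text = (
    "<p>This is an example</p>\n"
    "<p>of a broken line.</p>\n"
    "<p>Chapter 2</p>\n"
    "<p>Next.</p>"
)

# Narrow version: only join when the paragraph ends in a lowercase letter.
# The replacement is the captured letter followed by a single space.
narrow = re.sub(r"([a-z])</p>\s+<p>", r"\1 ", text)

# Broad version: join after anything that is not closing punctuation.
broad = re.sub(r"([^>”\?\!\.])</p>\s+<p>", r"\1 ", text)

print(narrow)  # the heading "Chapter 2" stays on its own
print(broad)   # the heading is merged into the next paragraph
```

The narrow version misses lines that end in a comma or a digit, but it also produces fewer false positives, which matters if you are tempted to replace in bulk.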