MobileRead Forums - View Single Post - regex newbie search end of string char problem

Tex2002ans · 10-12-2020, 02:35 PM

Quote:

Originally Posted by michaelbr

I tried this regex

Code:

([^.]|[^.’])<\/p>$

, but it's not working, can someone please tell me what's the best way to search for this string?

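The . is a special symbol in Regex. Outside of square brackets, it stands for "any character". If you want to look for an actual period there, you'll want to add a \ before it. (Inside square brackets, like your [^.], the dot is already treated as a literal period.)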

. = any character
\. = a period
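
To make the difference concrete, here is a small Python sketch using the standard `re` module. The sample text is made up for illustration, and the last two patterns are only my reading of what the quoted regex was meant to do (exclude both a period and a closing quote), so treat that part as an assumption.

```python
import re

# Made-up sample text, not taken from the original file.
text = "<p>The end.</p>\n<p>No period here</p>\n<p>‘Quoted.’</p>"

# Unescaped dot: any single character right before </p>.
any_char = re.findall(r".</p>", text)

# Escaped dot: only a literal period right before </p>.
literal = re.findall(r"\.</p>", text)

# Inside square brackets the dot is already literal,
# so [^.] means "any character except a period".
not_period = re.findall(r"[^.]</p>", text)

# The alternation from the quoted regex: the first branch [^.] already
# accepts ’, so the second branch never excludes anything.
alternation = re.findall(r"(?:[^.]|[^.’])</p>", text)

# One combined class excludes both characters (assumed intent).
combined = re.findall(r"[^.’]</p>", text)

print(any_char)     # ['.</p>', 'e</p>', '’</p>']
print(literal)      # ['.</p>']
print(not_period)   # ['e</p>', '’</p>']
print(alternation)  # ['e</p>', '’</p>']
print(combined)     # ['e</p>']
```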

Quote:

Originally Posted by michaelbr

I have a text file with several paragraphs, I'd like to search for paragraphs ending with *[a-zA-Z], [...]

Can you try to explain, in words, what's the issue you're trying to solve? And give a few more examples of before/after?

From what I can tell, you're trying to find paragraphs without a closing punctuation mark (i.e., paragraphs that end in a letter).

For example, if you're taking an OCRed book and trying to join broken lines back together:

Before:

Code:

<p>This is a copied and</p>
<p>pasted paragraph from the</p>
<p>book.</p>
<p>And true paragraph 2.</p>

After:

Code:

<p>This is a copied and pasted paragraph from the book.</p>
<p>And true paragraph 2.</p>

* * *

Here are the 3 sets of Regex I personally use:

Note: DO NOT do a "Replace All". Replace most of these on a case-by-case basis. Also, make sure to save a backup copy of your file.

Regex #1 (Hyphens)

This searches for a hyphen at the very end of a paragraph, right before the next paragraph begins:

Search: -</p>\s+<p>
Replace: (LEAVE THIS COMPLETELY BLANK)

OR alternate, if the hyphen should be kept:

Search: -</p>\s+<p>
Replace: -

Example:

Code:

<p>This example is where the pre-</p>
<p>split occurs.</p>
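
Here is a rough Python sketch of what the two replacements do to the example above. It assumes the search is meant to cover the whole `</p>`, whitespace, `<p>` sequence between the two lines, as in the example; the variable names are just for illustration. In a real book you would still step through the matches one at a time rather than replacing everything.

```python
import re

html = "<p>This example is where the pre-</p>\n<p>split occurs.</p>"

# Assumed pattern: hyphen, closing tag, whitespace, opening tag.
hyphen_break = re.compile(r"-</p>\s+<p>")

# Blank replacement: drops the hyphen and joins the two halves.
dropped = hyphen_break.sub("", html)

# Alternate replacement: keeps the hyphen and joins the two halves.
kept = hyphen_break.sub("-", html)

print(dropped)  # <p>This example is where the presplit occurs.</p>
print(kept)     # <p>This example is where the pre-split occurs.</p>
```

Which result is right depends on the word ("pre-split" keeps its hyphen, a word broken only by line wrapping does not), which is why these are best decided one by one.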

Regex #2 (Not Closing Punctuation)

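This searches for paragraphs whose last character is NOT a period, exclamation point, question mark, closing quote, or the > of a closing tag:

Search: ([^>”\?\!\.])</p>\s+<p>
Replace: \1 followed by a single space (so the joined words don't run together)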

Example:

Code:

<p>This is an example</p>
<p>sentence where the person,</p>
<p>places, and things occur.</p>

Note: You can easily add other "valid" punctuation endings as needed. A colon, for example, may or may not be one:

In Fiction, colons likely occur within sentences.
In Non-Fiction, colons likely occur at the end of paragraphs.
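
As a sketch of how the set of "valid" endings changes the result, here is Regex #2 in Python with the endings as a parameter. The helper `join_pattern`, the replacement with a trailing space, and the last two sample paragraphs (the colon case) are my own additions for illustration; the pattern is assumed to span the `</p>` ... `<p>` break shown in the example.

```python
import re

def join_pattern(endings):
    """Match a paragraph break that follows a character which is
    neither '>' nor one of the given closing punctuation marks."""
    excluded = re.escape(">" + endings)
    return re.compile(r"([^" + excluded + r"])</p>\s+<p>")

html = (
    "<p>This is an example</p>\n"
    "<p>sentence where the person,</p>\n"
    "<p>places, and things occur.</p>\n"
    "<p>He listed three items:</p>\n"   # made-up colon case
    "<p>Next paragraph.</p>"
)

fiction = join_pattern("”?!.")      # colon is NOT a valid ending
nonfiction = join_pattern("”?!.:")  # colon IS a valid ending

# \1 puts back the captured character; the space rejoins the words.
fiction_result = fiction.sub(r"\1 ", html)
nonfiction_result = nonfiction.sub(r"\1 ", html)

print(fiction_result)
print(nonfiction_result)
```

With the fiction set, the paragraph ending in a colon gets merged into the next one; with the non-fiction set, it is left alone. Both runs join the first three lines into one paragraph.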

Regex #3 (Lowercase Start)

This searches for a lowercase letter at the very beginning of a paragraph:

Search: <p>[a-z]

I make sure to run this after #1 and #2 to catch any strays, then decide these on a case-by-case basis.

Example:

Code:

<p>The fishy “car dealership”</p>
<p>was called Mr. X’s Emporium.</p>
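
Since these are decided case by case, a script version would only list the candidates instead of changing anything. A minimal Python sketch, assuming the search is anchored on the opening `<p>` tag; the third sample paragraph is borrowed from the earlier example. Note that the first line ends in a closing quote, which Regex #2 counts as a valid ending, so only this lowercase check flags the break.

```python
import re

html = (
    "<p>The fishy “car dealership”</p>\n"
    "<p>was called Mr. X’s Emporium.</p>\n"
    "<p>And true paragraph 2.</p>"
)

# Assumed pattern: opening tag immediately followed by a lowercase letter.
lowercase_start = re.compile(r"<p>[a-z]")

# Collect (line number, matched text) pairs for manual review.
candidates = [
    (html.count("\n", 0, m.start()) + 1, m.group())
    for m in lowercase_start.finditer(html)
]

print(candidates)  # [(2, '<p>w')]
```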