#1
Connoisseur
Posts: 81
Karma: 10
Join Date: Aug 2010
Location: Murcia/Spain
Device: Android 12
regex newbie search end of string char problem
I have a text file with several paragraphs, and I'd like to search for paragraphs ending with *[a-zA-Z]</p>. Here is an example:
paragraph 1: .....
Code:
.’</p>
Code:
.</p>
Code:
</p>
Code:
([^.]|[^.’])<\/p>$
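A quick way to see what the pattern above actually matches is to run it against the three endings from the example. The sketch below uses Python's `re` module as a stand-in for the editor's regex engine (an assumption; engines differ in details). The sample sentences and the name `ALT` are made up for illustration. Because `[^.]` already accepts every character except a period, the alternation also accepts `’`, so an ending like `.’</p>` still matches. Requiring a letter right before `</p>`, as the question describes, is one possible alternative.

```python
import re

# Pattern as posted: either alternative may match the character before </p>.
POSTED = re.compile(r"([^.]|[^.’])<\/p>$")

# One possible alternative (an assumption about the intent):
# only a letter is allowed right before </p>.
ALT = re.compile(r"[a-zA-Z]</p>$")

# Hypothetical sample paragraphs covering the three endings in the post.
samples = ["<p>He left.’</p>", "<p>He left.</p>", "<p>He left</p>"]

for s in samples:
    print(s, bool(POSTED.search(s)), bool(ALT.search(s)))
```

With these samples, the posted pattern also flags the paragraph ending in `.’</p>`, while the letter-only pattern flags just the last one.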
#2
Well trained by Cats
Posts: 30,876
Karma: 59840450
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2, Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
I prefer to do my joins individually by type. I also only use Replace ALL for these 2 (I have a number of others for special instances that I step through, skipping false positives).
(The code was snipped from my saved Search file, so the things shown are 'escaped'. They also take into consideration valid punctuation marks.)
Code:
74\Name=Cleanup/Joins/Join to upper
74\Find="([[:alpha:],][\"\x201d\xe2\x80\x9d]*)</p>\\s*<p\\b[^>]*>([A-Z\xe2\x80\x9c\"])"
74\Replace=\\1 \\2
75\Name=Cleanup/Joins/To Lower
75\Find="\\s*([a-z],*)</p>\\s+<p class=\"calibre1\">([a-z])"
75\Replace=\\1 \\2
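To make the escaping easier to read, here is a minimal sketch of entry 75 ("To Lower") with the saved-file escaping undone by hand (`\\s` becomes `\s`, `\"` becomes `"`), run through Python's `re` module. That the file simply doubles backslashes and escapes quotes is an assumption based on the "escaped" remark above. The sample HTML is invented for illustration. Entry 74 is left out because it relies on `[[:alpha:]]`, a POSIX class that Python's `re` does not support.

```python
import re

# Entry 75 with the saved-file escaping removed (assumed: \\ -> \ and \" -> ").
FIND = r'\s*([a-z],*)</p>\s+<p class="calibre1">([a-z])'
REPLACE = r'\1 \2'

# Hypothetical input: a paragraph split in the middle of a sentence.
html = (
    '<p class="calibre1">the line was broken,</p>\n'
    '<p class="calibre1">right in the middle.</p>'
)

joined = re.sub(FIND, REPLACE, html)
print(joined)
```

The two captured groups keep the last character (plus any trailing commas) of the first paragraph and the first lowercase letter of the next, with a single space between them.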
#3
Connoisseur
Posts: 81
Karma: 10
Join Date: Aug 2010
Location: Murcia/Spain
Device: Android 12
Quote:
#4
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
. = any character \. = a period Quote:
From what I can tell, I think you're trying to find paragraphs without a closing punctuation mark. (aka, paragraphs that end in a letter.) Like if you're taking an OCRed book, and trying to combine broken lines together: Code:
<p>This is a copied and</p> <p>pasted paragraph from the</p> <p>book.</p> <p>And true paragraph 2.</p> Code:
<p>This is a copied and pasted paragraph from the book.</p> <p>And true paragraph 2.</p> Here are the 3 sets of Regex I personally use: Note: DO NOT do a "Replace All". Replace most of these on a case-by-case basis. Also, make sure to save a backup copy of your file. Regex #1 (Hyphens) This searches for a hyphen at the end of a paragraph: Search: -</p>\s+<p> Replace: (LEAVE THIS COMPLETELY BLANK) OR alternate: Search: -</p>\s+<p> Replace: - Example: Code:
<p>This example is where the pre-</p> <p>split occurs.</p> This searches for everything that's NOT a period, exclamation point, question mark, etc.: Search: ([^>”\?\!\.])</p>\s+<p> Replace: \1 Example: Code:
<p>This is an example</p> <p>sentence where the person,</p> <p>places, and things occur.</p> In Fiction, colons likely occur within sentences. In Non-Fiction, colons likely occur at the end of paragraphs. Regex #3 (Lowercase Start) This searches for a lowercase letter at the very beginning of the paragraph: Search: <p>[a-z] I make sure to run this after #1 and #2 to catch any strays, then decide these on a case-by-case basis. Example: Code:
<p>The fishy “car dealership”</p> <p>was called Mr. X’s Emporium.</p> Last edited by Tex2002ans; 10-12-2020 at 02:42 PM. |
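For anyone who wants to see the three searches from this post in action outside the editor, here is a small sketch using Python's `re` module on the post's own example paragraphs. Using Python instead of the editor's search box is an assumption made for illustration, and so are the variable names. The replacement for the second search is written with a trailing space (`\1 `), following the advice later in the thread. `re.sub` replaces every match at once, which is fine for these tiny samples but is exactly the "Replace All" the post warns against on a real book.

```python
import re

HYPHEN = re.compile(r'-</p>\s+<p>')                    # Regex #1 (Hyphens)
NO_END_PUNCT = re.compile(r'([^>”\?\!\.])</p>\s+<p>')  # second search: no closing punctuation
LOWER_START = re.compile(r'<p>[a-z]')                  # Regex #3 (Lowercase Start)

ex1 = "<p>This example is where the pre-</p>\n<p>split occurs.</p>"
ex2 = ("<p>This is an example</p>\n<p>sentence where the person,</p>\n"
       "<p>places, and things occur.</p>")
ex3 = "<p>The fishy “car dealership”</p>\n<p>was called Mr. X’s Emporium.</p>"

# Regex #1: blank replacement drops the hyphen, the alternate keeps it.
dropped = HYPHEN.sub('', ex1)
kept = HYPHEN.sub('-', ex1)

# Second search: keep the captured character and add a space.
joined = NO_END_PUNCT.sub(r'\1 ', ex2)

# The third example ends in a closing quote, so the second search skips it,
# but the lowercase-start search still flags the next paragraph for review.
missed = NO_END_PUNCT.search(ex3) is None
strays = LOWER_START.findall(ex3)

print(dropped, kept, joined, missed, strays, sep="\n")
```

The last example shows why the lowercase-start search runs last: the broken paragraph ends in `”`, which the second search treats as valid closing punctuation, so only the lowercase `w` at the start of the next paragraph gives the split away.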
#5
Connoisseur
Posts: 81
Karma: 10
Join Date: Aug 2010
Location: Murcia/Spain
Device: Android 12
Quote:
#6
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Quote:
Search: ([a-z])</p>\s+<p>
Replace: \1 <---- Make sure you put a space after.
Code:
<p>This is an example</p>
<p>sentence. But THIS LINE</p>
<p>won't match.</p>
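The same search can be tried on the example above with Python's `re` module. This is a minimal sketch, assuming Python as a stand-in for the editor's search box; the variable names are illustrative. Note the trailing space in the replacement string.

```python
import re

# Lowercase letter right before </p>, followed by whitespace and a new <p>.
LOWER_END = re.compile(r'([a-z])</p>\s+<p>')

example = (
    "<p>This is an example</p>\n"
    "<p>sentence. But THIS LINE</p>\n"
    "<p>won't match.</p>"
)

# r'\1 ' keeps the captured letter and adds the space between the joined lines.
result = LOWER_END.sub(r'\1 ', example)
print(result)
```

Only the first break is joined; the paragraph ending in the uppercase `E` of `LINE` is left alone, as the example's wording suggests.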
#7
Connoisseur
Posts: 81
Karma: 10
Join Date: Aug 2010
Location: Murcia/Spain
Device: Android 12
Quote: