05-31-2026, 02:45 PM   #19
ElMiko
Fanatic
Posts: 570
Karma: 65460
Join Date: Jun 2011
Device: Kindle Voyage, Boox Go 7
You mentioned you're still polishing your regex skills (as we all are), so I thought I'd simplify the search and try to break down what all the elements of the plain text version of the search are doing.

(\p{L}(|[,;-])|,”|[MD][rs]\.|Mrs\.|(”|—)(?=\n\p{Ll}))\n

It will match anything that:

(
\p{L}(|[,;-])| — is a letter followed by nothing, a comma, a semicolon, or a hyphen (FYI, I don't include colons in this search because I find that, more often than not, a colon in fiction is supposed to be followed by a paragraph break; you can of course add it if you want), or
,”| — is a comma followed by a curly closing quote, or
[MD][rs]\.| — is "Mr.", "Ms.", or "Dr." (strictly speaking it would also match "Ds."), or
Mrs\.| — is "Mrs.", or
(”|—)(?=\n\p{Ll}) — is a closing curly quote or an em dash [provided it is followed by a line break AND a lowercase letter],
)

and

\n — is followed by a line break.
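
If it helps to see the pieces in action, here's a rough Python sketch of the breakdown above. Python's built-in re module doesn't support \p{L} or \p{Ll}, so as an assumption I'm substituting [^\W\d_] for "a letter" and checking str.islower() on the character after the line break. The function name matched_breaks is made up for illustration, and it only reports what the search would match; it isn't the replace step.

```python
import re

# Assumption: stdlib `re` has no \p{L}, so "[^\W\d_]" (a word character that
# is not a digit or underscore) is used as a close stand-in for "a letter".
LETTER = r"[^\W\d_]"

PATTERN = re.compile(
    # Alternatives 1-4; "[,;-]?" is equivalent to the original "(|[,;-])".
    rf"(?:({LETTER}[,;-]?|,”|[MD][rs]\.|Mrs\.)"
    # Alternative 5: closing quote or em dash; the lookahead captures the
    # first character of the next line so it can be tested below.
    r"|(”|—)(?=\n(.)))"
    # Every alternative must be followed by a line break.
    r"\n"
)


def matched_breaks(text):
    """Return the text just before each line break the search would match."""
    hits = []
    for m in PATTERN.finditer(text):
        if m.group(1) is not None:
            hits.append(m.group(1))
        elif m.group(3).islower():  # stands in for \p{Ll}
            hits.append(m.group(2))
    return hits


sample = "“Goodbye”\nhe added.\n“Goodbye.”\nShe left.\n"
print(matched_breaks(sample))  # ['”']
```

In the sample, only the first closing quote is reported, because the line after it starts with a lowercase letter. The second one is followed by a capital "S", so it is left alone, and the lines ending in a period never match at all.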

For the first instance of "\p{L}", you should be able to replace it with "\p{Han}" and swap in the Chinese equivalents of the punctuation marks that shouldn't denote a paragraph break (e.g. the Chinese comma). But that won't work for the second-to-last bit of the regex, i.e. (”|—)(?=\n\p{Ll}), because Chinese has no such thing as a lowercase character, and that's a critical limiter in how that search element functions.
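
As a very rough illustration of that first substitution, here's a Python sketch. Everything specific in it is an assumption on my part: stdlib re doesn't support \p{Han}, so I'm using the range U+4E00 to U+9FFF (CJK Unified Ideographs) as a stand-in that covers fewer characters, and the punctuation set (fullwidth comma, fullwidth semicolon, enumeration comma) is just a guess at what you might want. The name han_matched_breaks is made up.

```python
import re

# Assumption: U+4E00-U+9FFF as a rough stand-in for \p{Han}.
HAN = r"[\u4e00-\u9fff]"

# Hypothetical punctuation set: fullwidth comma (U+FF0C), fullwidth
# semicolon (U+FF1B), enumeration comma (U+3001).
HAN_PATTERN = re.compile(rf"({HAN}[\uff0c\uff1b\u3001]?)\n")


def han_matched_breaks(text):
    """Return the text just before each line break this variant would match."""
    return HAN_PATTERN.findall(text)


sample_han = "他走到商店，\n然后说话\n了。\n再见。\n"
print(han_matched_breaks(sample_han))  # ['店，', '话']
```

Lines ending in the Chinese full stop aren't matched, which mirrors how the original search leaves lines ending in a period alone.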

Last edited by ElMiko; 05-31-2026 at 09:21 PM.