You mentioned you're still polishing your regex skills (as we all are), so I thought I'd simplify the search and try to break down what all the elements of the plain text version of the search are doing.
(\p{L}(|[,;-])|,”|[MD][rs]\.|Mrs\.|(”|—)(?=\n\p{Ll}))\n
It will match anything that:
(
\p{L}(|[,;-])| — is a letter followed by nothing, a comma, a semi-colon, or a hyphen (FYI, I don't include colons in this search because I find that, more often than not, a colon in fiction is supposed to be followed by a paragraph break, but you can of course add it in if you want), or
,”| — is a comma followed by a curly closing quote, or
[MD][rs]\.| — is "Mr.", "Ms.", or "Dr." (strictly speaking the class also matches "Ds.", which is harmless), or
Mrs\.| — is "Mrs.", or
(”|—)(?=\n\p{Ll}) — is a closing curly quote or an em dash [provided it is followed by a line break AND a lowercase letter],
)
and
\n — is followed by a line break.
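If you want to see the breakdown above in action, here is a rough sketch in Python. Big caveat: Python's built-in `re` module does not understand `\p{L}` or `\p{Ll}`, so I've swapped in stand-ins (`[^\W\d_]` for "a letter" and plain `[a-z]` for "a lowercase letter"). Those stand-ins, the function name, and the sample text are all my own assumptions for illustration, not part of the original search. An engine with Unicode property support should be able to take the original expression as-is.

```python
import re

# Stand-ins, because the standard `re` module has no \p{...} classes.
LETTER = r"[^\W\d_]"  # approximates \p{L}: a word character that is not a digit or underscore
LOWER = r"[a-z]"      # crude, ASCII-only approximation of \p{Ll}

# Same shape as the search in the post, with the stand-ins substituted.
PATTERN = re.compile(
    rf"({LETTER}(|[,;-])|,”|[MD][rs]\.|Mrs\.|(”|—)(?=\n{LOWER}))\n"
)

def flagged_line_endings(text):
    """Return what group 1 captured for each line break the search matches."""
    return [m.group(1) for m in PATTERN.finditer(text)]

# Hypothetical sample text, one case per line.
SAMPLE = (
    "The rain had stopped by the\n"   # ends in a letter: match
    "time they reached the house,\n"  # letter + comma: match
    "where Mr.\n"                     # "Mr.": match
    "Jones was waiting.\n"            # ends in a full stop: no match
    "“Come in,”\n"                    # comma + closing quote: match
    "he said.\n"                      # no match
    "“I thought—”\n"                  # closing quote, next line starts lowercase: match
    "she began.\n"                    # no match
    "“Never mind.”\n"                 # closing quote, next line starts uppercase: no match
    "She turned away.\n"              # no match
)

print(flagged_line_endings(SAMPLE))
# ['e', 'e,', 'Mr.', ',”', '”']
```

Note how the two closing-quote lines behave differently: the lookahead only lets the quote through when the next line starts with a lowercase letter.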
For the first instance of "\p{L}", you should be able to replace it with "\p{Han}", and swap in the Chinese equivalents of the punctuation marks that shouldn't denote a paragraph break (e.g. the Chinese comma). But that won't work for the second-to-last bit of the regex, i.e. (”|—)(?=\n\p{Ll}), because there's no such thing as a lowercase Han character, and the lowercase check is a critical limiter in how that search element functions.
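Here is a minimal sketch of what the Chinese version of that first element might look like, again in Python. Same caveat as before: the built-in `re` module has no `\p{Han}`, so I'm approximating it with the main CJK Unified Ideographs range (`\u4e00-\u9fff`), which does not cover every Han character. The fullwidth semi-colon in the punctuation class is my own guess at a sensible equivalent; only the Chinese comma is something I can vouch for. The names and sample text are made up.

```python
import re

# Approximation of \p{Han}: the main CJK Unified Ideographs block only.
HAN = r"[\u4e00-\u9fff]"

# A Han character followed by nothing, a fullwidth comma, or a fullwidth
# semi-colon (the semi-colon is an assumption), then a line break.
HAN_PATTERN = re.compile(rf"({HAN}(|[，；]))\n")

def flagged_han_line_endings(text):
    """Return what group 1 captured for each line break the search matches."""
    return [m.group(1) for m in HAN_PATTERN.finditer(text)]

# Hypothetical sample text.
HAN_SAMPLE = (
    "他走进房间，\n"    # Han character + Chinese comma: match
    "然后坐下。\n"      # ends in a Chinese full stop: no match
    "天已经黑了\n"      # ends in a Han character: match
    "但是灯还亮着。\n"  # ends in a Chinese full stop: no match
)

print(flagged_han_line_endings(HAN_SAMPLE))
# ['间，', '了']
```

The closing-quote element is left out on purpose, since the lowercase lookahead has nothing to test against in Chinese text.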
Last edited by ElMiko; 05-31-2026 at 09:21 PM.