11-02-2024, 01:48 PM   #5
lomkiri
 
I realized that if several consecutive paragraphs all begin with a lowercase letter, my regex captures only every other one. The match ends after the closing </p>, so the regex can't target the next paragraph (the </p> in front of it has already been consumed); it goes on and finds the one after that, leaving one paragraph unchanged. It would then take several passes to catch them all (not a big deal, but unaesthetic).

This is easily fixed by not capturing the final </p> and using a positive lookahead (for </p>) instead. The pointer then stops before the </p>, so the regex is ready to capture the next paragraph if it is a candidate.
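To illustrate the difference, here is a minimal Java sketch (java.util.regex supports both `\p{Ll}` and lookahead; Java writes the group reference as `$1` rather than `\1`). The "consuming" pattern is an assumption, since the earlier regex is not quoted in this post: it is taken to be the same expression ending in a real `</p>` that has to be written back in the replacement. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class JoinParagraphs {
    // Assumed earlier form: the closing </p> is part of the match,
    // so the replacement must put it back.
    static final Pattern CONSUMING =
            Pattern.compile("</p>\\s*<p[^>]*>(\\p{Ll}.*?)</p>");

    // Form from the post: (?=</p>) checks for </p> without consuming it,
    // so the next match can start at that same </p>.
    static final Pattern LOOKAHEAD =
            Pattern.compile("</p>\\s*<p[^>]*>(\\p{Ll}.*?)(?=</p>)");

    static String joinConsuming(String html) {
        return CONSUMING.matcher(html).replaceAll(" $1</p>");
    }

    static String joinLookahead(String html) {
        // " $1" is the Java spelling of the post's "\x20\1"
        return LOOKAHEAD.matcher(html).replaceAll(" $1");
    }

    public static void main(String[] args) {
        String html = "<p>One</p><p>two</p><p>three</p><p>four</p>";
        System.out.println(joinConsuming(html));
        System.out.println(joinLookahead(html));
    }
}
```

Run as is, the first line printed is `<p>One two</p><p>three four</p>`: "three" is skipped because the </p> in front of it was eaten by the previous match. The second line is `<p>One two three four</p>`, with every paragraph joined in one pass.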

With this regex, all the paragraphs are targeted in the first pass:
Code:
</p>\s*<p[^>]*>(\p{Ll}.*?)(?=</p>)
or, to also target paragraphs starting with <space><lowercase>:
Code:
</p>\s*<p[^>]*>(\s?\p{Ll}.*?)(?=</p>)
The replacement is still the same: \x20\1
(\x20 is a space)
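The space variant can be checked the same way. Again a Java sketch with a hypothetical class name, where the replacement `\x20\1` is written as `" $1"`. Because `\s?` sits inside the capture group, a leading space is kept, so a joined paragraph that began with a space ends up with two spaces in front of it.

```java
import java.util.regex.Pattern;

public class JoinParagraphsSpace {
    // Second pattern from the post: \s? allows one whitespace character
    // before the lowercase letter; the lookahead leaves </p> unconsumed.
    static final Pattern PATTERN =
            Pattern.compile("</p>\\s*<p[^>]*>(\\s?\\p{Ll}.*?)(?=</p>)");

    static String join(String html) {
        // Literal space followed by group 1 (which may itself start with a space)
        return PATTERN.matcher(html).replaceAll(" $1");
    }

    public static void main(String[] args) {
        String html = "<p class=\"x\">Start</p>\n<p class=\"x\"> middle</p>\n<p class=\"x\">end</p>";
        System.out.println(join(html));
    }
}
```

This prints `<p class="x">Start  middle end</p>`: both lowercase paragraphs are joined in a single pass, and the double space before "middle" comes from the captured leading space.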

Last edited by lomkiri; 11-02-2024 at 04:26 PM.