Quote:
Originally Posted by phossler
I have 3 cases and your function works great on case 1 and case 2. Doesn't seem to do anything for case 3. Can you tweak it a little for me please?
OK, let's modify the 3rd case from <space>[a-z] to just <space>, and add a catch-all group for anything else (a comma or a semicolon, for example). In both of these cases, the period is removed.
Code:
((?:\p{Lu}\.)+)(?:(</(?:p|div|b/|blockquote)>)|( \p{Lu})|(' ')|(.))
(Note: the regex above is wrong; it was corrected in msg #13 to the one below.)
((?:\p{Lu}\.){2,})(?:\s*(<(?:/p|/div|br/|/blockquote)>)|( \p{Lu})|( )|(.))
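To see which alternative fires in each case, here's a quick sketch that runs the corrected regex with Python's built-in `re` module outside calibre. Two assumptions: `re` has no `\p{Lu}` class, so `[A-Z]` stands in for it (ASCII capitals only), and group 4 is written as a plain space `( )`, since a literal `(' ')` would only match a space wrapped in apostrophes. `which_group` is a hypothetical helper for illustration only.

```python
import re

# Assumption: [A-Z] is an ASCII-only stand-in for \p{Lu},
# which Python's built-in re module does not support.
UPPER = "[A-Z]"

ACRONYM = re.compile(
    rf"((?:{UPPER}\.){{2,}})"                  # group 1: two or more "X." pairs
    rf"(?:\s*(<(?:/p|/div|br/|/blockquote)>)"  # group 2: closing tag or <br/>
    rf"|( {UPPER})"                            # group 3: space + capital letter
    rf"|( )"                                   # group 4: plain space
    rf"|(.))"                                  # group 5: anything else
)


def which_group(text):
    """Hypothetical helper: number (2-5) of the alternative that matched."""
    m = ACRONYM.search(text)
    if m is None:
        return None
    for i in range(2, 6):
        if m.group(i) is not None:
            return i
    return None


for sample in ["the U.S.A.</p>", "the U.S. Army", "the U.S. army", "the U.S., then"]:
    print(repr(sample), "-> group", which_group(sample))
```

The four samples hit groups 2, 3, 4 and 5 respectively, one per case. A single "U." never matches, because `{2,}` requires at least two letter-period pairs.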
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    acro = match.group(1).replace('.', '')
    if end := match.group(2):    # </p> or <br/> etc.
        period = '.'
    elif end := match.group(3):  # <space>[A-Z]
        period = '.'
    elif end := match.group(4):  # <space>
        period = ''
    elif end := match.group(5):  # anything else
        period = ''
    return acro + period + end
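If you want to try the function outside calibre first, a minimal sketch is to wrap it in `re.sub`. Assumptions: `[A-Z]` stands in for `\p{Lu}` (Python's built-in `re` lacks it), group 4 is a plain space `( )`, and the `None` values in `run` (a hypothetical helper) are placeholders for the extra arguments calibre's function mode passes in. The walrus operator needs Python 3.8 or later.

```python
import re

# Assumption: [A-Z] is an ASCII-only stand-in for \p{Lu}.
ACRONYM = re.compile(
    r"((?:[A-Z]\.){2,})"                       # group 1: the dotted acronym
    r"(?:\s*(<(?:/p|/div|br/|/blockquote)>)"   # group 2: closing tag or <br/>
    r"|( [A-Z])"                               # group 3: space + capital letter
    r"|( )"                                    # group 4: plain space
    r"|(.))"                                   # group 5: anything else
)


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    acro = match.group(1).replace('.', '')
    if end := match.group(2):    # </p> or <br/> etc.
        period = '.'
    elif end := match.group(3):  # <space>[A-Z]
        period = '.'
    elif end := match.group(4):  # <space>
        period = ''
    elif end := match.group(5):  # anything else
        period = ''
    return acro + period + end


def run(text):
    # Hypothetical driver: placeholder arguments, calibre supplies the real ones.
    return ACRONYM.sub(lambda m: replace(m, 1, None, None, None, None, None), text)


print(run("Born in the U.S.A.</p>"))  # keeps the final period before </p>
print(run("the U.S. army"))           # drops it before a lowercase word
```

Since `\s*` sits outside group 2, any whitespace between the acronym and the closing tag is dropped from the output: `the U.S.A. </p>` becomes `the USA.</p>`.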
I've written it to be easy to read and modify, but it can be shortened as follows; it does exactly the same thing:
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    acro = match.group(1).replace('.', '')
    if end := (match.group(2) or match.group(3)):
        period = '.'
    elif end := (match.group(4) or match.group(5)):
        period = ''
    return acro + period + end
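To check that the long and short versions really agree, here's a small comparison run with plain `re` outside calibre. Assumptions: `[A-Z]` stands in for `\p{Lu}`, group 4 is a plain space `( )`, and the `None` arguments are placeholders for what calibre passes. `replace_long`, `replace_short` and `run_with` are hypothetical names, used only so both versions can coexist in one script.

```python
import re

# Assumption: [A-Z] is an ASCII-only stand-in for \p{Lu}.
ACRONYM = re.compile(
    r"((?:[A-Z]\.){2,})(?:\s*(<(?:/p|/div|br/|/blockquote)>)|( [A-Z])|( )|(.))"
)


def replace_long(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    acro = match.group(1).replace('.', '')
    if end := match.group(2):
        period = '.'
    elif end := match.group(3):
        period = '.'
    elif end := match.group(4):
        period = ''
    elif end := match.group(5):
        period = ''
    return acro + period + end


def replace_short(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    acro = match.group(1).replace('.', '')
    if end := (match.group(2) or match.group(3)):
        period = '.'
    elif end := (match.group(4) or match.group(5)):
        period = ''
    return acro + period + end


def run_with(func, text):
    # Placeholder arguments stand in for the ones calibre supplies.
    return ACRONYM.sub(lambda m: func(m, 1, None, None, None, None, None), text)


SAMPLES = [
    "the U.S.A.</p>",
    "N.A.S.A.<br/>",
    "the U.S. Army",
    "the U.S. army",
    "the U.S., then",
    "the U.K.; the E.U.</div>",
]

for text in SAMPLES:
    assert run_with(replace_long, text) == run_with(replace_short, text), text
print("both versions agree on", len(SAMPLES), "samples")
```

The short version works because at most one of groups 2 to 5 is set on any match, so `group(2) or group(3)` picks exactly the same `end` as the two separate branches did.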