Old 12-22-2021, 05:57 PM   #9
lomkiri
Quote:
Originally Posted by phossler View Post
I have 3 cases and your function works great on case 1 and case 2. Doesn't seem to do anything for case 3. Can you tweak it a little for me please?
OK, let's modify the 3rd case from <space>[a-z] to <space>, and add a catch-all for anything else (a comma or a semicolon, for example). In both of these cases, the period is removed.

Code:
((?:\p{Lu}\.)+)(?:(</(?:p|div|b/|blockquote)>)|( \p{Lu})|(' ')|(.))
(Note: the regex above is wrong; it was corrected in msg #13 to the one below:)
((?:\p{Lu}\.){2,})(?:\s*(<(?:/p|/div|br/|/blockquote)>)|( \p{Lu})|(' ')|(.))
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    acro = match.group(1).replace('.', '')
    if end := match.group(2):    # </p> or <br/> etc.
        period = '.'
    elif end := match.group(3):  # <space>[A-Z]
        period = '.'
    elif end := match.group(4):  # <space>
        period = ''
    elif end := match.group(5):  # anything else
        period = ''

    return acro + period + end
I've written it to be easy to read and modify, but it can be shortened like this; it does exactly the same thing:
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    acro = match.group(1).replace('.', '')
    if  end := (match.group(2) or match.group(3)):
        period = '.'
    elif end := (match.group(4) or match.group(5)):
        period = ''
    return acro + period + end
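
If you want to check the behaviour outside calibre, here is a minimal sketch using only the standard library. Two things in it are assumptions and not part of the function above: [A-Z] stands in for \p{Lu} (the stdlib re module does not support \p{...}, so accented capitals are not covered), and fix_acronyms is a hypothetical helper that passes dummy values for the calibre arguments the function never reads.

```python
import re

# Assumption: [A-Z] replaces \p{Lu}, which the stdlib re module lacks.
# Otherwise this is the corrected regex, split over two lines.
PATTERN = re.compile(
    r"((?:[A-Z]\.){2,})"
    r"(?:\s*(<(?:/p|/div|br/|/blockquote)>)|( [A-Z])|(' ')|(.))"
)


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Shortened version from the post, unchanged (needs Python 3.8+ for :=).
    acro = match.group(1).replace('.', '')
    if end := (match.group(2) or match.group(3)):
        period = '.'
    elif end := (match.group(4) or match.group(5)):
        period = ''
    return acro + period + end


def fix_acronyms(text):
    # Hypothetical helper: calibre normally supplies these arguments itself;
    # dummy values are enough here because replace() only uses the match.
    return PATTERN.sub(
        lambda m: replace(m, 0, '', None, None, None, None), text
    )


print(fix_acronyms("<p>He lives in the U.S.A.</p>"))  # <p>He lives in the USA.</p>
print(fix_acronyms("He left the U.S.A. Then he came back"))  # He left the USA. Then he came back
print(fix_acronyms("the U.S.A. is big"))  # the USA is big
print(fix_acronyms("the U.S.A., and more"))  # the USA, and more
```

The first two lines keep the final period (end of paragraph, or a following capital letter), and the last two drop it (plain space, or anything else).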

Last edited by lomkiri; 12-23-2021 at 08:19 PM. Reason: correction of the regex