MobileRead Forums - View Single Post

lomkiri · 12-22-2021, 06:57 PM

Quote:

Originally Posted by phossler

I have 3 cases and your function works great on case 1 and case 2. Doesn't seem to do anything for case 3. Can you tweak it a little for me please?

OK, let's modify the 3rd case from <space>[a-z] to <space>, and add a catch-all for anything else (a comma or a semicolon, for example). In both of these cases, the period is removed.

Code:

((?:\p{Lu}\.)+)(?:(</(?:p|div|b/|blockquote)>)|( \p{Lu})|(' ')|(.))
(Note: the regex above is wrong; it was corrected in msg #13 to the one below.)
((?:\p{Lu}\.){2,})(?:\s*(<(?:/p|/div|br/|/blockquote)>)|( \p{Lu})|( )|(.))
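
To see which alternative fires in each case, the corrected pattern can be tried outside calibre. This is only a sketch under a few assumptions: Python's stdlib re has no \p{Lu}, so [A-Z] stands in for it (ASCII capitals only); group 4 is written as a bare space, which is what the <space> case describes; and which_group is a hypothetical helper, not something calibre provides.

```python
import re

# Assumption: [A-Z] replaces \p{Lu} because stdlib `re` lacks \p{..} classes,
# and group 4 is a bare space.
PATTERN = re.compile(
    r"((?:[A-Z]\.){2,})"
    r"(?:\s*(<(?:/p|/div|br/|/blockquote)>)|( [A-Z])|( )|(.))"
)

def which_group(text):
    """Return (group number, captured text) for the trailing alternative that matched."""
    m = PATTERN.search(text)
    if m is None:
        return None
    for n in range(2, 6):
        if m.group(n) is not None:
            return n, m.group(n)

# One sample per case: closing tag, space + capital, plain space, anything else.
for sample in ["U.S.A.</p>", "U.S.A. The", "U.S.A. is", "U.S.A., and"]:
    print(repr(sample), "->", which_group(sample))
```

Groups 2 and 3 are the ones where the function puts a period back; groups 4 and 5 are the ones where it drops it.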

Code:

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    acro = match.group(1).replace('.', '')
    if end := match.group(2):    # </p> or <br/> etc.
        period = '.'
    elif end := match.group(3):  # <space>[A-Z]
        period = '.'
    elif end := match.group(4):  # <space>
        period = ''
    elif end := match.group(5):  # anything else
        period = ''

    return acro + period + end

I wrote it that way so it's easy to read and modify, but it can be shortened as follows; it does exactly the same thing:

Code:

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    acro = match.group(1).replace('.', '')
    if end := (match.group(2) or match.group(3)):
        period = '.'
    elif end := (match.group(4) or match.group(5)):
        period = ''
    return acro + period + end
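
A quick way to check that the two versions really agree is to run both through re.sub outside calibre. Again only a sketch, with these assumptions: [A-Z] replaces \p{Lu} (stdlib re doesn't support it), group 4 is a bare space, the names replace_long, replace_short and run are made up for this test, and the extra arguments calibre normally passes are filled with dummies. It needs Python 3.8+ for the := operator.

```python
import re

# Assumption: ASCII stand-in for \p{Lu}; group 4 is a bare space.
PATTERN = re.compile(
    r"((?:[A-Z]\.){2,})"
    r"(?:\s*(<(?:/p|/div|br/|/blockquote)>)|( [A-Z])|( )|(.))"
)

def replace_long(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    acro = match.group(1).replace('.', '')
    if end := match.group(2):    # closing tag or <br/>: keep one period
        period = '.'
    elif end := match.group(3):  # space + capital: keep one period
        period = '.'
    elif end := match.group(4):  # plain space: drop the period
        period = ''
    elif end := match.group(5):  # anything else: drop the period
        period = ''
    return acro + period + end

def replace_short(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    acro = match.group(1).replace('.', '')
    if end := (match.group(2) or match.group(3)):
        period = '.'
    elif end := (match.group(4) or match.group(5)):
        period = ''
    return acro + period + end

def run(func, text):
    # Dummy values replace what calibre would pass for number, file_name, etc.
    return PATTERN.sub(lambda m: func(m, 1, "", None, {}, {}, {}), text)

SAMPLES = [
    "<p>Made in the U.S.A.</p>",
    "He joined N.A.T.O. Then he left.",
    "The U.S.A. is large.",
    "The U.K., for example.",
]
for text in SAMPLES:
    long_result, short_result = run(replace_long, text), run(replace_short, text)
    print(long_result, "| same:", long_result == short_result)
```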