Old 01-19-2022, 04:26 PM   #4
lomkiri
- Selecting whole sentences:
You could add whitespace, the comma and the hyphen to the character class of your search string:
Code:
\b(\p{Lu}[\p{Lu}\s,-]+)\b
(Note: \p{Lu} has the same meaning as [[:upper:]]; you may use either one.)
In this case, words like JOHN or FIFA will be targeted and transformed.
If an acronym with dots (F.I.F.A.) is inside the sentence, the selection will stop when reaching it.
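
To see what that search string selects, here is a small sketch you can run outside calibre. It is only an approximation: Python's standard re module has no \p{Lu}, so I assume plain ASCII [A-Z] as a stand-in, and find_caps_runs is a name made up for this test.

```python
import re

# ASCII-only stand-in for \b(\p{Lu}[\p{Lu}\s,-]+)\b
# (assumption: the standard-library re module does not support \p{Lu}).
PATTERN = re.compile(r"\b([A-Z][A-Z\s,-]+)\b")


def find_caps_runs(text):
    """Return every run of capitals (with whitespace, commas, hyphens) found in text."""
    return [m.group(1) for m in PATTERN.finditer(text)]


print(find_caps_runs("He said THIS IS VERY, VERY IMPORTANT."))
print(find_caps_runs("WELCOME TO THE F.I.F.A. CUP"))
```

In this stand-in, the second call stops at the first dot, so the run ends with the F of F.I.F.A., and CUP is picked up as a separate match.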

- Excluding from the transformation the words not recognized by the dictionary:
Use the search string David gave you:
Code:
\b([[:upper:]]{2,})\b
with this regex-function:
Code:
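def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    word = match.group(1)
    # Only touch words the spelling dictionary knows
    if dictionaries.recognized(word):
        # Keep the first letter, lowercase the rest: HELLO -> Hello
        return word[0] + word[1:].lower()
    # Unrecognized words (e.g. acronyms) are returned unchanged
    return word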
This transforms only the recognized words. The last "return" leaves the unrecognized words as they are; it's up to you to handle them separately.
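
If you want to try that logic before running it on a book, a rough harness like the one below may help. It is a sketch under assumptions: StubDictionaries and run_outside_calibre are hypothetical names, the stub only imitates the recognized() call used above (it is not calibre's real dictionary object), and [A-Z] stands in for [[:upper:]] because the standard re module lacks POSIX classes.

```python
import re


class StubDictionaries:
    """Hypothetical stand-in for the `dictionaries` argument (test harness only)."""

    def __init__(self, known_words):
        self._known = {w.lower() for w in known_words}

    def recognized(self, word):
        # Assumption: case-insensitive lookup, which is simpler than a real speller
        return word.lower() in self._known


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    word = match.group(1)
    if dictionaries.recognized(word):
        return word[0] + word[1:].lower()
    return word


def run_outside_calibre(text, known_words):
    """Apply replace() to every all-caps word of two or more letters."""
    dictionaries = StubDictionaries(known_words)
    pattern = re.compile(r"\b([A-Z]{2,})\b")  # ASCII stand-in for [[:upper:]]{2,}
    return pattern.sub(
        lambda m: replace(m, 0, "stub.html", None, dictionaries, {}, {}), text
    )


print(run_outside_calibre("THE FIFA RULES", ["the", "rules"]))
```

With that word list, THE and RULES are recognized and FIFA is not, so only FIFA keeps its capitals.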

- Another possibility is to write all the capitalized words that the dictionary does not recognize into a temp file, and then decide what you want to do with them. You can do that in a regex-function: store the words in a Python set, and write the set out on the last call of the function.
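
A possible sketch of that idea follows. It relies on two things as I understand them from the calibre manual for function mode: the data dict persists between calls, and setting replace.call_after_last_match = True makes calibre call the function one extra time with match set to None. The file name unrecognized_caps.txt and the "unknown" key are my own assumptions.

```python
import os
import tempfile

# Hypothetical output location: a text file in the system temp directory
OUTPUT_PATH = os.path.join(tempfile.gettempdir(), "unrecognized_caps.txt")


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # `data` is shared between calls, so the set survives from one match to the next
    unknown = data.setdefault("unknown", set())

    if match is None:
        # Extra call after the last match: dump the collected words, one per line
        with open(OUTPUT_PATH, "w", encoding="utf-8") as f:
            f.write("\n".join(sorted(unknown)))
        return

    word = match.group(1)
    if dictionaries.recognized(word):
        return word[0] + word[1:].lower()
    unknown.add(word)  # remember it, but leave it untouched in the text
    return word


# Ask for one extra call once all matches have been processed
replace.call_after_last_match = True
```

After the run, the temp file holds the sorted list of words left in capitals, which you can review by hand.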

If you want a more refined treatment, you'll have to work out how to deal with the exceptions and translate that logic into your regex-function.

Suggestion: you could also surround the whole capitalized sentence with a <small> tag, i.e. <small>SENTENCE</small>. It will be much less aggressive: capitals at a reduced size look like small caps, which are often used as an acceptable emphasis. You can do that by slightly modifying the regex-function I wrote above.
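
One way the <small> variant could look is sketched below, assuming it is paired with the sentence-level search string. Because that pattern can end on a space or comma when a lowercase word follows, the sketch moves trailing whitespace, commas and hyphens outside the tag. wrap_caps, TRAILING and the ASCII [A-Z] pattern are assumptions added only so the example runs outside calibre.

```python
import re
import string

# Characters the search string allows that should not sit at the end inside the tag
TRAILING = string.whitespace + ",-"


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    text = match.group(1)
    core = text.rstrip(TRAILING)   # the capitalized run itself
    tail = text[len(core):]        # trailing space/comma/hyphen, kept after the tag
    return "<small>" + core + "</small>" + tail


# Test harness only: ASCII stand-in for \b(\p{Lu}[\p{Lu}\s,-]+)\b
PATTERN = re.compile(r"\b([A-Z][A-Z\s,-]+)\b")


def wrap_caps(text):
    """Apply replace() to every capitalized run in text."""
    return PATTERN.sub(lambda m: replace(m, 0, "stub.html", None, None, {}, {}), text)


print(wrap_caps("He said THIS IS IMPORTANT to me."))
```

Note that this version leaves the capitals as they are and only adds the tag; whether to also lowercase them is up to you.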

Last edited by lomkiri; 01-19-2022 at 07:44 PM.