You can do that with a regex-function.
Get the regex at
https://regex101.com/r/4fWfX1/1 . It selects every word of 2 letters or more (you can increase this limit). Thanks to EbookMakers, who wrote the regex.
Copy the function below, then run "Replace all".
You'll have to change the file name, and you can use another tag if you wish.
At the end of the job, there is a "debug" msg with the number of tagged words.
BEWARE:
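The search in the function is case sensitive. If you want it case insensitive, replace the final test (if not data['error'] and match[1] in data['words']:) with the commented-out line just above it.
If you do that (case insensitive), remember that all words in the list must be lowercase.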
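As a rough sketch of the case-insensitive variant described above: instead of lowercasing the word file by hand, the list could be lowercased while it is loaded. The helper names load_words_lowercased and is_listed are hypothetical and not part of the function below; this only illustrates the idea.

```python
def load_words_lowercased(lines):
    """Return a set of lower-cased, non-empty words from an iterable of lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_listed(word, words):
    """Case-insensitive membership test: lower the candidate before the lookup."""
    return word.lower() in words


# Example data standing in for the lines of the word file
words = load_words_lowercased(["Apple\n", "  BANANA \n", "\n", "cherry"])
print(is_listed("apple", words))
print(is_listed("Banana", words))
print(is_listed("grape", words))
```

A set is used here so that each lookup stays fast even with a long word list.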
BEWARE: The file used here is not a CSV one.
The file is built in this form: a single word per line (one line = one word to mark).
If you want to use a CSV file, show me the format you used, and I'll adjust the function load_list() for you, but only tomorrow night.
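Until the real format is known, here is a minimal sketch of what a CSV loader might look like. It assumes a comma-separated file with the word in the first column and an optional header row; the name load_words_from_csv and its parameters are hypothetical, and the actual file may need a different column or delimiter.

```python
import csv
import io


def load_words_from_csv(fd, column=0, skip_header=False):
    """Read one word per row from the given column of an open CSV file."""
    reader = csv.reader(fd)
    if skip_header:
        next(reader, None)  # discard the header row if there is one
    words = []
    for row in reader:
        if len(row) > column:  # skips blank lines and short rows
            word = row[column].strip()
            if word:
                words.append(word)
    return words


# In-memory sample standing in for a real file
sample = io.StringIO("word,note\nalpha,first\nbeta,second\n\n gamma ,third\n")
print(load_words_from_csv(sample, skip_header=True))
```

With a real file, the same helper would be called on the object returned by open(fname, newline='').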
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # ============= change this if needed ===============
    fname = '/data/temp/words.txt'  # This is a Linux path; on Windows, give the Windows path to your file
    tag_begin = '<mark>'  # or <span class="marked">, for example
    tag_end = '</mark>'
    # See also the membership test near the end if you want a case-insensitive search
    # =================

    def load_list(fname):
        # 'data' persists between calls, so it carries the error flag and the counter
        data['error'] = False
        data['nb_tagged'] = 0
        try:
            # 'with' closes the file even if reading fails
            with open(fname, 'r') as fd:
                # one word per line; blank lines are ignored
                list_word = [line.strip() for line in fd if line.strip()]
        except OSError:  # covers FileNotFoundError, permission errors, etc.
            print(f"File {fname} not found, or error in opening it")
            # raise  # if we raise the error, the msg above will not be printed
            data['error'] = True
            return []
        if not list_word:
            print(f"File {fname} is empty")
            # raise  # if we raise the error, the msg above will not be printed
            data['error'] = True
        return list_word

    if not match:  # last pass, called once after all matches
        print(f"Number of words tagged: {data['nb_tagged']}")
        return
    if number == 1:  # first pass
        replace.call_after_last_match = True  # ask for a last pass after all occurrences
        data['words'] = load_list(fname)
    # put this instead if you want a case-insensitive search:
    # if not data['error'] and match[1].lower() in data['words']:
    if not data['error'] and match[1] in data['words']:
        data['nb_tagged'] += 1
        return tag_begin + match[1] + tag_end
    return match[1]  # word not in the list: leave it unchanged