MobileRead Forums: How to highlight the words I'm learning in an epub file?

lomkiri · 07-28-2022, 04:32 PM

You can do that with a regex function.

Get the regex at https://regex101.com/r/4fWfX1/1. It selects every word of two or more letters (you can raise this limit). Thanks to EbookMakers, who wrote the regex.
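The pattern behind that link isn't reproduced in the post. As a rough, hypothetical stand-in (not EbookMakers' regex), the sketch below captures in group 1 (what the function reads as `match[1]`) each run of two or more word characters that lies in the text rather than inside a tag or an entity such as `&nbsp;`. It assumes Python `re` syntax. Note that `\w` also matches digits and `_`; that is harmless here, because such tokens won't be in the word list and are returned unchanged.

```python
import re

# Hypothetical stand-in, NOT the regex from the regex101 link:
#   (?<![&#\w])   the word must not continue another word or follow '&' or '#',
#                 so entity names such as &nbsp; or &#8217; are skipped
#   (\w{2,})      group 1: a word of 2 or more characters (raise 2 to skip short words)
#   (?![^<>]*>)   no '>' ahead before the next '<', i.e. we are not inside a tag
WORD_RE = re.compile(r'(?<![&#\w])(\w{2,})(?![^<>]*>)')

html = '<p class="intro">Le chat a mangé&nbsp;la souris.</p>'
print(WORD_RE.findall(html))   # ['Le', 'chat', 'mangé', 'la', 'souris']
```

Changing `{2,}` to `{3,}` would skip two-letter words as well.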

Get the function below, then run "Replace all".
You'll have to change the file name, and you can use another tag if you wish.
When the job finishes, a "debug" message shows the number of tagged words.

BEWARE:
The search in the function is case-sensitive. If you want it case-insensitive, change the test on line 40 of the function (the commented-out alternative is just above it).
If you do that, remember that all the words in the list must be in lowercase.
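If lowering every word of the list by hand is a chore, one alternative (an assumption, not part of the posted function) is to lowercase the words once while loading them and compare against the lowercased match. The helper names below are hypothetical; only the comparison is case-insensitive, so the tagged word keeps its original capitalization.

```python
def build_word_set(words):
    """Hypothetical helper: return a set of lowercased words, ignoring blanks."""
    return {w.strip().lower() for w in words if w.strip()}

def tag_if_listed(word, word_set, tag_begin='<mark>', tag_end='</mark>'):
    """Wrap `word` in the tags when its lowercase form is in `word_set`."""
    if word.lower() in word_set:
        return tag_begin + word + tag_end   # original capitalization is kept
    return word

words = build_word_set(['Chat', 'souris', ''])
print(tag_if_listed('CHAT', words))    # <mark>CHAT</mark>
print(tag_if_listed('chien', words))   # chien
```

In the posted function this would amount to `data['words'] = {w.lower() for w in load_list(fname)}` together with the commented-out test on line 40.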

BEWARE: The file used here is not a CSV file.
It is a plain text file with a single word per line (one line = one word to mark).
If you want to use a CSV file, show me the format you used, and I'll adjust the function load_list() for you, but only tomorrow night.
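For reference, here is a sketch of both formats. The one-word-per-line reader mirrors what the posted load_list() does; the CSV reader is purely hypothetical, assuming the word to mark sits in the first column of each row (for instance `word,translation`), since the actual CSV layout isn't known.

```python
import csv
import io

def read_word_lines(fd):
    """One word per line, as in the posted function; blank lines are skipped."""
    return [line.strip() for line in fd if line.strip()]

def read_word_csv(fd, column=0):
    """Hypothetical CSV layout: the word to mark is in `column` of each row.
    For a real file, open it with open(fname, encoding='utf-8', newline='')."""
    words = []
    for row in csv.reader(fd):
        if len(row) > column and row[column].strip():
            words.append(row[column].strip())
    return words

# io.StringIO stands in for an open file so the example runs as-is
print(read_word_lines(io.StringIO("chat\n\nsouris\n")))             # ['chat', 'souris']
print(read_word_csv(io.StringIO("chat,cat\nsouris,mouse\n,\n")))    # ['chat', 'souris']
```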

Code:

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):

    # ============= change this if needed ===============
    fname = '/data/temp/words.txt'    # Linux path; on Windows use a raw string such as r'C:\temp\words.txt'
    tag_begin = '<mark>'    # or '<span class="marked">', for example
    tag_end = '</mark>'     # must close tag_begin ('</span>' for a span)
    # See also the test at line 40 if you want a case-insensitive search
    # =================

    def load_list(fname):
        data['error'] = False
        data['nb_tagged'] = 0
        try:
            fd = open(fname, 'r', encoding='utf-8-sig')    # utf-8-sig also drops a leading BOM
        except OSError:    # missing file, no permission, etc.
            print(f"File {fname} not found, or error in opening it")
            # raise    # if we raise the error, the msg above will not be printed
            data['error'] = True
            return []

        # read the file into a list: one word per line, blank lines skipped
        list_word = [line.strip() for line in fd if line.strip()]
        fd.close()
        if not list_word:
            print(f"File {fname} is empty")
            # raise    # if we raise the error, the msg above will not be printed
            data['error'] = True
        return list_word

    if not match:    # last passage
        print(f"Number of words tagged: {data['nb_tagged']}")
        return

    if number == 1:    # first passage
        replace.call_after_last_match = True    # ask for a last passage after all occurrences
        data['words'] = set(load_list(fname))    # a set makes each lookup fast

    # put this instead if you want a case-insensitive search:
    # if not data['error'] and match[1].lower() in data['words']:
    if not data['error'] and match[1] in data['words']:
        data['nb_tagged'] += 1
        return tag_begin + match[1] + tag_end
    return match[1]
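To check a word list before running the function on a book, the same logic can be tried outside the editor with `re.sub` and a callback. This is a sketch, not the editor's own machinery: the pattern is a hypothetical stand-in for the regex101 one, and the callback counts tagged words the way the posted function does with `data['nb_tagged']`.

```python
import re

# Hypothetical stand-in for the regex101 pattern: group 1 is a word of 2+ characters
# that is neither inside a tag nor part of an entity name.
WORD_RE = re.compile(r'(?<![&#\w])(\w{2,})(?![^<>]*>)')

def mark_words(html, words, tag_begin='<mark>', tag_end='</mark>'):
    """Wrap every listed word in tag_begin/tag_end; return (new_html, count)."""
    word_set = set(words)   # set lookups stay fast even with a long list
    count = 0

    def repl(match):
        nonlocal count
        if match[1] in word_set:   # case-sensitive, like the posted function
            count += 1
            return tag_begin + match[1] + tag_end
        return match[0]            # leave anything else untouched

    return WORD_RE.sub(repl, html), count

page = '<p class="chat">Le chat et la souris.</p>'
new_page, n = mark_words(page, ['chat', 'souris'])
print(new_page)   # <p class="chat">Le <mark>chat</mark> et la <mark>souris</mark>.</p>
print(f"Number of words tagged: {n}")   # Number of words tagged: 2
```

This sketch doesn't check whether a word is already wrapped, so running it twice on the same text nests the tags.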