MobileRead Forums - View Single Post - Detect red underline spelling mistakes with a regex?

lomkiri · Yesterday, 08:38 AM

Quote:

Since this red wave exists, it must be possible to detect it through programming, that is the point of my question

Search/replace works on the code itself, whereas the red waves are shown in the viewer, not in the code, so you don't have access to them.

---------------

I propose this workaround: for each page, run this regex-function. It tells you whether there are errors in the page and prints their number and the list of them (the number may be bigger than the number of words in the list, since a misspelled word that occurs several times is counted each time but listed only once).

That gives you a base to adapt if you want more (the number of occurrences for each error, or scanning the whole book at once and reporting each file name with the list for that page, in a debug box or in a text file, etc.)
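As an illustration of the first adaptation (number of occurrences for each error), here is a minimal sketch that can be tried outside calibre. The names count_unrecognized and fake_recognized are hypothetical; fake_recognized is only a stand-in for dictionaries.recognized, which exists only inside the calibre editor.

```python
from collections import Counter

def count_unrecognized(words, recognized):
    """Map each unrecognized word to the number of times it occurs."""
    return Counter(w for w in words if not recognized(w))

# Hypothetical stand-in for calibre's dictionaries.recognized (assumption:
# inside the editor you would pass dictionaries.recognized instead).
known = {"the", "cat", "sat"}

def fake_recognized(word):
    return word.lower() in known

words = ["The", "cat", "szat", "the", "matt", "szat"]
counts = count_unrecognized(words, fake_recognized)

# Print the most frequent errors first.
for word, n in counts.most_common():
    print(f"{word}: {n}")
```

In the function below, the same Counter could replace the pair nberr / errors: its total gives the number of errors and its keys give the list.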

--------------

The search matches the whole page and replaces it with itself, so you'll always get the message "replacement of 1 occurrence", but if there are errors, you'll also get the list in a debug box.

The regex inside the function scans each word (skipping the HTML code) and asks whether it is in the dictionary. Whether the words are lowercased before asking is controlled by the variable "lowered" in the function; change its value if you want the other behaviour.

If no debug box appears, it means no error was detected.
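To see how the word scan skips the HTML code, here is a small sketch you can run outside calibre. The function's own pattern relies on the (*SKIP)(*FAIL) verbs of the third-party regex module, which the standard re module does not support, so this sketch swaps in a different technique: an alternation where tags match without capturing and words are captured in a group. The name extract_words is hypothetical.

```python
import re

def extract_words(html, lowered=False):
    """Return the words in the text of html, ignoring anything inside tags."""
    # A tag matches the first alternative and captures nothing;
    # a word matches the second alternative and lands in group 1.
    pattern = re.compile(r"<[^>]+>|\b(\w+)\b")
    words = [m.group(1) for m in pattern.finditer(html) if m.group(1)]
    # Optionally lowercase every word, like the "lowered" switch in the function.
    return [w.lower() for w in words] if lowered else words

sample = '<p class="intro">Hello <b>World</b></p>'
print(extract_words(sample))                # ['Hello', 'World']
print(extract_words(sample, lowered=True))  # ['hello', 'world']
```

Tag names and attributes such as "class" or "intro" never reach the dictionary check, because the tag alternative consumes them first.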

The language in your opf file must be right, so the correct dictionary is selected.

Put the cursor at the beginning of the file

Code:

search: <body[^>]*>(.+)</body>

function-regex: 

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    """
    Count the number of spelling errors in one file, using dictionaries.recognized().
    Use "replace all", for "current file", with "dot all".
    The search string is <body[^>]*>(.+)</body>
    """

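    import regex  # third-party module; needed for the (*SKIP)(*FAIL) verbs
    lowered = False  # set to True to lowercase each word before the check
    # Skip anything inside an HTML tag, then collect every remaining word
    words = regex.findall(r"(?:<[^>]+>)(*SKIP)(*FAIL)|\b\w+\b", match[0])

    nberr = 0       # total number of misspelled occurrences
    errors = set()  # distinct misspelled words
    for word in words:
        word = word.lower() if lowered else word
        if not dictionaries.recognized(word):
            nberr += 1
            errors.add(word)

    # Printing anything makes the debug box appear, so print only on errors
    if nberr:
        print(f'"Lower words before check" is {lowered}')
        print(f"{nberr} error(s) in this page, {len(words)} words")
        print("\n", errors)

    # Return the matched text unchanged: the page is replaced with itself
    return match[0]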

options of the search-box: "Current file" and "Dot all", then click "Replace all".

Edit: This function is obsolete and superseded by its next version; see my next post.