Detect red underline spelling mistakes with a regex?

reinsley · 03-01-2026, 04:56 AM

Hi gents,

Calibre editor and epub format

How can I detect spelling mistakes with a regex? Calibre highlights these errors with a red underline... I would like to detect these visual cues and avoid comparing with a dictionary. My regex skills are not up to that level. The work has already been done by Calibre. The idea is to jump from one contentious point to the next with a quick read and make manual corrections.

Thank you for your help.

theducks · 03-01-2026, 05:11 AM

Quote:

Originally Posted by reinsley

Hi gents,

Calibre editor and epub format

How can I detect spelling mistakes with a regex? Calibre highlights these errors with a red underline... I would like to detect these visual cues and avoid comparing with a dictionary. My regex skills are not up to that level. The work has already been done by Calibre. The idea is to jump from one contentious point to the next with a quick read and make manual corrections.

Thank you for your help.

What is wrong with the built-in Spell Check? It can be set to show only misspelled words (and has other options). It suggests corrections and will apply one if chosen.

reinsley · 03-01-2026, 06:16 AM

Quote:

Originally Posted by theducks

What is wrong with the built-in Spell Check? It can be set to show only misspelled words (and has other options). It suggests corrections and will apply one if chosen.

Thank you for your reply.
Spell Check does the job.
Many of the errors are in compound words separated by a hyphen or words broken by a line break. I am looking to reduce the column reading in Spell Check. Quick detection of red waves seems to me to be a way to skip a lot of pages and save time. Since this red wave exists, it must be possible to detect it through programming, that is the point of my question, and visual verification will be very quick by skipping thirty pages without errors. Spell Check will do a fine tuning job on the other errors.
I haven't looked into the options you suggest, but I will do my homework. Best regards.

lomkiri · 03-01-2026, 06:17 AM

... and a double-click on the word in the list brings you to the first occurrence of the culprit, in the text

Edit : Useless reply not covering your needs, I was writing this while you were posting, and I don't know how to delete my msg....

lomkiri · 03-01-2026, 07:38 AM

Quote:

Since this red wave exists, it must be possible to detect it through programming, that is the point of my question

Search/replace works on the code itself, and the waves are shown in the viewer, not in the code, so you do not have access to them.

---------------

I propose this workaround: for each page, run this regex-function. It will tell you whether there are errors in the page and print their number and the list of them (the number may be bigger than the number of words in the list, since the same misspelled word may occur several times).

Then you have a base if you want to adapt it (number of occurrences for each error, or scanning the whole book at once and giving the list for each file, in a debug box or in a text file, etc.)

--------------

The search matches the whole page and replaces it with itself, so you'll always get the message "replacement of 1 occurrence", but if there are errors, you'll get the list in a debug box.

The regex inside the function scans each word (skipping the HTML code) and asks whether it is in the dictionary. It can lowercase all words before asking; this is controlled by the variable "lowered" in the function (False in the code as posted, change it to True if you want that).

If no debug box appears, no error was detected.

The language in your opf file must be right, so the correct dictionary is selected.

Put the cursor at the beginning of the file

Code:

search: <body[^>]*>(.+)</body>

function-regex: 

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    """
    Count the number of errors using dictionaries.recognized(), for one file.
    Use "replace all", for "current file", with "dot all"
    search string is <body[^>]*>(.+)</body>
    """

    import regex
    lowered = False # True or False
    words = regex.findall(r"(?:<[^>]+>)(*SKIP)(*FAIL)|\b\w+\b", match[0])

    nberr = 0
    errors = set()
    for word in words:
        word = word.lower() if lowered else word
        if not dictionaries.recognized(word):
            nberr += 1
            errors.add(word)
            
    if nberr:
        print(f'"Lower words before check" is {str(lowered)}')
        print(f"{nberr} error(s) in this page, {len(words)} words")
        print("\n", errors)
    
    return match[0]

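Options of the search box (as in the docstring): "Regex-function" mode, "Dot all" checked, scope "Current file", then "Replace all".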

Edit: This function is obsolete and superseded by its next version; see my next post.
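For anyone who wants to try the word-extraction step outside Calibre: the pattern in the function relies on the (*SKIP)(*FAIL) verbs of the third-party regex module. Here is a minimal sketch of the same idea with only the standard re module. This is a swapped-in technique, not the code used in the function: the pattern matches either a tag or a word, captures only the word, and the empty captures left by tags are dropped. The name extract_words is hypothetical.

```python
import re

# Either an HTML tag (matched but not captured) or a word (captured).
TAG_OR_WORD = re.compile(r"<[^>]+>|(\b\w+\b)")

def extract_words(html):
    """Return the words of html in order, ignoring anything inside tags."""
    # findall yields '' for every tag match, so filter those out.
    return [w for w in TAG_OR_WORD.findall(html) if w]

sample = '<p class="intro">Hello <b>wrold</b>, it is fine</p>'
print(extract_words(sample))
```

Without the tag alternative, names such as "class" and "intro" inside the markup would be returned as words too.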

theducks · 03-01-2026, 10:38 AM

And don't forget: F8 jumps to the next red squiggle in the editor

lomkiri · 03-01-2026, 11:05 AM

I made the same function, but for all text files at once (or any number of files : current file, or selected files, etc.).
Same search string : <body[^>]*>(.+)</body>
Put the cursor at the top of a file, preferably at the top of the first one to be scanned.

Code:

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    """
    Count the number of errors using dictionaries.recognized(), for any number of files.
    Use "replace all" with "dot all"
    search string is <body[^>]*>(.+)</body>
    """

    import regex
    lowered = False # True or False
    
    # First passage
    if not data:
        replace.call_after_last_match = True
        data["regex"] = regex.compile(r"(?:<[^>]+>)(*SKIP)(*FAIL)|\b\w+\b")
        data["files"]= {}
        data["total_err"] = 0
        
    # Last passage
    if not match:
        if data["total_err"]:
            print(f'"Lower words before check" is {str(lowered)}')
            print(f"{len(data['files'])} files scanned, {data['total_err']} errors in it")
            print("==============================")
            for el in data["files"]:
                res = data["files"][el]
                print("\n", f"file {el}: , {res[0]} error(s), {res[1]} words")
                if res[0]:
                    print(res[2])  
        return 

    # Normal passage
    nberr = 0
    errors = set()
    words = data["regex"].findall(match[0])
    for word in words:
        word = word.lower() if lowered else word
        if not dictionaries.recognized(word):            
            nberr += 1
            errors.add(word)
    data["files"].setdefault(file_name, (nberr, len(words), errors))
    data["total_err"] += nberr
            
    return match[0]

Edit : Fixed a bug, now it may be applied on any number of files.
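The accumulate-then-report logic of this version can be tried outside the editor. The sketch below is only an approximation under stated assumptions: StubDictionaries is a hypothetical stand-in for the dictionaries object Calibre passes in (only a recognized(word) method is imitated), scan and report are made-up names, and the standard re module replaces the regex module.

```python
import re

class StubDictionaries:
    """Hypothetical stand-in for Calibre's dictionaries object."""
    def __init__(self, known):
        self.known = {w.lower() for w in known}

    def recognized(self, word):
        return word.lower() in self.known

# Tag (not captured) or word (captured); stdlib substitute for (*SKIP)(*FAIL).
WORD = re.compile(r"<[^>]+>|(\b\w+\b)")

def scan(body, file_name, dictionaries, data):
    """Store (error count, word count, unknown words) for one file in data."""
    words = [w for w in WORD.findall(body) if w]
    unknown = [w for w in words if not dictionaries.recognized(w)]
    data.setdefault("files", {})[file_name] = (len(unknown), len(words), set(unknown))
    data["total_err"] = data.get("total_err", 0) + len(unknown)

def report(data):
    """Build the summary lines once every file has been scanned."""
    lines = [f"{len(data['files'])} files scanned, {data['total_err']} errors"]
    for name, (nberr, nwords, errors) in data["files"].items():
        lines.append(f"file {name}: {nberr} error(s), {nwords} words {sorted(errors)}")
    return lines

dic = StubDictionaries(["the", "cat", "sat"])
data = {}
scan("<p>The cat saat</p>", "a.html", dic, data)
scan("<p>The cat sat</p>", "b.html", dic, data)
for line in report(data):
    print(line)
```

Here scan plays the role of the "normal passage" and report the role of the "last passage"; the shared data dict is what carries the results from one file to the next.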

lomkiri · 03-02-2026, 06:26 AM

Quote:

Originally Posted by theducks

And don't forget: F8 jumps to the next red squiggle in the editor

I realized just now that this is probably what the OP was looking for.

Anyway, my regex-function gives another view of the error list, and in a book with lots of files and lots of proper names, it may indicate more quickly which files must be inspected. It also gives the ability to inspect only one or a few files instead of the whole epub.

Anyway, it was fun to do ;-)

An example of the output:

Spoiler:

"Lower words before check" is False
3 files scanned, 1106 errors in it
==============================

file OEBPS/@public@vhost@g@gutenberg@html@files@844@844-h@844-h-2.htm.html: , 204 error(s), 5457 words
{'Lætitia', 'Grosvenor', 'Cardew', 'realised', 'Maxbohm', 'Markby', 'htm', 'tm', 'Melan', 'Migsby', 'Gower', 've', 'unenforceability', 'gutenberg', 'Bayswater', 'nonproprietary', 'Algy', 'eBook', 'Bracknell', 'MERCHANTIBILITY', 'Magley', 'Anabaptists', 'Mobbs', 'savour', 'Worthing', 'Mallam', 'EIN', 'Newby', 'dirs', 'F3', 'didn', 'unlink', 'couldn', 'EBOOK', 'gbnewby', 'www', 'eBooks', 'Leamington', 'pglaf', 'PGLAF', 'Dumbleton', 'Algernon', 'Gwendolen', 'Moncrieff', 'http', 'basinette'}

file OEBPS/@public@vhost@g@gutenberg@html@files@844@844-h@844-h-0.htm.html: , 474 error(s), 9544 words
{'wouldn', 'doesn', 'Bunbury', 'Grosvenor', 'Cardew', 'programme', 'Belgrave', 'Lætitia', 'theatre', 'natured', 'demeanour', 'Peile', 'shouldn', 've', 'Merriman', 'Hertfordshire', 'civilised', 'gutenberg', 'Algy', 'Shoreman', 'Bloxham', 'eBook', 'Bunburying', 'Bracknell', 'Didn', 'recognised', 'isn', 'Worthing', 'patronising', 'mustn', 'weren', 'Harbury', 'Canninge', 'didn', 'slightingly', 'hadn', 'shallying', 'realise', 'couldn', 'wasn', 'EBOOK', 'ccx074', 'neighbours', 'fibres', 'Mudie', 'Niel', 'Methuen', 'Fairfax', 'www', 'grey', 'pglaf', 'Couldn', 'Leclercq', 'Hallo', 'favour', 'debonnair', 'Vanbrugh', 'Marechal', 'Bunburyed', 'Tunbridge', 'Bunburyists', 'Aynesworth', 'colour', 'shilly', 'Farquhar', 'Woolton', 'Bunburyist', 'Algernon', 'Gwendolen', 'Moncrieff', 'THEATRE', 'Dyall', 'Egeria', 'demoralising'}

file OEBPS/@public@vhost@g@gutenberg@html@files@844@844-h@844-h-1.htm.html: , 428 error(s), 9260 words
{'Isn', 'wouldn', 'doesn', 'Bunbury', 'lorgnettte', 'Cardew', 'defence', 'scepticism', 'Belgrave', 'Messrs', 'Markby', 'neologistic', 'shouldn', 've', 'Merriman', 'Hertfordshire', 'Gervase', 'Algy', 'practised', 'aren', 'Bunburying', 'Fifeshire', 'Bracknell', 'Jouet', 'horticulturally', 'isn', 'Worthing', 'amongst', 'favourable', 'didn', 'usen', 'Dorking', 'hadn', 'realise', 'couldn', 'shan', 'pretence', 'Fairfax', 'candour', 'neighbourhood', 'honour', 'Couldn', 'draughts', 'favour', 'colour', 'marvellous', 'womanthrope', 'Bunburyist', 'Algernon', 'Gwendolen', 'Moncrieff'}

(I could have sorted the list in alphabetical order; well, it's easy to add)
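Entries such as 'didn', 'couldn' and 've' in the lists above come from the \b\w+\b pattern, which cuts words at the apostrophe. A possible variation, shown here as a sketch only, keeps an apostrophe that sits between two word characters. Whether dictionaries.recognized() then accepts forms like "didn't" is an assumption that would have to be checked in the editor.

```python
import re

# A word, optionally followed by apostrophe + word parts (straight or curly).
WORD_WITH_APOSTROPHE = re.compile(r"\b\w+(?:['’]\w+)*\b")
PLAIN_WORD = re.compile(r"\b\w+\b")

text = "I didn't know you've gone"
print(PLAIN_WORD.findall(text))
print(WORD_WITH_APOSTROPHE.findall(text))
```

The first print shows the split pieces ('didn', 't', 'you', 've'), the second keeps "didn't" and "you've" whole.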

theducks · 03-02-2026, 10:21 AM

Quote:

Originally Posted by lomkiri

I realized just now that this is probably what the OP was looking for.

Anyway, my regex-function gives another view of the error list, and in a book with lots of files and lots of proper names, it may indicate more quickly which files must be inspected. It also gives the ability to inspect only one or a few files instead of the whole epub.

Anyway, it was fun to do ;-)

(I could have sorted the list in alphabetical order; well, it's easy to add)

I would leave the list in the order FOUND. AKA just like it is.
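A note on "order found": the function collects the errors in a Python set, which does not keep insertion order. One way to drop repeats while keeping first-seen order is a dict used as an ordered set; the alphabetical variant is shown next to it for comparison. This is a sketch with a hypothetical helper name, not the code from the posts above.

```python
def unique_in_order(words):
    """Drop repeats while keeping first-seen order (dicts keep insertion order)."""
    return list(dict.fromkeys(words))

found = ["Bunbury", "colour", "Algy", "colour", "Bunbury", "favour"]

in_found_order = unique_in_order(found)
alphabetical = sorted(set(found), key=str.lower)  # case-insensitive sort

print(in_found_order)
print(alphabetical)
```

In the function, this would mean collecting the unknown words in a list (or a dict) instead of a set.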

lomkiri · 03-02-2026, 04:08 PM

Quote:

I would leave the list in the order FOUND. AKA just like it is.

I'm working on it ;-)

In fact, the very same function may be used for a more useful purpose (if anyone needs it): it can list the files in which the words from a given list appear.

For example, replacing the line
if not dictionaries.recognized(word):
with the line :
if word in ['alpha', 'bravo', 'charlie', 'delta']:
will display the files where at least one of these words appears.
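The word-list variant can be sketched on its own as follows. The pages dict (file name to plain text) and the function name files_containing are hypothetical; inside the editor the text would come from the match on each file instead.

```python
import re

WORD = re.compile(r"\b\w+\b")

def files_containing(pages, targets):
    """Map each file name to the target words found in its text.

    Files without any target word are left out of the result.
    """
    wanted = set(targets)  # set membership is faster than scanning a list
    hits = {}
    for name, text in pages.items():
        found = wanted.intersection(WORD.findall(text))
        if found:
            hits[name] = sorted(found)
    return hits

pages = {
    "ch1.html": "alpha met bravo at dawn",
    "ch2.html": "nothing to see here",
    "ch3.html": "delta delta delta",
}
print(files_containing(pages, ["alpha", "bravo", "charlie", "delta"]))
```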

reinsley · 03-03-2026, 10:45 AM

Quote:

Originally Posted by lomkiri

I realized just now that this is probably what the OP was looking for.

The OP is in awe of the diamonds on offer. The regex function will be tested and studied with relish. The F8 jump trick will be used to the max. Thank you both, lomkiri and theducks, for your help. It's a pleasure to read your posts. BR.

reinsley · 03-04-2026, 06:11 AM

Hello,
A little feedback.
The regex function saves time. F8 jumps from red wave to red wave. A word composed of two correct words that exist in the dictionary is more difficult to detect in the Spell-Checker's vertical list. Visual checking is faster and doesn't miss anything.
This regex function does exactly what I dreamed of.
Hats off, folks.

lomkiri · 03-04-2026, 06:04 PM

Thank you for your feedback,
I'm glad it fits your needs, I wasn't sure.

Quote:

Originally Posted by reinsley

This regex function does exactly what I dreamed of.

In this case, if you know a little Python, you can easily improve it.

For example, if you want to exclude all capitalized words from the output:
if not dictionaries.recognized(word) and not word.istitle():

Or, if you want to exclude some words that are very common in the text (e.g. some family names), you define a tuple or a list at the top of the function (or input it from an external file):
exclude = ("Bob", "Alice", "John", "fromage")
and then:
if not dictionaries.recognized(word) and word not in exclude:

Etc.
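These filters can be combined in one small helper, sketched below under assumptions: should_report is a hypothetical name, and recognized is any callable that plays the role of dictionaries.recognized. Note that str.istitle() returns True or False, whereas str.title() returns a new string.

```python
def should_report(word, recognized, exclude=(), skip_capitalized=False):
    """Decide whether word belongs in the error list."""
    if recognized(word):
        return False
    if word in exclude:
        return False
    # istitle() is True for "Bob" but False for "EBOOK" or "eBook".
    if skip_capitalized and word.istitle():
        return False
    return True

known = {"cheese", "the"}  # stand-in for the real dictionary
exclude = ("Bob", "Alice", "John", "fromage")
words = ["Bob", "fromage", "Bracknell", "chese", "cheese", "EBOOK"]

kept = [w for w in words
        if should_report(w, known.__contains__, exclude, skip_capitalized=True)]
print(kept)
```

With skip_capitalized=True, proper names such as 'Bracknell' are dropped, but so is any misspelled word that starts a sentence, so this option trades completeness for a shorter list.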

reinsley · 03-05-2026, 05:38 AM

Quote:

Originally Posted by lomkiri

Thank you for your feedback,
I'm glad it fits your needs, I wasn't sure.

In this case, if you know a little Python, you can easily improve it.

Feedback is the least I can do. You helped me get started with regex functions with a solution and tutorial in mid-February 2024. I'm continuing on my little journey that is slowly leading me towards Python. It's a pleasure to see the Calibre editor find a personal criterion. It gives me a pleasant feeling of vertigo. BR to you and theducks too.
