MobileRead Forums > E-Book Software > Calibre > Editor
Today, 05:56 AM #1
reinsley
Connoisseur
Posts: 67
Karma: 10
Join Date: Dec 2016
Location: France
Device: Kindle PaperWhite
Detect red underline spelling mistakes with a regex?

Hi gents,

Calibre editor and epub format

How can I detect spelling mistakes with a regex? Calibre already highlights them with a red wavy underline, so I would like to find these visual cues rather than compare the text against a dictionary myself; my regex skills are not up to that. The idea is to jump from one flagged spot to the next, read quickly, and correct by hand.

Thank you for your help.
Today, 06:11 AM #2
theducks
Well trained by Cats
Posts: 31,559
Karma: 62543878
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by reinsley
How can I detect spelling mistakes with a regex? Calibre highlights these errors with a red underline...
What is wrong with the built-in Spell Check? It can be set to show only misspelled words (and has other options). It suggests corrections and will apply the one you choose.
Today, 07:16 AM #3
reinsley
Connoisseur
Quote:
Originally Posted by theducks
What is wrong with the built-in Spell Check? It can be set to show only misspelled words (and has other options). It suggests corrections and will apply the one you choose.
Thank you for your reply.
Spell Check does the job.
Many of the errors are in compound words joined by a hyphen or in words broken by a line break. I want to spend less time reading down the word column in Spell Check. Spotting the red waves quickly seems to me a way to skip many pages and save time. Since the red wave exists, it must be possible to detect it programmatically; that is the point of my question. Visual checking would then be very fast, skipping thirty error-free pages at a time, and Spell Check would do the fine-tuning on the remaining errors.
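A regex cannot see the red waves, but for the two error types mentioned here it can at least list candidates to check by eye. A minimal sketch in plain Python, assuming you already have the text as a string (for instance the body of one page); the patterns are illustrative and not part of Calibre: one lists hyphenated compounds, the other lists words that look split by a hyphen followed by whitespace or a line break.

```python
import re

# Hyphenated compounds such as "porte-monnaie" (word characters on both sides).
COMPOUND = re.compile(r"\b\w+(?:-\w+)+\b")
# A hyphen followed by whitespace (including a line break), e.g. "exam-\nple"
# or "exam- ple": probably a word broken by hyphenation.
BROKEN = re.compile(r"\b(\w+)-\s+(\w+)\b")

def review_candidates(text):
    """Return (compounds, rejoined_broken_words) to check by eye."""
    compounds = COMPOUND.findall(text)
    broken = ["".join(pair) for pair in BROKEN.findall(text)]
    return compounds, broken

sample = "Un porte-monnaie vide. Le mot exam-\nple est coupé, comme exam- ple."
compounds, broken = review_candidates(sample)
print(compounds)  # ['porte-monnaie']
print(broken)     # ['example', 'example']
```

The rejoined words could then be passed to a spelling check, while the compounds are simply listed for a quick visual review.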
I haven't looked into the options you mention yet, but I will do my homework. Best regards.
Today, 07:17 AM #4
lomkiri
Groupie
Posts: 181
Karma: 1537710
Join Date: Jul 2021
Device: N/A
... and double-clicking a word in the list takes you to its first occurrence in the text.

Edit: Useless reply that does not cover your needs. I was writing it while you were posting, and I don't know how to delete my message...

Last edited by lomkiri; Today at 07:21 AM.
Today, 08:38 AM #5
lomkiri
Groupie
Quote:
Since this red wave exists, it must be possible to detect it through programming, that is the point of my question
Search/replace works on the source code itself, while the red waves are only drawn by the editor's display. They are not part of the code, so a regex has no access to them.

---------------

Here is a workaround: run this regex-function on each page. It tells you whether the page contains errors and prints their number and the list of misspelled words. The number can be larger than the number of words in the list, because the same misspelled word may occur several times.
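A standalone illustration of why the count can exceed the length of the list, which also counts the occurrences of each error with `collections.Counter`. It is a sketch: `KNOWN_WORDS` and `recognized()` are hypothetical stand-ins for Calibre's `dictionaries.recognized()`, which only exists inside the editor.

```python
import re
from collections import Counter

# Hypothetical stand-in for Calibre's dictionaries.recognized().
KNOWN_WORDS = {"the", "cat", "sat", "on", "mat"}

def recognized(word):
    return word.lower() in KNOWN_WORDS

def count_errors(text):
    """Return (total number of errors, Counter of occurrences per error)."""
    words = re.findall(r"\b\w+\b", text)
    occurrences = Counter(w for w in words if not recognized(w))
    return sum(occurrences.values()), occurrences

total, per_word = count_errors("The catt sat on the matt, the catt again.")
print(total)          # 4 errors in total...
print(len(per_word))  # ...but only 3 distinct misspelled words
print(per_word.most_common())
```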

You then have a base to adapt as you like: count the occurrences of each error, or scan the whole book at once and report the list for each file name, in a debug box or in a text file, etc.
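For the text-file variant, the report can be written with nothing more than `pathlib`. A sketch assuming the results have been collected in a dict mapping each file name to (error count, word count, set of errors); that shape, the `write_report()` name and the output path are illustrative choices, not Calibre features. Inside the editor, similar code could presumably be called from the function instead of `print()`.

```python
import tempfile
from pathlib import Path

def write_report(files, path):
    """Write one line per file that has errors; return the number of lines.

    `files` maps a file name to (error_count, word_count, set_of_errors).
    """
    lines = []
    for name, (nberr, nwords, errors) in sorted(files.items()):
        if nberr:
            lines.append(f"{name}: {nberr} error(s) in {nwords} words: "
                         f"{', '.join(sorted(errors))}")
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)

# Hypothetical results for two chapters.
files = {
    "ch1.xhtml": (3, 120, {"wrold", "teh"}),
    "ch2.xhtml": (0, 95, set()),
}
out = Path(tempfile.gettempdir()) / "spelling_report.txt"
n = write_report(files, out)
print(out.read_text(encoding="utf-8"))
```

Files without errors are left out, so the report only points at the pages worth opening.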

--------------

The search matches the whole body of the page and replaces it with itself, so you will always get the message "replacement of 1 occurrence". If there are errors, you will also get their list in a debug box.
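The "replace the page with itself" trick can be reproduced with Python's standard `re.subn()`: the replacement function does its work as a side effect and returns the matched text unchanged, so the document is untouched and the count is 1. A standalone sketch; `found` is a hypothetical list playing the role of the debug box.

```python
import re

html = "<html>\n<body class='calibre'>\n<p>Some txt here.</p>\n</body>\n</html>"
found = []  # hypothetical collector standing in for Calibre's debug box

def replace(match):
    # Side effect only: remember the captured body text.
    found.append(match.group(1))
    # Returning the whole match leaves the document unchanged.
    return match.group(0)

new_html, count = re.subn(r"<body[^>]*>(.+)</body>", replace, html, flags=re.DOTALL)
print(count)             # 1
print(new_html == html)  # True
```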

The regex inside the function scans each word (skipping the HTML tags) and asks the dictionary whether it is recognized. Words can be lowercased before the lookup: set the variable "lowered" in the function to True for that (it is False by default).
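The `(*SKIP)(*FAIL)` verbs belong to the third-party `regex` module that the function imports; Python's standard `re` does not support them. For experimenting outside Calibre, a common equivalent is to match tags as a separate alternative, capture only the words, and drop the empty captures. This sketch also skips HTML entities such as `&nbsp;`, which would otherwise be reported as the word `nbsp`; that extra alternative is an addition, not part of the original pattern.

```python
import re

# Tags and entities are matched but not captured; only words land in group 1.
TOKEN = re.compile(r"<[^>]+>|&#?\w+;|\b(\w+)\b")

def words_outside_tags(html):
    # findall returns '' for the tag/entity alternatives, so filter them out.
    return [w for w in TOKEN.findall(html) if w]

sample = '<p class="calibre1">Le&nbsp;chat <i>dort</i> ici.</p>'
print(words_outside_tags(sample))  # ['Le', 'chat', 'dort', 'ici']
```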

If no debug box appears, no error was detected.

The language declared in your OPF file must be correct, so that the right dictionary is selected.
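To double-check which language the OPF declares, you can read its `dc:language` element with the standard `xml.etree.ElementTree`. A minimal sketch; the OPF snippet is a made-up example, and in the editor you would simply open the OPF file and look at that element.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

def opf_languages(opf_text):
    """Return the values of all dc:language elements in an OPF document."""
    root = ET.fromstring(opf_text)
    return [el.text.strip() for el in root.iter(f"{{{DC}}}language") if el.text]

opf = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Exemple</dc:title>
    <dc:language>fr</dc:language>
  </metadata>
</package>"""
print(opf_languages(opf))  # ['fr']
```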

Put the cursor at the beginning of the file

Code:
search: <body[^>]*>(.+)</body>

function-regex: 

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    """
    Count the spelling errors in one file, using Calibre's dictionaries.
    Use "Replace all", for "Current file", with "Dot all".
    Search string is <body[^>]*>(.+)</body>
    """

    import regex  # third-party regex module, available inside Calibre

    lowered = False  # True: lowercase each word before the dictionary lookup

    # Skip tags and entities (e.g. &nbsp;), keep every other word.
    words = regex.findall(r"(?:<[^>]+>|&#?\w+;)(*SKIP)(*FAIL)|\b\w+\b", match[0])

    nberr = 0
    errors = set()
    for word in words:
        word = word.lower() if lowered else word
        if not dictionaries.recognized(word):
            nberr += 1
            errors.add(word)

    if nberr:
        # print() output is shown in a debug box after the replacement.
        print(f'"Lower words before check" is {lowered}')
        print(f"{nberr} error(s) in this page, {len(words)} words")
        print(sorted(errors))

    # Return the match unchanged: the page is not modified.
    return match[0]
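Options of the search box (the screenshot is not reproduced here): mode "Regex-function", "Current file", "Dot all" checked, then "Replace all".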
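Why "Dot all" matters: without it, `.` does not match a line break, so `(.+)` cannot span the several lines of a real body and the search finds nothing. A quick check with Python's `re`, assuming Calibre's "Dot all" checkbox corresponds to the `re.DOTALL` flag:

```python
import re

PATTERN = r"<body[^>]*>(.+)</body>"
page = "<html>\n<body>\n<p>One.</p>\n<p>Two.</p>\n</body>\n</html>"

without = re.search(PATTERN, page)
with_dotall = re.search(PATTERN, page, flags=re.DOTALL)
print(without)                       # None: "." stops at the first line break
print(with_dotall.group(1).strip())  # the whole body content
```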


Last edited by lomkiri; Today at 12:37 PM.
Today, 11:38 AM #6
theducks
Well trained by Cats
And don't forget: F8 jumps to the next red squiggle in the editor
Today, 12:05 PM #7
lomkiri
Groupie
I wrote the same function for all text files at once (this version does not work with "Current file" only).
Same search string: <body[^>]*>(.+)</body>

Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    """
    Count the spelling errors in every text file, using Calibre's dictionaries.
    Use "Replace all", for "All text files", with "Dot all".
    Search string is <body[^>]*>(.+)</body>
    """

    import regex

    lowered = False  # True: lowercase each word before the dictionary lookup

    # First passage: initialise the data shared by all calls
    if number == 1:
        data["regex"] = regex.compile(r"(?:<[^>]+>|&#?\w+;)(*SKIP)(*FAIL)|\b\w+\b")
        data["files"] = {}
        data["total_err"] = 0

    # Last passage (match is None): print the report in the debug box
    if match is None:
        if data.get("total_err"):
            print(f'"Lower words before check" is {lowered}')
            print(f"{len(data['files'])} files scanned, {data['total_err']} errors in them")
            print("==============================")
            for name, (nberr, nwords, errors) in data["files"].items():
                print(f"\nfile {name}: {nberr} error(s), {nwords} words")
                if nberr:
                    print(sorted(errors))
        return None  # nothing to replace on this extra call

    # Normal passage: check the words of the current file
    nberr = 0
    errors = set()
    words = data["regex"].findall(match[0])
    for word in words:
        word = word.lower() if lowered else word
        if not dictionaries.recognized(word):
            nberr += 1
            errors.add(word)
    data["files"][file_name] = (nberr, len(words), errors)
    data["total_err"] += nberr

    return match[0]

# Ask Calibre to call replace() once more, with match=None, after the last match
replace.call_after_last_match = True

Last edited by lomkiri; Today at 12:12 PM.
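To see how `number`, `data` and `call_after_last_match` work together, here is a rough simulation that runs outside Calibre. `run_replace_all()` and `FakeDictionaries` are hypothetical stand-ins written for this sketch, not Calibre code; they only imitate the behaviour the function above relies on: one call per match with an increasing `number`, a single `data` dict shared by all calls, and one extra call with `match=None` when `call_after_last_match` is set. Calibre's real driver may differ in its details (for instance, what it passes as `number` on that last call), and the standard `re` replaces the `regex` verbs here.

```python
import re

# Tags and entities are matched but not captured, so only words land in group 1.
TOKEN = re.compile(r"<[^>]+>|&#?\w+;|\b(\w+)\b")


class FakeDictionaries:
    """Hypothetical stand-in for Calibre's `dictionaries` argument."""

    def __init__(self, known):
        self.known = {w.lower() for w in known}

    def recognized(self, word):
        return word.lower() in self.known


def run_replace_all(func, files, pattern, dictionaries):
    """Hypothetical driver imitating "Replace all" over several files."""
    data, number = {}, 0  # one shared dict for the whole run
    compiled = re.compile(pattern, re.DOTALL)
    for file_name, text in files.items():
        for m in compiled.finditer(text):
            number += 1  # numbering continues from one file to the next
            func(m, number, file_name, None, dictionaries, data, None)
    if getattr(func, "call_after_last_match", False):
        func(None, number, None, None, dictionaries, data, None)
    return data


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if number == 1:  # first passage: initialise the shared state
        data["files"], data["total_err"] = {}, 0
    if match is None:  # extra passage after the last match: build the report
        nfiles = len(data.get("files", {}))
        data["summary"] = f"{nfiles} files scanned, {data.get('total_err', 0)} errors"
        print(data["summary"])
        return None
    words = [w for w in TOKEN.findall(match[1]) if w]
    errors = [w for w in words if not dictionaries.recognized(w)]
    data["files"][file_name] = (len(errors), len(words), set(errors))
    data["total_err"] += len(errors)
    return match[0]


replace.call_after_last_match = True

files = {
    "ch1.xhtml": "<html><body>\n<p>Helo world</p>\n</body></html>",
    "ch2.xhtml": "<html><body>\n<p>Good nigth, good nigth</p>\n</body></html>",
}
data = run_replace_all(replace, files, r"<body[^>]*>(.+)</body>",
                       FakeDictionaries(["hello", "world", "good", "night"]))
# prints: 2 files scanned, 3 errors
```

Because `data` is created once per run and only reset when `number == 1`, the results of every file accumulate until the final call prints the summary.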
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Can Calibre correct all spelling mistakes at once? | UzmanKasap | Editor | 4 | 12-04-2021 09:27 PM
Any way to change the "red wavy underline"? | martyger | Sigil | 8 | 08-05-2015 12:24 PM
Useless without underline | Quetzalcoatlus | EPUBReader | 0 | 04-12-2014 02:30 AM
Underline | magavi | Devices | 5 | 06-21-2013 02:08 PM
another regex puzzle - detect capitalised phrases | cybmole | Sigil | 6 | 02-24-2012 10:04 AM


All times are GMT -4. The time now is 03:22 PM.


MobileRead.com is a privately owned, operated and funded community.