MobileRead Forums > E-Book Software > Calibre > Editor
08-14-2025, 04:00 AM #1
Phssthpok
Age improves with wine.
Posts: 584
Karma: 95229
Join Date: Nov 2014
Device: Kindle Oasis, Kobo Libra II
How to write a regex function that uses the dictionary?

I have a book which has hyphens instead of em dashes, and I'm trying to fix it. Using a regex like "-(and|but|with)" catches a few cases, but it would be better to find all "\w+-\w+" which are not in the current dictionary, which would catch about 99% of all cases (and leave the remainder to the proofreading stage).

How could I write a regex function to do this?
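A minimal sketch of the filtering idea, run outside calibre: match `\w+-\w+` and keep only the words a dictionary rejects. It assumes a hypothetical `KNOWN_COMPOUNDS` set as a stand-in for the real spell-check dictionary, so it only illustrates the logic, not calibre's function-mode API.

```python
import re

# Hypothetical stand-in for the spell-check dictionary: a plain set of
# hyphenated compounds that are considered legitimate.
KNOWN_COMPOUNDS = {"well-known", "self-aware", "twenty-one"}

# Two runs of word characters joined by a single hyphen, as in "\w+-\w+".
HYPHENATED = re.compile(r"\b(\w+)-(\w+)\b")


def is_recognized(word):
    """Stand-in for a dictionary lookup (case-insensitive)."""
    return word.lower() in KNOWN_COMPOUNDS


def suspicious_hyphens(text):
    """Return the hyphenated words, in order, that the stand-in dictionary rejects."""
    return [m.group() for m in HYPHENATED.finditer(text)
            if not is_recognized(m.group())]


sample = ("It was a well-known story-but nobody believed it. "
          "Twenty-one men, all self-aware, waited-and waited.")
print(suspicious_hyphens(sample))
```

Note that the pattern consumes both halves of a match, so in a chain like `a-b-c` only the first pair `a-b` is tested.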
08-14-2025, 04:19 AM #2
kovidgoyal
creator of calibre
Posts: 45,435
Karma: 27757438
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
https://manual.calibre-ebook.com/fun...phenated-words
08-15-2025, 02:21 AM #3
Phssthpok
Age improves with wine.
Quote:
Originally Posted by kovidgoyal
Aha! Thank you!
08-17-2025, 10:08 AM #4
Phssthpok
Age improves with wine.
Quote:
Originally Posted by Phssthpok
Aha! Thank you!
Hmm, having done this, I realise that what I really need is a way to *find* things which are not in the dictionary, not all hyphenated words -- since there are gazillions of those to wade through, and far fewer non-dictionary items.

Seems to be no way to do this. Plugin maybe? Back to the drawing board...
08-18-2025, 04:45 PM #5
lomkiri
Groupie
Posts: 169
Karma: 1497966
Join Date: Jul 2021
Device: N/A
Quote:
Originally Posted by Phssthpok
Hmm, having done this I realise that what I really need is a way to *find* things which are not in the dictionary, not all hyphenated words.
Try replacing the sub-function replace_word() inside the example's code with this one, which does the replacement only when the compound word is NOT in the dictionary:
Code:
    def replace_word(wmatch):
        # If word1-word2 is not recognized by the dictionary, replace the hyphen with an em dash
        with_em_dash = wmatch.group(1) + "—" + wmatch.group(2)
        if not dictionaries.recognized(wmatch.group()):
            return with_em_dash
        return wmatch.group()
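To see what that snippet does in isolation, here is a small harness that runs the same `replace_word()` logic through Python's standard `re.sub`, using a pattern with two capture groups (which is what `group(1)` and `group(2)` in the snippet expect). `StubDictionaries` is a hypothetical stand-in that imitates only the `recognized()` call used above; inside calibre the real `dictionaries` object is handed to your function instead.

```python
import re


class StubDictionaries:
    """Hypothetical stand-in for calibre's `dictionaries` object; only the
    recognized() method used in the snippet is imitated."""

    def __init__(self, words):
        self.words = {w.lower() for w in words}

    def recognized(self, word):
        return word.lower() in self.words


dictionaries = StubDictionaries({"well-known", "twenty-one"})


def replace_word(wmatch):
    # Same logic as the snippet: keep recognized compounds and
    # swap the hyphen for an em dash (U+2014) in everything else.
    if not dictionaries.recognized(wmatch.group()):
        return wmatch.group(1) + "\u2014" + wmatch.group(2)
    return wmatch.group()


text = "A well-known man-and twenty-one others-waited."
fixed = re.sub(r"(\w+)\s*-\s*(\w+)", replace_word, text)
print(fixed)
```

Because it runs as a replacement, it rewrites every unrecognized pair in a single pass rather than letting you review them first.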
Yesterday, 05:26 AM #6
Phssthpok
Age improves with wine.
Quote:
Originally Posted by lomkiri
Try to replace the sub-function replace_word() that is inside the code of the example by this one, that does a replace when the compound word is NOT in the dict
No, I did this -- the problem with this code is that I still find every occurrence of a hyphenated word and then have to decide whether to replace it or not.

What I really want is a way to FIND the ones not in the dictionary and THEN decide whether to replace them. Out of 500 hyphenated words, maybe only 50 will need to be looked at as candidates for replacement, instead of looking at all 500. But I can't see any way to do that, so I'll just have to look at all 500.
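One possible way to get that shorter list, sketched outside calibre under stated assumptions: collect only the distinct hyphenated words a dictionary rejects, with a count for each, so each form is reviewed once instead of at every occurrence. The `is_recognized` callable is a hypothetical stand-in (here a lookup in a small set); inside calibre the equivalent check would presumably be the `dictionaries.recognized()` call from post #5. The input is assumed to be plain text, e.g. exported from the book.

```python
import re
from collections import Counter

# Two runs of word characters joined by a single hyphen.
HYPHENATED = re.compile(r"\b(\w+)-(\w+)\b")


def review_list(text, is_recognized):
    """Count each distinct hyphenated word that is_recognized() rejects,
    most frequent first."""
    counts = Counter(m.group() for m in HYPHENATED.finditer(text)
                     if not is_recognized(m.group()))
    return counts.most_common()


# Hypothetical stand-in for the dictionary check.
known = {"well-known", "twenty-one", "self-aware"}
text = ("A well-known man-and a self-aware man-and twenty-one "
        "others-waited. Well-known again.")
for word, n in review_list(text, lambda w: w.lower() in known):
    print(f"{n:3d}  {word}")
```

Each entry in the output can then be searched for individually in the editor.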


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.