#1 |
Age improves with wine.
Posts: 584
Karma: 95229
Join Date: Nov 2014
Device: Kindle Oasis, Kobo Libra II
|
How to write regex function which uses dictionary?
I have a book which has hyphens instead of em dashes, and I'm trying to fix it. Using a regex like "-(and|but|with)" catches a few cases, but it would be better to find all "\w+-\w+" which are not in the current dictionary, which would catch about 99% of all cases (and leave the remainder to the proofreading stage).
How could I write a regex function to do this? |
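To make the difference between the two patterns in the question concrete, here is a small sketch using Python's standard `re` module. The sample sentence is invented for illustration; only the two patterns come from the post.

```python
import re

# Invented sample text: two misused hyphens and one genuine compound.
sample = "He waited-but nobody came. The well-known author stayed-with reluctance."

# Narrow pattern from the post: a hyphen followed by one of a few common words.
narrow = re.compile(r"-(and|but|with)")

# Broad pattern from the post: any word-hyphen-word pair.
broad = re.compile(r"\w+-\w+")

narrow_hits = [m.group() for m in narrow.finditer(sample)]
broad_hits = [m.group() for m in broad.finditer(sample)]

print(narrow_hits)  # candidates found by the narrow pattern
print(broad_hits)   # every hyphenated pair, including genuine compounds
```

The broad pattern also picks up genuine compounds such as "well-known", which is why a dictionary check is needed to separate them from misused hyphens.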
#2 |
creator of calibre
Posts: 45,435
Karma: 27757438
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
#3 | |
Age improves with wine.
Posts: 584
Karma: 95229
Join Date: Nov 2014
Device: Kindle Oasis, Kobo Libra II
|
Quote:
|
|
#4 |
Age improves with wine.
Posts: 584
Karma: 95229
Join Date: Nov 2014
Device: Kindle Oasis, Kobo Libra II
|
Hmm, having done this I realise that what I really need is a way to *find* things which are not in the dictionary, not all hyphenated words -- since there are gazillions of those to wade through, and far fewer which are non-dictionary items.
Seems to be no way to do this. Plugin maybe? Back to the drawing board... |
#5 | |
Groupie
Posts: 169
Karma: 1497966
Join Date: Jul 2021
Device: N/A
|
Quote:
Code:
def replace_word(wmatch):
    # If word1-word2 is not recognized by the dictionary, replace the hyphen with an em dash.
    # `dictionaries` is the argument calibre passes to the enclosing replace() function.
    with_em_dash = wmatch.group(1) + "—" + wmatch.group(2)
    if not dictionaries.recognized(wmatch.group()):
        return with_em_dash
    return wmatch.group()
|
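The helper in post #5 relies on a `dictionaries` object, which calibre passes to the top-level `replace()` function in the editor's Function mode. A minimal sketch of how the helper might sit inside that function is shown below. It assumes the search expression matches a run of text containing the hyphenated words, and it uses the standard `re` module. `StubDictionaries` is a hypothetical stand-in so the example runs outside calibre. Whether the spell checker accepts a hyphenated compound as a single word depends on the installed dictionary.

```python
import re

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    def replace_word(wmatch):
        # Keep the hyphen when the dictionary recognizes the compound as written.
        if dictionaries.recognized(wmatch.group()):
            return wmatch.group()
        # Otherwise swap the hyphen for an em dash (U+2014).
        return wmatch.group(1) + "\u2014" + wmatch.group(2)

    # Apply the helper to every word-hyphen-word pair in the matched text.
    return re.sub(r"(\w+)-(\w+)", replace_word, match.group())


class StubDictionaries:
    """Hypothetical stand-in for calibre's dictionaries object (testing only)."""
    def __init__(self, words):
        self.words = set(words)

    def recognized(self, word, locale=None):
        return word in self.words


# Usage outside calibre: simulate one match over a line of text.
stub = StubDictionaries({"well-known"})
m = re.search(r".+", "a well-known fact-but wrong")
result = replace(m, 1, "chapter1.html", None, stub, {}, {})
print(result)
```

This sketch does not handle HTML entities, and a three-part compound such as "mother-in-law" is checked only as its first pair, so the output still needs proofreading.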
#6 | |
Age improves with wine.
Posts: 584
Karma: 95229
Join Date: Nov 2014
Device: Kindle Oasis, Kobo Libra II
|
Quote:
What I really want is a way to FIND the ones not in the dictionary and THEN decide whether to replace them. Out of 500 hyphenated words, maybe only 50 will need to be looked at as candidates for replacement, instead of looking at all 500. But I can't see any way to do that, so I'll just have to look at all 500. |
|
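One possible approach to the find-only workflow described in post #6 is a function that leaves the text untouched and only collects the hyphenated words the dictionary does not recognize. The sketch below assumes the search expression is `\w+-\w+`, that it is run with Replace all, and that calibre's Function mode behaves as its manual describes: the `data` dictionary persists across calls, `call_after_last_match` triggers one extra call with `match` set to `None`, and `print()` output is shown once the operation finishes. `StubDictionaries` is a hypothetical stand-in so the example runs outside calibre.

```python
import re

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if match is None:
        # Extra call after the last match: report the collected candidates.
        for word, count in sorted(data.get("unknown", {}).items()):
            print(f"{word}\t{count}")
        return
    word = match.group()
    if not dictionaries.recognized(word):
        # Count each unrecognized hyphenated word across the whole run.
        unknown = data.setdefault("unknown", {})
        unknown[word] = unknown.get(word, 0) + 1
    return word  # return the match unchanged, so nothing is replaced

# Ask for one extra call after the last match (assumed per the calibre manual).
replace.call_after_last_match = True


class StubDictionaries:
    """Hypothetical stand-in for calibre's dictionaries object (testing only)."""
    def __init__(self, words):
        self.words = set(words)

    def recognized(self, word, locale=None):
        return word in self.words


# Usage outside calibre: feed matches by hand, then make the final call.
data = {}
stub = StubDictionaries({"well-known"})
text = "a well-known fact-but wrong, fact-but true"
for i, m in enumerate(re.finditer(r"\w+-\w+", text), 1):
    replace(m, i, "chapter1.html", None, stub, data, {})
replace(None, 0, "chapter1.html", None, stub, data, {})
```

The printed list can then be reviewed by hand, and only the words that really need an em dash go into a second, targeted search and replace.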