#1 |
Age improves with wine.
Posts: 596
Karma: 95229
Join Date: Nov 2014
Device: Kindle Oasis, Kobo Libra II
|
How to write regex function which uses dictionary?
I have a book which has hyphens instead of em dashes, and I'm trying to fix it. Using a regex like "-(and|but|with)" catches a few cases, but it would be better to find all "\w+-\w+" which are not in the current dictionary, which would catch about 99% of all cases (and leave the remainder to the proofreading stage).
How could I write a regex function to do this?
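To make the difference between the two patterns concrete, here is a small sketch in plain Python, run outside calibre. The sample sentence is made up for illustration and is not taken from the book in question.

```python
import re

# Hypothetical sample text: two hyphens stand in for em dashes,
# one hyphen belongs to a genuine compound word.
sample = "She waited-but nobody came. The well-known author left-quietly."

# Narrow pattern: only hyphens followed by a few common conjunctions.
narrow = re.compile(r"-(and|but|with)\b")

# Broad pattern: every word-hyphen-word sequence.
broad = re.compile(r"\w+-\w+")

narrow_hits = [m.group() for m in narrow.finditer(sample)]
broad_hits = broad.findall(sample)

print(narrow_hits)  # ['-but']
print(broad_hits)   # ['waited-but', 'well-known', 'left-quietly']
```

The broad pattern also picks up the legitimate compound, which is why a dictionary check is needed to separate the two kinds of match.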
#2 |
creator of calibre
Posts: 45,497
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
#3 | |
Age improves with wine.
Posts: 596
Karma: 95229
Join Date: Nov 2014
Device: Kindle Oasis, Kobo Libra II
|
Quote:
#4 |
Age improves with wine.
Posts: 596
Karma: 95229
Join Date: Nov 2014
Device: Kindle Oasis, Kobo Libra II
|
Hmm, having done this I realise that what I really need is a way to *find* things which are not in the dictionary, not all hyphenated words -- since there are gazillions of those to wade through, and far fewer which are non-dictionary items.
Seems to be no way to do this. Plugin maybe? Back to the drawing board...
#5 | |
Groupie
Posts: 172
Karma: 1497966
Join Date: Jul 2021
Device: N/A
|
Quote:
Code:

    def replace_word(wmatch):
        # if word1-word2 is not recognized by the dictionary, replace dash by em-dash
        with_em_dash = wmatch.group(1) + "—" + wmatch.group(2)
        if not dictionaries.recognized(wmatch.group()):
            return with_em_dash
        return wmatch.group()
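The helper above relies on a `dictionaries` object, which calibre passes to the outer `replace` function in function mode. A minimal sketch of how the two might fit together follows. It assumes the search expression is something like `>[^<>]+<` (a run of text between tags) and uses the standard `re` module; both choices are assumptions, not something stated in this thread.

```python
import re


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    def replace_word(wmatch):
        # Keep compounds the dictionary knows; otherwise swap the hyphen for an em dash.
        if dictionaries.recognized(wmatch.group()):
            return wmatch.group()
        return wmatch.group(1) + "\u2014" + wmatch.group(2)

    # match.group() is whatever the search expression matched (assumed: text between tags).
    return re.sub(r"(\w+)-(\w+)", replace_word, match.group())
```

Used with Replace all, this changes every unrecognized compound without asking, so it suits books where the dictionary can be trusted for nearly all genuine hyphenated words.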
#6 | |
Age improves with wine.
Posts: 596
Karma: 95229
Join Date: Nov 2014
Device: Kindle Oasis, Kobo Libra II
|
Quote:
What I really want is a way to FIND the ones not in the dictionary and THEN decide whether to replace them. Out of 500 hyphenated words, maybe only 50 will need to be looked at as candidates for replacement, instead of looking at all 500. But I can't see any way to do that, so I'll just have to look at all 500.
#7 | |
Groupie
Posts: 172
Karma: 1497966
Join Date: Jul 2021
Device: N/A
|
Quote:
Then you open this file in a text editor and delete from it everything except the occurrences you want to correct.
Then you modify the regex function to load this new file into a list and to check each occurrence against it: if the occurrence is in the list, correct it in the text.
Note: It could be more convenient to delete from the file only the occurrences you want to correct (deleting roughly 50 instead of 450). In that case you'll have to do the opposite: correct only the ones which are NOT in the list.
Last edited by lomkiri; 08-26-2025 at 01:53 PM.
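The file-based workflow described here could be sketched roughly as two functions. How the list file gets produced is not shown in this thread, so the first pass below is an assumption: it records only compounds the dictionary does not recognize. The file location, the names `collect` and `apply_list`, and the `>[^<>]+<` style search expression are also hypothetical. In calibre each pass would be saved as its own function named `replace`. The extra call with `match` set to `None` relies on calibre's `call_after_last_match` attribute.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical location of the candidate list; change it to any convenient path.
LIST_FILE = Path(tempfile.gettempdir()) / "hyphen_candidates.txt"
COMPOUND = re.compile(r"(\w+)-(\w+)")


def collect(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    """Pass 1: record unrecognized compounds and leave the text unchanged."""
    if match is None:
        # Extra call after the last match: write the sorted, de-duplicated list.
        words = sorted(data.get("found", ()))
        LIST_FILE.write_text("\n".join(words) + "\n", encoding="utf-8")
        return
    found = data.setdefault("found", set())
    for wmatch in COMPOUND.finditer(match.group()):
        if not dictionaries.recognized(wmatch.group()):
            found.add(wmatch.group())
    return match.group()


# Ask calibre to call the function once more when matching is finished.
collect.call_after_last_match = True


def apply_list(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    """Pass 2: put an em dash only into compounds still listed in the file."""
    if "wanted" not in data:
        # Load the hand-edited list once per Replace all run.
        data["wanted"] = set(LIST_FILE.read_text(encoding="utf-8").split())

    def fix(wmatch):
        if wmatch.group() in data["wanted"]:
            return wmatch.group(1) + "\u2014" + wmatch.group(2)
        return wmatch.group()

    return COMPOUND.sub(fix, match.group())
```

For the inverse variant mentioned in the note, the membership test in `fix` would be negated. With a list that holds only unrecognized compounds, it would also need the dictionary check so that recognized compounds stay untouched.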
#8 | |
Age improves with wine.
Posts: 596
Karma: 95229
Join Date: Nov 2014
Device: Kindle Oasis, Kobo Libra II
|
Quote:
Of course, for the book that made me think about this issue, I have already gone through the 500 instances by hand and corrected the 50 that were wrong... but I'm sure it'll happen to me again!
Thread | Thread Starter | Forum | Replies | Last Post |
Regex in Regex function mode | lindlind | Editor | 5 | 03-22-2024 03:41 AM |
Help with S&R RegEx Function | MerlinMama | Editor | 5 | 05-29-2022 02:23 AM |
Predefined regex for Regex-function | sherman | Editor | 3 | 01-19-2020 05:32 AM |
regex function replacement | The_book | Sigil | 5 | 12-09-2019 09:45 AM |
Regex Function about «» and “” | senhal | Editor | 8 | 04-06-2016 02:12 AM |