03-14-2024, 11:27 PM | #16 | |
Zealot
Posts: 136
Karma: 1000102
Join Date: Jul 2021
Device: N/A
|
I trie
Quote:
From your messages, I understood that you knew a little about python, so it was made to be executed in the python command line or in the calibre-debug prompt, by a python program (in a file) or called in interactive mode. Create your json-file with the dict, then create a file extract.py, put this in it : Code:
def main():
    import json
    fname = '/data/temp/beastones.json'  # adapt this to your needs
    equiv = json.load(open(fname))
    if not equiv:
        print(f'Problem loading {fname}')
        return
    print('|'.join(equiv.keys()))

main()
Then go to the command line, and type
[if you are on linux:] python3 your/path/extract.py
[if you are on windows:] calibre-debug your\path\extract.py
The function print() will display on command line, with your example:
John|Paul|George|Ringo
Then you may copy-paste it in the find "field" of the search that will make the substitution. More help on calibre-debug with the option --help
Last edited by lomkiri; 03-15-2024 at 10:52 AM. |
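A possible variation on the extraction script, as a minimal sketch: if a key could contain regex metacharacters (a dot, a parenthesis), the joined string pasted into the find field would need those characters escaped. The helper name `build_find_pattern` and the sample dict are assumptions made for illustration; they are not part of the script quoted above.

```python
import json
import re

def build_find_pattern(equiv):
    """Join the dict keys into a regex alternation, escaping metacharacters."""
    # Longer keys first, so that 'Johnny' is tried before 'John' (illustrative choice)
    keys = sorted(equiv, key=len, reverse=True)
    return '|'.join(re.escape(k) for k in keys)

# Hypothetical sample data standing in for the json file
equiv = json.loads('{"John": "Mick", "Paul": "Ronnie", "J.R.": "Keith"}')
pattern = build_find_pattern(equiv)
print(pattern)
```

With the escaping in place, the key `J.R.` only matches the literal text `J.R.` and not, say, `JxRx`.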
|
03-18-2024, 09:37 AM | #17 | |
Enthusiast
Posts: 38
Karma: 10
Join Date: Oct 2015
Device: Kindle
|
Quote:
Anyway; I discovered what was wrong (syntax error in the json) and managed to get the dict method to work perfectly. However when experimenting with my actual working file I found the massive size of the data in the find field somewhat unwieldy to say the least. In the end I decided upon a simpler solution that also works as expected. Find field: Code:
>[^<>]+< Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    return match.group().replace('John','Mick').replace('George','Keith').replace('Paul','Ronnie').replace('Ringo','Charlie')
Many thanks for your input Lomkiri. Your time wasn't totally wasted as I learned a lot from your suggestions. |
|
03-19-2024, 12:57 PM | #18 | |
Zealot
Posts: 136
Karma: 1000102
Join Date: Jul 2021
Device: N/A
|
Quote:
With your new search string, you could use your json file in this way, avoiding the need for hundreds of Python replaces : Code:
# your code :
# return match.group().replace('John','Mick').replace('George','Keith').replace('Paul','Ronnie').replace('Ringo','Charlie')

# Alternative code :
# insert here the code to load the json file into the dict "equiv"
m = match.group(0)
for key in equiv:
    m = m.replace(key, equiv[key])
return m
If you're sure that none of the searched words is inside a tag (as "body", "span", or a class name, for example), you could even search the whole html page, much quicker :
find : <body[^>]*>\K(.+)</body> (with "dot all" checked)
Last edited by lomkiri; 03-19-2024 at 05:19 PM. |
|
03-20-2024, 12:15 PM | #19 | |
Enthusiast
Posts: 38
Karma: 10
Join Date: Oct 2015
Device: Kindle
|
Quote:
To counteract this I tried wrapping John in \b anchors in the function - no matches at all. After researching online I tried escaping the backslash \\b - no matches. After more reading I tried escaping the escape characters \\\\b - no matches. After even more research I tried the raw data solution r”\bJohn” - no matches. I would like to go back to the dict method again (as described in lomkiri’s suggestion above). However I think I will probably have the same issue there when the pairs are passed to the function from the json file. Is there another way around this? |
|
03-20-2024, 04:33 PM | #20 | ||
Zealot
Posts: 136
Karma: 1000102
Join Date: Jul 2021
Device: N/A
|
Quote:
Quote:
Code:
# insert here the code to load the json file into the dict "equiv"
# (see my post #12 for this code)
import regex
m = match.group()
for key in equiv:
    m = regex.sub(rf'\b{key}\b', equiv[key], m)
return m
Johnson, Johnjo LongJohn and so on John and Ringo, and also john
==>
Johnson, Johnjo LongJohn and so on Mick and Charlie, and also john
Note: rf'\b{key}\b' is the same as r'\b{}\b'.format(key) and will be expanded to '\bJohn\b' if key == 'John'
It works with either
<body[^>]*>\K(.+)</body> (with "dot all" checked)
or
>\K([^>]+)(?![^<>{}]*[>}])
(but the 1st form will be quicker, treating one whole html file at each iteration, with the condition, as I said above, that none of your keys will match something inside an html tag). The 2nd form will select the text between tags and avoid the part inside the tag.
Last edited by lomkiri; 03-21-2024 at 08:03 AM. |
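The difference between the two approaches in this exchange can be sketched outside calibre. `str.replace()` treats its argument as plain text, which is why a `\b` passed to it never matches, while a regex substitution interprets `\b` as a word boundary. This sketch uses the standard-library `re` module rather than the `regex` module from the post, and the function names are assumptions for illustration; `re.escape()` is an added precaution for keys containing metacharacters.

```python
import re

equiv = {"John": "Mick", "Paul": "Ronnie", "George": "Keith", "Ringo": "Charlie"}
text = "Johnson, Johnjo LongJohn and so on John and Ringo, and also john"

def replace_literal(s, mapping):
    # str.replace() does plain-text matching: it also hits "Johnson",
    # and a key such as r'\bJohn\b' would be searched for literally
    for key, val in mapping.items():
        s = s.replace(key, val)
    return s

def replace_whole_words(s, mapping):
    # re.sub() interprets \b as a word boundary, so only whole words change
    for key, val in mapping.items():
        s = re.sub(rf'\b{re.escape(key)}\b', val, s)
    return s

print(replace_literal(text, equiv))
print(replace_whole_words(text, equiv))
```

The first call also rewrites "Johnson" and "LongJohn"; the second leaves them alone and only changes the standalone "John" and "Ringo".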
||
03-21-2024, 11:09 AM | #21 |
Enthusiast
Posts: 38
Karma: 10
Join Date: Oct 2015
Device: Kindle
|
I can't get this to work. The function considers that it has made just 1 replacement but actually it hasn't. Please view image at:
My find code is: Code:
<body[^>]*>\K(.+)</body> Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    from calibre.utils.config import JSONConfig
    m = match[0]
    if number == 1:
        fname = 'beatstones.json'
        data['equiv'] = JSONConfig(fname)
        if not data['equiv']:
            print(f'Problem loading {fname}, no treatment will be done')
    return data['equiv'].get(m, m)
    import regex
    m = match.group()
    for key in equiv:
        m = regex.sub(rf'\b{key}\b', equiv[key], m)
    return m |
03-21-2024, 11:10 AM | #22 |
Enthusiast
Posts: 38
Karma: 10
Join Date: Oct 2015
Device: Kindle
|
|
03-21-2024, 12:19 PM | #23 |
Zealot
Posts: 136
Karma: 1000102
Join Date: Jul 2021
Device: N/A
|
It's because you should have adapted the code.
1) The line "return data['equiv'].get(m, m)" is from the old code, it was not to be included.
2) In this code, the dict is loaded in data['equiv'], not in equiv, so you'll have to adapt the new code to this fact (the reason I've loaded it in data is that, doing this, it's necessary to load the json only once for all passages)
3) Since you're loading one whole page, it's normal that there is only one change. The regex system counts the times it takes an expression (a page, in this case). It will count a change even if there is no change in the page (it has no way to know if the "m" you return has been modified). If you've got 5 pages, it will give you 5 changes, even if you have 100 changes in the 1st page, and none in the other 4 pages. Click in "See modifications" to see the real changes.
4) If you want to know how many changes have been made, you'll have to use subn(), not sub(), and must increment a counter in data['counts'], and "print" this counter during the last passage (ask if you need it and you don't know how to do that).
The code (tested) is : Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    from calibre.utils.config import JSONConfig
    import regex

    # Load json only at first passage
    # data will retain its values throughout all passages when "replace all"
    if number == 1:
        fname = 'beastones.json'
        data['equiv'] = JSONConfig(fname)
        if not data['equiv']:
            print(f'Problem loading {fname}, no treatment will be done')

    # normal passage
    m = match.group()
    for key, val in data['equiv'].items():
        m = regex.sub(rf'\b{key}\b', val, m)
    return m
Code:
{
    "John": "Mike",
    "Paul": "Keith",
    "George": "Ronnie",
    "Ringo": "Charlie"
}
Last edited by lomkiri; 03-22-2024 at 07:16 PM. Reason: screenshot removed |
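Point 4 of this post (counting the real number of changes with subn()) could look roughly like the following. This is a standalone sketch, not the calibre function itself: it uses the standard-library `re` module, a plain dict stands in for calibre's `data` argument, and `replace_and_count` and the sample pages are hypothetical. How to print the total during the last passage inside calibre is not shown here.

```python
import re

def replace_and_count(text, equiv, data):
    """Apply every key -> value pair and add the number of hits to data['counts']."""
    for key, val in equiv.items():
        # subn() returns the new string and the number of substitutions made
        text, n = re.subn(rf'\b{re.escape(key)}\b', val, text)
        data['counts'] = data.get('counts', 0) + n
    return text

data = {}  # stands in for the dict that persists across passages
equiv = {"John": "Mike", "Paul": "Keith"}
pages = ["John met Paul. John left.", "Nothing to change here."]

new_pages = [replace_and_count(page, equiv, data) for page in pages]
print(data['counts'])
```

Two pages are processed here, but the counter reflects the three actual word replacements, all of them in the first page.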
03-22-2024, 11:15 AM | #24 |
Enthusiast
Posts: 38
Karma: 10
Join Date: Oct 2015
Device: Kindle
|
Thank you lomkiri; the function above works perfectly using my large data file. Both find statements work equally well for my purposes.
Thanks also for your perseverance and patience. |
03-23-2024, 06:31 AM | #25 |
Enthusiast
Posts: 38
Karma: 10
Join Date: Oct 2015
Device: Kindle
|
Looking at the function further @lomkiri.
Using Find: >\K([^>]+)(?![^<>{}]*[>}])
Because of the look-ahead I would have expected any text inside <> or {} to be ignored as part of the match. However, using the example:
<p>John <George> {Paul} Ringo</p>
George is not matched but Paul is. Have I misunderstood how the look-ahead works? |
03-23-2024, 08:22 AM | #26 |
Zealot
Posts: 136
Karma: 1000102
Join Date: Jul 2021
Device: N/A
|
[}>]\K([^>}]+)(?![^<>{}]*[>}])
(the curly brackets are there to avoid inline styles; if there are no such parts, you can get rid of them) Last edited by lomkiri; 03-23-2024 at 09:26 AM. |
03-23-2024, 09:46 AM | #27 |
Zealot
Posts: 136
Karma: 1000102
Join Date: Jul 2021
Device: N/A
|
Kindly proposed by EbookMakers, who is a master of regexes :-)
Excluding all that is inside <>
>\K([^<>]+)(?=<)
Excluding all that is inside <> and {}
[>}]\K([^<>{}]+)(?=[<{])
Last edited by lomkiri; 03-23-2024 at 10:45 AM. |
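The behaviour of these two expressions on the sample line from post #25 can be checked with a short sketch. One substitution to note: Python's standard `re` module does not support `\K`, so a lookbehind stands in for it here. For this sample the matched text should be the same, but that is an assumption of the sketch, not a claim about calibre's engine. The variable names are illustrative.

```python
import re

sample = "<p>John <George> {Paul} Ringo</p>"

# (?<=...) replaces \K, which the stdlib re module lacks
outside_tags = re.compile(r'(?<=>)([^<>]+)(?=<)')
outside_tags_and_braces = re.compile(r'(?<=[>}])([^<>{}]+)(?=[<{])')

# Text outside <>: "{Paul}" is still part of the second match
print(outside_tags.findall(sample))

# Text outside <> and {}: neither "George" nor "Paul" is selected
print(outside_tags_and_braces.findall(sample))
```

In both cases "George" stays out of the match because it sits inside angle brackets; only the second expression also leaves "Paul" out.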
03-24-2024, 03:47 PM | #28 |
Zealot
Posts: 136
Karma: 1000102
Join Date: Jul 2021
Device: N/A
|
I have posted in the pinned thread Saved Search/Regex Functions an enhanced version of this function.
The regex inside the function avoids the content of the html tags, so we are now free to scan the whole page, even if some class names are in the list. It no longer skips the text inside {}, since inline styles are not selected by the main regex (of the "find" field). I have also written a longer version with counters (a total of all changes and, in a json file, counters by word) |
03-26-2024, 10:52 AM | #29 | |
Zealot
Posts: 136
Karma: 1000102
Join Date: Jul 2021
Device: N/A
|
Quote:
A friend asked me what would be the practical use of this function, and I must say I was unable to answer :-) (apart from a Stalinist revision of ebooks about history ;p) Out of curiosity, how do you use it? I mean: in what situation do you need to replace one list of words with another? |
|
03-26-2024, 01:35 PM | #30 |
Enthusiast
Posts: 38
Karma: 10
Join Date: Oct 2015
Device: Kindle
|
I have sent a pm.