Old 03-14-2024, 11:27 PM   #16
lomkiri
Zealot
lomkiri ought to be getting tired of karma fortunes by now.
Posts: 136
Karma: 1000102
Join Date: Jul 2021
Device: N/A
Quote:
Originally Posted by moldy View Post
ERROR: No replace function: You must create a Python function named replace in your code
That's because it is plain Python, not a regex function for the editor.
From your messages, I understood that you knew a little Python, so I wrote it to be run outside the editor: from the Python command line or the calibre-debug prompt, either as a program in a file or in interactive mode.

Create your JSON file with the dict, then create a file named extract.py and put this in it:
Code:
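import json

def main():
    fname = '/data/temp/beastones.json'   # adapt this path to your own JSON file
    # Use a context manager so the file is closed after reading
    with open(fname, encoding='utf-8') as f:
        equiv = json.load(f)
    if not equiv:
        print(f'Problem loading {fname}')
        return
    # Join the dict keys into a regex alternation for the find field
    print('|'.join(equiv.keys()))

main()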
Adapt the path and name of the JSON file in the code first.

Then go to the command line and type:
[on Linux:] python3 your/path/extract.py
[on Windows:] calibre-debug your\path\extract.py

print() writes the result to the command line. With your example:
John|Paul|George|Ringo
You can then copy and paste it into the "find" field of the search that will make the substitution.

Run calibre-debug --help for more about calibre-debug.
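
A possible refinement of the extraction step, as a minimal sketch: escape each key and put longer keys first, so a key containing regex metacharacters or a key that is a prefix of another does not produce surprises in the find field. The helper name keys_to_pattern and the sample data are hypothetical, not part of the code above.

```python
import json
import re

def keys_to_pattern(json_text):
    """Build a regex alternation from the keys of a JSON object."""
    equiv = json.loads(json_text)
    # Longest keys first, so "Johnson" would be tried before "John".
    keys = sorted(equiv, key=len, reverse=True)
    # re.escape() neutralises characters such as "." or "(" in a key.
    return '|'.join(re.escape(k) for k in keys)

# Hypothetical sample data, standing in for the contents of the JSON file.
sample = '{"John": "Mick", "Paul": "Ronnie", "George": "Keith", "Ringo": "Charlie"}'
pattern = keys_to_pattern(sample)
print(pattern)
```

The printed string can be pasted into the find field in the same way as the output of extract.py.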

Last edited by lomkiri; 03-15-2024 at 10:52 AM.
Old 03-18-2024, 09:37 AM   #17
moldy
Enthusiast
moldy began at the beginning.
 
Posts: 38
Karma: 10
Join Date: Oct 2015
Device: Kindle
Quote:
Originally Posted by lomkiri View Post
It's because it's pure python, not a regex-function.
From your messages, I understood that you knew a little about python, so it was made to be executed in the python command line or in the calibre-debug prompt, by a python program (in a file) or called in interactive mode.
I only know a little Python and I couldn't get the code to run in the interpreter. Then I got a little er... confused.

Anyway, I discovered what was wrong (a syntax error in the JSON) and managed to get the dict method to work perfectly.
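
For anyone hitting the same kind of problem, a small sketch of how a JSON syntax error could be located before the file is used. The function name check_json is hypothetical; json.JSONDecodeError and its lineno, colno and msg attributes come from the standard library.

```python
import json

def check_json(text):
    """Return None if text is a valid JSON object, else a short error message."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno point at the place where parsing failed.
        return f'line {err.lineno}, column {err.colno}: {err.msg}'
    if not isinstance(data, dict):
        return 'top-level value is not an object'
    return None

print(check_json('{"John": "Mick"}'))    # valid object
print(check_json('{"John": "Mick",}'))   # trailing comma is not valid JSON
```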

However, when experimenting with my actual working file, I found the massive size of the data in the find field somewhat unwieldy, to say the least.

In the end I decided upon a simpler solution that also works as expected.

Find field:
Code:
>[^<>]+<
Function:
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    return match.group().replace('John','Mick').replace('George','Keith').replace('Paul','Ronnie').replace('Ringo','Charlie')
My working data file is in 2 columns of text, so using Notepad++ in column mode I can easily add all the other punctuation and then remove the superfluous spaces. It's also easy to add/remove/change data and then copy and paste it into the function.

Many thanks for your input Lomkiri. Your time wasn't totally wasted as I learned a lot from your suggestions.
Old 03-19-2024, 12:57 PM   #18
lomkiri
Quote:
Originally Posted by moldy View Post
However when experimenting with my actual working file I found the massive size of the data in the find field somewhat unwieldy to say the least.
You were the one who asked to put all the searched words in the find field :-).
With your new search string, you could use your JSON file like this, avoiding the need for hundreds of Python replaces:
Code:
# your code : 
    # return match.group().replace('John','Mick').replace('George','Keith').replace('Paul','Ronnie').replace('Ringo','Charlie')

# Alternative code :
    # insert here the code to load the json file into the dict "equiv"
    m = match.group(0) 
    for key in equiv:
        m = m.replace(key, equiv[key])
    return m
With this code, the function is generic: for another set of searched words you only need to modify the JSON file.
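
One thing to keep in mind with a loop of successive replaces: if a replacement value is itself one of the keys, later iterations will change it again. The sketch below contrasts the loop with a single-pass substitution; the function names and the swap dict are hypothetical, and it uses the standard re module rather than the editor's regex function interface.

```python
import re

def replace_sequential(text, equiv):
    """Apply each key in turn, as in the loop above."""
    for key in equiv:
        text = text.replace(key, equiv[key])
    return text

def replace_single_pass(text, equiv):
    """Replace all keys in one scan, so replaced text is never re-examined."""
    keys = sorted(equiv, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: equiv[m.group(0)], text)

# Hypothetical data where each value is also a key.
swap = {'John': 'Paul', 'Paul': 'John'}
print(replace_sequential('John and Paul', swap))
print(replace_single_pass('John and Paul', swap))
```

With data like the John/Mick example, where no value is also a key, both approaches give the same result.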

If you're sure that none of the searched words occurs inside a tag (as a tag name such as "body" or "span", or as a class name, for example), you could even search the whole HTML page at once, which is much quicker:
find: <body[^>]*>\K(.+)</body> (with "dot all" checked)

Last edited by lomkiri; 03-19-2024 at 05:19 PM.
Old 03-20-2024, 12:15 PM   #19
moldy
Quote:
In the end I decided upon a simpler solution that also works as expected.
Actually it didn’t work as I wanted. Using the example of John, George etc., there were matches not only for John but also for Johnson, Johnjo, LongJohn and so on.
To counteract this I tried wrapping John in \b anchors in the function: no matches at all. After researching online I tried escaping the backslash (\\b): no matches. After more reading I tried escaping the escape characters (\\\\b): no matches. After even more research I tried the raw string solution r"\bJohn": no matches.

I would like to go back to the dict method again (as described in lomkiri’s suggestion above). However I think I will probably have the same issue there when the pairs are passed to the function from the json file.

Is there another way around this?
Old 03-20-2024, 04:33 PM   #20
lomkiri
Quote:
Originally Posted by moldy View Post
To counteract this I tried wrapping John in \b anchors in the function
It would have worked in a regex, but not with Python's str.replace(), which matches its argument literally and knows nothing about word boundaries.
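
A short illustration of the difference, using the standard re module so it can be run in any Python interpreter (the sample sentence is made up for the demonstration):

```python
import re

text = 'Johnson, LongJohn and John'

# str.replace() looks for the literal characters \bJohn\b, which never occur.
literal = text.replace(r'\bJohn\b', 'Mick')

# Without anchors, str.replace() also hits the substrings.
plain = text.replace('John', 'Mick')

# In a regex, \b is a word boundary, so only the whole word is replaced.
bounded = re.sub(r'\bJohn\b', 'Mick', text)

print(literal)
print(plain)
print(bounded)
```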
Quote:
I would like to go back to the dict method again (as described in lomkiri’s suggestion above).
Try this :
Code:
    # insert here the code to load the json file into the dict "equiv"
    # (see my post #12 for this code)
    import regex
    m = match.group() 
    for key in equiv:
        m = regex.sub(rf'\b{key}\b', equiv[key], m)
    return m
It works, I have tested it :
Johnson, Johnjo LongJohn and so on John and Ringo, and also john ==>
Johnson, Johnjo LongJohn and so on Mick and Charlie, and also john

Note: rf'\b{key}\b' is the same as r'\b{}\b'.format(key); if key == 'John', it expands to the regex \bJohn\b.
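
A side remark, as a sketch rather than a change to the tested code: because the key is inserted into the pattern as is, a key containing a regex metacharacter is interpreted as regex syntax. Wrapping it in re.escape() avoids that. The key 'St.John' and the replacement 'Sinjin' below are hypothetical, and the standard re module is used here instead of regex.

```python
import re

key = 'John'
# The f-string and str.format() build the same pattern text.
assert rf'\b{key}\b' == r'\b{}\b'.format(key) == '\\bJohn\\b'

# A key containing a regex metacharacter (hypothetical example).
dotted = 'St.John'
text = 'St.John and StxJohn'

# Unescaped: the dot matches any character, so both words are replaced.
loose = re.sub(rf'\b{dotted}\b', 'Sinjin', text)
# Escaped: the dot is literal, so only the exact key is replaced.
strict = re.sub(rf'\b{re.escape(dotted)}\b', 'Sinjin', text)

print(loose)
print(strict)
```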

It works with either <body[^>]*>\K(.+)</body> (with "dot all" checked) or >\K([^>]+)(?![^<>{}]*[>}]). The 1st form is quicker, since it processes one whole HTML file per iteration, on the condition, as I said above, that none of your keys matches something inside an HTML tag. The 2nd form selects the text between tags and avoids the part inside the tag.

Last edited by lomkiri; 03-21-2024 at 08:03 AM.
Old 03-21-2024, 11:09 AM   #21
moldy
I can't get this to work. The function reports that it has made just 1 replacement, but actually it hasn't made any. Please view the image at:

My find code is:
Code:
<body[^>]*>\K(.+)</body>
And my function is:
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    
    from calibre.utils.config import JSONConfig
    m = match[0]


    if number == 1:
        fname = 'beatstones.json'
        data['equiv'] = JSONConfig(fname)
        if not data['equiv']:
            print(f'Problem loading {fname}, no treatment will be done')
    
    return data['equiv'].get(m, m)

            
    import regex
    m = match.group() 
    for key in equiv:
        m = regex.sub(rf'\b{key}\b', equiv[key], m)
    return m
There are no errors reported. As far as I can see there are no problems from the json file and I can extract the keys from it. It must be a problem with the function but, with my limited knowledge, I can't find it.
Old 03-21-2024, 11:10 AM   #22
moldy
https://imgur.com/a/21b0fFD
Old 03-21-2024, 12:19 PM   #23
lomkiri
That's because the code needed to be adapted.

1) The line "return data['equiv'].get(m, m)" is from the old code; it was not meant to be included. Since it returns, nothing after it is ever executed.
2) In this code, the dict is loaded into data['equiv'], not into equiv, so you have to adapt the new code accordingly (I load it into data so that the JSON is read only once for all passes).
3) Since you're matching one whole page, it's normal that only one change is reported. The regex system counts the number of times the expression matches (once per page, in this case). It counts a change even if nothing changed in the page (it has no way to know whether the "m" you return has been modified).
If you have 5 pages, it will report 5 changes, even if there are 100 changes in the 1st page and none in the other 4.
Click on "See modifications" to see the real changes.
4) If you want to know how many changes have been made, you'll have to use subn() instead of sub(), increment a counter in data['counts'], and print this counter during the last pass (ask if you need it and don't know how to do that).
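
To illustrate point 4, a minimal standalone sketch of counting with subn(). It is not the counter version mentioned later in the thread; the function name replace_and_count, the plain dict used for the counters and the sample data are assumptions, and it uses the standard re module so it runs outside calibre.

```python
import re

def replace_and_count(text, equiv, counts):
    """Replace whole-word keys and accumulate per-key counts in `counts`."""
    for key, val in equiv.items():
        # subn() returns the new text and the number of substitutions made.
        text, n = re.subn(rf'\b{re.escape(key)}\b', val, text)
        counts[key] = counts.get(key, 0) + n
    return text

# Hypothetical data for the demonstration.
equiv = {'John': 'Mick', 'Ringo': 'Charlie'}
counts = {}
result = replace_and_count('John, Johnson, John and Ringo', equiv, counts)
print(result)
print(counts, sum(counts.values()))
```

Inside the editor function, the same dict could be kept in data['counts'] so that it survives from one pass to the next.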


The code (tested) is :
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    from calibre.utils.config import JSONConfig
    import regex

    # Load the JSON only on the first pass;
    # data retains its values through all passes of "Replace all"
    if number == 1:
        fname = 'beastones.json'
        data['equiv'] = JSONConfig(fname)
        if not data['equiv']:
            print(f'Problem loading {fname}, no treatment will be done')
            
    # normal passage
    m = match.group() 
    for key, val in data['equiv'].items():
        m = regex.sub(rf'\b{key}\b', val, m)
    return m
The JSON file (beastones.json in this case; change fname if you choose another filename) must be in calibre's config folder and, for this example, must contain:
Code:
{
  "John": "Mick",
  "Paul": "Keith",
  "George": "Ronnie",
  "Ringo": "Charlie"
}

Last edited by lomkiri; 03-22-2024 at 07:16 PM. Reason: screenshot removed
Old 03-22-2024, 11:15 AM   #24
moldy
Thank you lomkiri; the function above works perfectly using my large data file. Both find statements work equally well for my purposes.

Thanks also for your perseverance and patience.
Old 03-23-2024, 06:31 AM   #25
moldy
Looking at the function further @lomkiri.

Using Find: >\K([^>]+)(?![^<>{}]*[>}])

Because of the look-ahead I would have expected any text inside <> or {} to be excluded from the match. However, using the example:

<p>John <George> {Paul} Ringo</p>

George is not matched but Paul is. Have I misunderstood how the look-ahead works?
Old 03-23-2024, 08:22 AM   #26
lomkiri
[}>]\K([^>}]+)(?![^<>{}]*[>}])

(the curly brackets are there to skip inline styles; if there are no such parts, you can get rid of them)

Last edited by lomkiri; 03-23-2024 at 09:26 AM.
Old 03-23-2024, 09:46 AM   #27
lomkiri
Kindly proposed by EbookMakers, who is a master of regexes :-)

Excluding all that is inside <>
>\K([^<>]+)(?=<)

Excluding all that is inside <> and {}
[>}]\K([^<>{}]+)(?=[<{])
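
To check these two patterns outside the editor, here is a sketch with the standard re module. re does not support \K, so the leading part is rewritten as a look-behind, which is an equivalent assumed for this test only; in the find field the \K forms above stay as they are. The sample line is the one from post #25.

```python
import re

sample = '<p>John <George> {Paul} Ringo</p>'

# Equivalent of  >\K([^<>]+)(?=<)  with a look-behind instead of \K.
outside_tags = re.findall(r'(?<=>)[^<>]+(?=<)', sample)

# Equivalent of  [>}]\K([^<>{}]+)(?=[<{])  with a look-behind instead of \K.
outside_tags_and_braces = re.findall(r'(?<=[>}])[^<>{}]+(?=[<{])', sample)

print(outside_tags)
print(outside_tags_and_braces)
```

With the first pattern, {Paul} is still part of the selected text, since only <> are excluded; with the second, neither George nor Paul is selected.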

Last edited by lomkiri; 03-23-2024 at 10:45 AM.
Old 03-24-2024, 03:47 PM   #28
lomkiri
I have posted an enhanced version of this function in the pinned thread Saved Search/Regex Functions.

The regex inside the function skips the content of HTML tags, so we are now free to scan the whole page, even if some class names are in the list.
It no longer skips the text inside {}, since inline styles are not selected by the main regex (that of the "find" field).

I have also written a longer version with counters (a total of all changes and, in a JSON file, counters per word).
Old 03-26-2024, 10:52 AM   #29
lomkiri
Quote:
Originally Posted by moldy View Post
Thank you @lomkiri. The function, including the counters, works perfectly on my data file of 187 entries (and growing ever larger).
You're very welcome. Glad it fits your needs.

A friend asked me what the practical use of this function would be, and I must say I was unable to answer :-) (apart from a Stalinist revision of history ebooks ;p)
Out of curiosity, how do you use it? I mean: in what situation do you need to replace one list of words with another?
Old 03-26-2024, 01:35 PM   #30
moldy
I have sent a PM.