09-27-2021, 03:12 PM | #1 |
Nameless Being
|
Move metadata between fields (Regex SR)
Is it possible to extract a number, say #139, from a title field using (\#)(\d+) and use the second group (the digits) as the series number? And how would I specify the series name?
This is for a magazine series with the volume number embedded in the title, and I'd rather use it as the series number than keep it as part of the title. The regex isn't behaving as I thought it should. I tried (.*?)(\#)(\d+)(.*?) on the "title" field, thinking I could write \3 to the "series" field. But that doesn't remove anything from the title, and it creates a new series from groups 3 and 4. So for a bulk SR I'd end up with a separate series for each issue. |
09-27-2021, 10:23 PM | #2 |
Grand Sorcerer
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
That is working as expected. Specifying "\3" like that just says "use the third group in the match". It isn't removing the group from the original. And, you can only update one field at a time. You will need to do it in three steps.
First, set the series name. How you do that will depend on where the series name is. If you are updating books in the same series, you can use the bulk metadata editor to set it. This will also set a series index according to the options you choose at that time. If the series name is somewhere in the metadata, you will need to do a search-and-replace to set it.
Then, set the series index. Use your search above with the "Destination field" set to "series_index". The "Replace with" will be "\3".
Then remove the series index from the title. Your search above with the "Destination field" set to "title" should work. The replace looks like it should be "\1" or "\4" or something like "\1 - \4". It depends on exactly how the titles are currently structured. |
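The three steps can be sketched in plain Python, on the assumption that calibre's search and replace behaves like Python's `re.sub`. The `book` dictionary is a hypothetical stand-in for one library entry, not calibre's real data model, and the pattern here is anchored and allows optional spaces around the number, which is an adjustment made for this sketch rather than the exact search from the first post.

```python
import re

# Hypothetical stand-in for one book record; calibre's real data model differs.
book = {
    "title": "Woodworker's Journal #142 Jul-Aug 2000",
    "series": None,
    "series_index": 1.0,
}

# Anchored so the pattern consumes the whole title (an assumption of this sketch).
# Groups: 1 = text before the number, 2 = "#", 3 = digits, 4 = text after.
pattern = re.compile(r"^(.*?)\s*(\#)(\d+)\s*(.*)$")

# Step 1: set the series name (done by hand in the bulk metadata editor).
book["series"] = "WWJ"

# Step 2: copy the digits into series_index, which must convert to a number.
book["series_index"] = float(pattern.sub(r"\3", book["title"]))

# Step 3: rewrite the title without the issue number.
book["title"] = pattern.sub(r"\1 - \4", book["title"])

print(book)
```

Each step writes to a single field, which mirrors the one-field-at-a-time limit described above.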
09-28-2021, 12:52 PM | #3 |
Nameless Being
|
Thanks for your reply David,
Here's what I tried. I highlighted all the magazines and added a series, WWJ. All now show up with that series, as number 1 in the series.
Search field: title
Search for: (.*?)(\#)(\d+)(.*?)
Destination field: series_index
Replace with: \3
Test text: Woodworker's Journal #142 Jul-Aug 2000
Test result: 142 Jul-Aug 2000
Error message: could not convert string to float: 142Jul-Aug 2000
It would seem that the search group (\d+) is somehow being expanded to include the following group (.*?).
Last edited by Ted Friesen; 09-28-2021 at 01:21 PM. Reason: Added results of SR |
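The test result can be reproduced with Python's `re` module, assuming calibre's search and replace substitutes only the matched portion of the field, as `re.sub` does. In this sketch the digits group is not expanding. The pattern simply stops matching right after the digits, so the rest of the title is never touched and stays in the result.

```python
import re

title = "Woodworker's Journal #142 Jul-Aug 2000"
pattern = r"(.*?)(\#)(\d+)(.*?)"

m = re.search(pattern, title)
matched = m.group(0)    # the only part of the title that gets replaced
digits = m.group(3)     # the issue number on its own
trailing = m.group(4)   # the lazy last group is satisfied by matching nothing

# Only `matched` is swapped for "\3"; the unmatched remainder is left in place.
result = re.sub(pattern, r"\3", title)

print(repr(matched))    # "Woodworker's Journal #142"
print(repr(trailing))   # ''
print(repr(result))     # '142 Jul-Aug 2000'
```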
09-28-2021, 01:59 PM | #4 |
Well trained by Cats
Posts: 29,804
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Try with the space
Code:
(.*?)(\#)(\d+)\s(.*?)
|
09-28-2021, 04:15 PM | #5 |
Nameless Being
|
Thanks Ducks,
That got me further, but not all the way. With the search string (.*?)(\#)(\d+)\s(.*?)_ (where _ stands for a blank space at the end), the test text Woodworker's Journal #142 Jul-Aug 2000 results in 1422000.
The addition of \s has removed the text "Jul-Aug " from the result, but not the following number. Without the trailing space that text is not removed.
Edit: But adding \d+ after the trailing space resulted in just the volume number. Someone needs to explain why \s(.*?) \d+ would capture Jul-Aug 2000 and (.*?) would not.
Last edited by Ted Friesen; 09-28-2021 at 04:23 PM. Reason: Success after posting |
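The three variants can be compared side by side in Python, again assuming calibre's search and replace acts like `re.sub`. A lazy group matches only as much as the rest of the pattern forces it to, so each extra piece appended after (.*?) pulls more of the title into the match, and whatever is still unmatched stays in the result.

```python
import re

title = "Woodworker's Journal #142 Jul-Aug 2000"

patterns = [
    r"(.*?)(\#)(\d+)\s(.*?)",       # lazy tail matches nothing
    r"(.*?)(\#)(\d+)\s(.*?) ",      # lazy tail must now reach the next space
    r"(.*?)(\#)(\d+)\s(.*?) \d+",   # tail plus the year: the whole title is matched
]

# Replace the matched part of the title with group 3 (the digits) each time.
results = [re.sub(p, r"\3", title) for p in patterns]

for p, r in zip(patterns, results):
    print(f"{p!r:34} -> {r!r}")
# results == ['142Jul-Aug 2000', '1422000', '142']
```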
09-28-2021, 04:39 PM | #6 |
Well trained by Cats
|
Jul-Aug is not a valid month (for conversion to Date type)
If you want spaces in the result, you need to provide them:
Code:
\3 \4 |
09-28-2021, 04:55 PM | #7 | |
Grand Sorcerer
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
Your expression needs to match the entire string. Anything not matched is left behind and included in the result. To that end you should understand the difference between greedy and non-greedy operators, and the semantics of anchors.
The expression "(.*)" matches as much as possible to succeed, including nothing, leaving behind any unmatched text. The expression "(.*?)" matches as little as possible to succeed, including nothing, leaving behind any unmatched text. Adding the "\d+" forces the match to find at least one number. If you add anchors then you can be sure that you match the entire line.
Without spending a lot of time looking at the specifics, it seems that the anchored expression
Code:
^(.*?)(\#)(\d+)\s(.*?)$
Code:
^(.*?)(\#)(\d+)\s(.*)$ |
|
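A small Python check of the greedy, non-greedy, and anchor behaviour described above, assuming calibre's search and replace follows Python's `re` semantics. The variable names are only for illustration.

```python
import re

title = "Woodworker's Journal #142 Jul-Aug 2000"

# Unanchored: the lazy tail is satisfied by nothing, the greedy tail takes the rest.
lazy_tail = re.search(r"(\#)(\d+)\s(.*?)", title).group(3)
greedy_tail = re.search(r"(\#)(\d+)\s(.*)", title).group(3)

# Anchored: "$" forces even the lazy tail to stretch to the end of the title,
# so the whole title is matched and only group 3 remains after the replace.
anchored_lazy = re.sub(r"^(.*?)(\#)(\d+)\s(.*?)$", r"\3", title)
anchored_greedy = re.sub(r"^(.*?)(\#)(\d+)\s(.*)$", r"\3", title)

print(repr(lazy_tail))        # ''
print(repr(greedy_tail))      # 'Jul-Aug 2000'
print(repr(anchored_lazy))    # '142'
print(repr(anchored_greedy))  # '142'
```

With both anchors in place the two expressions give the same result on this title, since the match has to run from the start of the string to the end either way.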
09-28-2021, 08:09 PM | #8 |
Nameless Being
|
No, no need to capture the #, now that you ask. I'm new to regex is all.
Last edited by Ted Friesen; 10-01-2021 at 07:30 PM. Reason: Correct spelling |
09-28-2021, 08:20 PM | #9 | |
Nameless Being
|
Quote:
Let me rephrase. Please strike "someone needs to explain" and substitute "please help me understand" as you have graciously done. Thanks. I'll work with your suggestion. |
|
|