Old 09-27-2021, 03:12 PM   #1
Ted Friesen
Nameless Being
 
Move metadata between fields (Regex SR)

Is it possible to extract a number, say #139, from a title field using (\#)(\d+) and use the second group (the digits) as a series number? And how would I specify the series name?

This is for a magazine series with the volume number embedded in the title. I'd prefer to use that number as a series number rather than as part of the title.

Regex isn't responding as I thought it should. I tried (.*?)(\#)(\d+)(.*?) from "title" field, thinking I could replace \3 to the "series" field. But that doesn't remove anything from the title and makes a new series with groups 3 and 4. So for a bulk SR I'd end up with a separate series for each issue.
Old 09-27-2021, 10:23 PM   #2
davidfor
Grand Sorcerer
That is working as expected. Specifying "\3" like that just says "use the third group in the match". It isn't removing the group from the original. And, you can only update one field at a time. You will need to do it in three steps.

First, set the series name. How you do that will depend on where the series name is. If you are updating books in the same series, you can use the bulk metadata editor to set it. This will also set a series index according to the options you choose at that time. If the series is somewhere in the metadata, you will need to do a search-and-replace to set the series name.

Then, set the series index. Use your search above with the "Destination field" set to "series_index". The "Replace with" will be "\3".

Then remove the series index from the title. Your search above with the "Destination field" set to "title" should work. The replace looks like it should be "\1" or "\4" or something like "\1 - \4". It depends on exactly how the titles are currently structured.
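The three steps above can be sketched in plain Python for anyone who wants to test the idea outside calibre. This is only an illustration, not calibre's API: the dict standing in for a book record and the function name `move_issue_to_series` are hypothetical, and the pattern is an anchored variant of the one in the first post (an assumption, chosen so the whole title is consumed by the match).

```python
import re

# Anchored variant of the pattern from the first post (an assumption):
# group 1 = text before '#', group 3 = digits, group 4 = text after the space.
PATTERN = re.compile(r"^(.*?)(\#)(\d+)\s(.*)$")


def move_issue_to_series(book, series_name):
    """Return a copy of `book` with the issue number moved to the series fields."""
    updated = dict(book)
    title = book["title"]
    if not PATTERN.match(title):
        return updated  # titles that do not fit the pattern are left untouched

    # Step 1: set the series name.
    updated["series"] = series_name
    # Step 2: the series index is group 3, converted to a number.
    updated["series_index"] = float(PATTERN.sub(r"\3", title))
    # Step 3: rebuild the title from groups 1 and 4 (group 1 keeps its trailing space).
    updated["title"] = PATTERN.sub(r"\1- \4", title)
    return updated


book = {"title": "Woodworker's Journal #142 Jul-Aug 2000"}
result = move_issue_to_series(book, "WWJ")
print(result)
```

Each step reads from the original title and writes to one field, which mirrors the "one field at a time" limit described above.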
Old 09-28-2021, 12:52 PM   #3
Ted Friesen
Nameless Being
 
Thanks for your reply David,

Here's what I tried...

I highlighted all the magazines and added a series WWJ. All show up now with that series as number 1 in the series.

Search field title (.*?)(\#)(\d+)(.*?)
Destination field series_index: \3
Test text: Woodworker's Journal #142 Jul-Aug 2000
Test result: 142 Jul-Aug 2000

With the error message: could not convert string to float: 142Jul-Aug 2000

It would seem that the search group (\d+) is somehow being expanded to include the following group (.*?)
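A quick way to see what is happening here is to look at which part of the title the pattern actually matches. The sketch below uses Python's `re` module as a stand-in for calibre's search and replace (an assumption; calibre may differ in details), with variable names of my own choosing.

```python
import re

title = "Woodworker's Journal #142 Jul-Aug 2000"
pattern = r"(.*?)(\#)(\d+)(.*?)"

m = re.search(pattern, title)
matched = m.group(0)         # the text that gets replaced
group3 = m.group(3)          # the digits
group4 = m.group(4)          # the lazy trailing group
leftover = title[m.end():]   # text after the match, never touched by the replace

replaced = re.sub(pattern, r"\3", title)

print(repr(matched))
print(repr(group3))
print(repr(group4))
print(repr(leftover))
print(repr(replaced))
```

In this sketch group 3 is not expanded at all. The final lazy group matches nothing, so the match stops right after the digits and the rest of the title is simply left in place after the replacement.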

Last edited by Ted Friesen; 09-28-2021 at 01:21 PM. Reason: Added results of SR
Old 09-28-2021, 01:59 PM   #4
theducks
Well trained by Cats
Try with the space
Code:
(.*?)(\#)(\d+)\s(.*?)
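To preview what a search/replace pair would produce before running it on the library, something like the following can help. It is a rough sketch using Python's `re` module rather than calibre itself, and `preview` is a hypothetical helper name. It also attempts the number conversion that a series index requires.

```python
import re


def preview(pattern, replacement, text):
    """Return (replaced_text, numeric_value_or_None) for one test string."""
    replaced = re.sub(pattern, replacement, text)
    try:
        return replaced, float(replaced)
    except ValueError:
        # The replaced text is not a plain number, so it cannot be a series index.
        return replaced, None


title = "Woodworker's Journal #142 Jul-Aug 2000"
outcome = preview(r"(.*?)(\#)(\d+)\s(.*?)", r"\3", title)
print(outcome)
```

With the added `\s` the space after the digits becomes part of the match, but the text after that space is still outside the match and stays in the result.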
Old 09-28-2021, 04:15 PM   #5
Ted Friesen
Nameless Being
 
Thanks Ducks,

That got me further, but not all the way.

With this search string (.*?)(\#)(\d+)\s(.*?)_ (a blank space at the end)
Test text of: Woodworker's Journal #142 Jul-Aug 2000 results in 1422000
The addition of \s has removed the text "Jul-Aug " from the result, but not the following number. Without the trailing space that text string is not removed.

Edit:
But adding \d+ after the trailing space resulted in just the volume number. Someone needs to explain why \s(.*?) \d+ would capture Jul-Aug 2000 and (.*?) would not
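The two variants described in this post can be compared side by side by printing what each one matches and what it leaves unmatched. This is a sketch with Python's `re` module standing in for calibre's search and replace (an assumption), and the labels are only for illustration.

```python
import re

title = "Woodworker's Journal #142 Jul-Aug 2000"

patterns = {
    "trailing space": r"(.*?)(\#)(\d+)\s(.*?) ",
    "trailing space + digits": r"(.*?)(\#)(\d+)\s(.*?) \d+",
}

report = {}
for label, pattern in patterns.items():
    m = re.search(pattern, title)
    report[label] = {
        "matched": m.group(0),          # text consumed by the pattern
        "unmatched": title[m.end():],   # text left behind in the result
        "result": re.sub(pattern, r"\3", title),
    }

for label, row in report.items():
    print(label, row)
```

In both cases the lazy group only grows far enough for the rest of the pattern to succeed. The literal space forces it to take "Jul-Aug", and the extra `\d+` then also pulls the year into the match, so nothing is left over.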

Last edited by Ted Friesen; 09-28-2021 at 04:23 PM. Reason: Success after posting
Old 09-28-2021, 04:39 PM   #6
theducks
Well trained by Cats
Jul-Aug is not a valid month (for conversion to Date type)
If you want spaces in the result, you need to provide them
Code:
\3 \4
BTW is there a reason to capture the #? \#(\d+) would make the pattern
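As a small illustration of both points, here is a Python sketch (using the `re` module as a stand-in for calibre, which is an assumption). The patterns are anchored at both ends so the whole title is matched, which is my own choice for the example. Note that dropping the parentheses around `\#` shifts the group numbers down by one.

```python
import re

title = "Woodworker's Journal #142 Jul-Aug 2000"

# '#' captured: the digits are group 3 and the rest is group 4.
four_groups = re.sub(r"^(.*?)(\#)(\d+)\s(.*)$", r"\3 \4", title)

# '#' matched but not captured: the digits become group 2, the rest group 3.
three_groups = re.sub(r"^(.*?)\#(\d+)\s(.*)$", r"\2 \3", title)

print(repr(four_groups))
print(repr(three_groups))
```

The space between the two group references in the replacement is what puts a space into the result; the space matched by `\s` in the search pattern belongs to no group and is dropped.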
Old 09-28-2021, 04:55 PM   #7
chaley
Grand Sorcerer
Quote:
Originally Posted by Ted Friesen View Post
Edit:
But adding \d+ after the trailing space resulted in just the volume number. Someone needs to explain why \s(.*?) \d+ would capture Jul-Aug 2000 and (.*?) would not
"needs to explain" is a bit strong...

Your expression needs to match the entire string. Anything not matched is left behind and included in the result. To that end you should understand the difference between greedy and non-greedy operators, and the semantics of anchors. The expression "(.*)" matches as much as possible to succeed, including nothing, leaving behind any unmatched text. The expression "(.*?)" matches as little as possible to succeed, including nothing, leaving behind any unmatched text. Adding the "\d+" forces the match to find at least one number.

If you add anchors then you can be sure that you match the entire line. Without spending a lot of time looking at the specifics, it seems that the anchored expression
Code:
^(.*?)(\#)(\d+)\s(.*?)$
does what you want, as does
Code:
^(.*?)(\#)(\d+)\s(.*)$
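The effect of the `$` anchor on the greedy and non-greedy forms can be checked with a short Python sketch. It uses the `re` module as a stand-in for calibre's search and replace (an assumption), and the two patterns without `$` are added here only for comparison.

```python
import re

title = "Woodworker's Journal #142 Jul-Aug 2000"

patterns = [
    r"^(.*?)(\#)(\d+)\s(.*?)$",  # lazy last group, anchored at the end
    r"^(.*?)(\#)(\d+)\s(.*)$",   # greedy last group, anchored at the end
    r"^(.*?)(\#)(\d+)\s(.*?)",   # lazy last group, no end anchor
    r"^(.*?)(\#)(\d+)\s(.*)",    # greedy last group, no end anchor
]

results = {}
for pattern in patterns:
    m = re.search(pattern, title)
    # Record what the last group captured and what replacing with \3 yields.
    results[pattern] = (m.group(4), re.sub(pattern, r"\3", title))

for pattern, (last_group, replaced) in results.items():
    print(f"{pattern!r}: group 4 = {last_group!r}, result = {replaced!r}")
```

With the end anchor both forms must consume the rest of the title, so they behave the same. Without it, the greedy form still takes everything, while the lazy form takes nothing and leaves the tail in the result.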
Old 09-28-2021, 08:09 PM   #8
Ted Friesen
Nameless Being
 
Quote:
Originally Posted by theducks View Post
Jul-Aug is not a valid month (for conversion to Date type)
If you want spaces in the result, you need to provide them
Code:
\3 \4
BTW is there a reason to capture the #? \#(\d+) would make the pattern
No, no need to capture the #, now that you ask. I'm new to regex is all.

Last edited by Ted Friesen; 10-01-2021 at 07:30 PM. Reason: Correct spelling
Old 09-28-2021, 08:20 PM   #9
Ted Friesen
Nameless Being
 
Quote:
Originally Posted by chaley View Post
"needs to explain" is a bit strong...

Your expression needs to match the entire string. Anything not matched is left behind and included in the result. To that end you should understand the difference between greedy and non-greedy operators, and the semantics of anchors. The expression "(.*)" matches as much as possible to succeed, including nothing, leaving behind any unmatched text. The expression "(.*?)" matches as little as possible to succeed, including nothing, leaving behind any unmatched text. Adding the "\d+" forces the match to find at least one number.

If you add anchors then you can be sure that you match the entire line. Without spending a lot of time looking at the specifics, it seems that the anchored expression
Code:
^(.*?)(\#)(\d+)\s(.*?)$
does what you want, as does
Code:
^(.*?)(\#)(\d+)\s(.*)$
Chaley,

Let me rephrase. Please strike "someone needs to explain" and substitute "please help me understand" as you have graciously done. Thanks. I'll work with your suggestion.