03-16-2013, 05:56 PM #1
Join Date: Aug 2011
Regex help pls for bulk editing metadata
I have tried to make sense of the regex help in the manual but can't do it on my own.
I need help with two things.
Both are for bulk editing metadata (title).
Sorry if I mix terms here, I have very little knowledge about this.
Calibre version is 0.9.15.
I can remove parts of a title with the replace-function if the string is always the same.
Castle 03. Castle Kidnapped
Castle 04 - Ever after
I can remove Castle 0
However this would leave different numbers as well as characters (. - space).
3. Castle Kidnapped
4 - Ever after
What is the regular expression to remove varying numbers too and could this be combined in one regex (remove "castle" + any number with leading zero)?
Dots etc. could be removed with search-replace if needed.
Sometimes I want to bulk remove part of a title.
E.g. everything in brackets
(Castle 3) Castle Kidnapped
Castle lost (a castle novella)
How do I do this when the text within the brackets is not always the same?
What if the string you want to remove is not enclosed by the same character?
Castle 3; Castle Kidnapped
Castle lost - a castle novella
You would have to use an expression like "remove everything left of ;" or "remove everything after -", correct? But how?
Can you use a phrase as a "separator"?
If a title is like this
Castle 3 Kidnapped and loving it
Kidnapped and loving it a castle novella
Is it possible to use "kidnapped and loving it" as string and remove text left or right of it?
Thanks in advance.
Last edited by manawydan; 03-16-2013 at 05:58 PM.
03-16-2013, 08:56 PM #2
Join Date: Dec 2009
Location: Southern Sweden, far out in the quiet woods
Device: Ubuntu Linux, Cybook Opus, Motorola Xoom with Mantano Premium
Some parts of the RE language that everyone who uses it should learn:
Use \d to specify any digit.
Use \s to specify any white space character.
Use . to specify any character.
+ specifies one or more of the pattern before it.
? specifies zero or one of the pattern before it.
* specifies zero or more of the pattern before it.
Use \d+ to specify one or more digits. Use .+ to specify any text consisting of one or more characters. Use .* to specify any text consisting of zero or more characters.
Use parentheses to group matches into larger patterns. These grouped patterns can be retrieved in order using \1, \2 and so on.
Escape special characters with a backslash (\) to match them literally, e.g. \. for a dot or \( for a parenthesis.
Use | to match either the pattern before or after it.
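Since the reply links to Python's regex HOWTO, the elements listed above can be tried out directly with Python's re module. This is a minimal sketch run outside calibre, using a title from the question as sample input; each printed value is noted in a comment.

```python
import re

title = "Castle 03. Castle Kidnapped"

# \d+ matches one or more digits.
print(re.findall(r"\d+", title))                 # ['03']

# \s matches one white space character; \. is an escaped, literal dot.
print(bool(re.search(r"03\.\sCastle", title)))   # True

# ? means zero or one of the item before it: 0?3 accepts "03" and "3".
print([bool(re.fullmatch(r"0?3", s)) for s in ["03", "3", "003"]])  # [True, True, False]

# * means zero or more, so .* also matches an empty string.
print(re.fullmatch(r"Castle.*", "Castle").group(0))  # Castle

# Parentheses create groups; \1 and \2 refer to them in the replacement.
print(re.sub(r"(\w+) (\d+)", r"\2 \1", "Castle 03"))  # 03 Castle

# | matches either the pattern before it or the one after it.
print(re.findall(r";|-", "Castle 3; Castle - Kidnapped"))  # [';', '-']
```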
(Read more for instance here: http://docs.python.org/2/howto/regex.html, there are a lot of RE tutorials, including in the calibre forum here.)
1. So to match the series name "Castle" with the series number as you ask for, you could use:
Castle\s0\d+
(The string "Castle" followed by a white space, followed by a zero, followed by one or more digits.)
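A quick way to check what the pattern Castle\s0\d+ removes is Python's re.sub with an empty replacement. The second pattern is an assumed extension, not from the reply: the character class [.\s-]* also swallows any dots, spaces or dashes after the number, which covers the question's leftover ". " and " - " in one regex.

```python
import re

titles = ["Castle 03. Castle Kidnapped", "Castle 04 - Ever after"]

# "Castle", one white space, a zero, then one or more digits.
series_number = re.compile(r"Castle\s0\d+")
# Assumed extension: also remove dots, spaces or dashes that follow the number.
series_number_and_sep = re.compile(r"Castle\s0\d+[.\s-]*")

for t in titles:
    print(repr(series_number.sub("", t)), repr(series_number_and_sep.sub("", t)))
```

The first pattern leaves the separators behind (". Castle Kidnapped"); the extended one yields "Castle Kidnapped" and "Ever after".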
Or, to match more generally any series name and number separated from the title by a dash with spaces on both sides of it:
.*\s\d+\s-\s
(Any text, including nothing, followed by a white space character, followed by one or more digits, followed by a dash surrounded by white space characters.)
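Here is a small Python sketch of the general pattern .*\s\d+\s-\s, removing the series prefix from a title. The second title is a made-up sample; a title without the "number - " separator is left unchanged.

```python
import re

# Any text (possibly empty), a white space, one or more digits,
# then a dash with white space on both sides.
series_prefix = re.compile(r".*\s\d+\s-\s")

for t in ["Castle 04 - Ever after", "Some Series 12 - A Made-up Title", "Castle lost"]:
    print(series_prefix.sub("", t))
```

This prints "Ever after", "A Made-up Title" and "Castle lost". Because .* is greedy, a title containing a second " 3 - " later on would lose everything up to that later one.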
Using parentheses and search and replace, you could even use this match to populate the series fields:
(.*)\s(\d+)\s-\s
The first grouped match, \1, contains the series name.
The second grouped match, \2, contains the series number.
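The grouping idea can be sketched in Python. split_series is a hypothetical helper, and the third group (.*) for the remaining title is an addition for illustration. Replacing with \1 or \2 mirrors what the reply describes for the series name and number.

```python
import re

# Group 1: series name, group 2: series number, group 3 (added here): the title.
pattern = re.compile(r"(.*)\s(\d+)\s-\s(.*)")

def split_series(title):
    """Hypothetical helper: return (series, number, title) or None if no match."""
    m = pattern.fullmatch(title)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)

print(split_series("Castle 04 - Ever after"))       # ('Castle', 4, 'Ever after')
print(pattern.sub(r"\1", "Castle 04 - Ever after"))  # Castle
print(pattern.sub(r"\2", "Castle 04 - Ever after"))  # 04
```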
2. To match anything (including nothing) inside parentheses, use this pattern:
\(.*\)
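A minimal Python sketch using the escaped pattern \(.*\) on the two bracket examples from the question. The .strip() call is Python-side tidying of the leftover space; putting \s* around the pattern would be the regex-only alternative.

```python
import re

# \( and \) are escaped, literal parentheses; .* is anything in between (even nothing).
in_brackets = re.compile(r"\(.*\)")

for t in ["(Castle 3) Castle Kidnapped", "Castle lost (a castle novella)"]:
    print(in_brackets.sub("", t).strip())
```

This prints "Castle Kidnapped" and "Castle lost". With more than one bracketed part in a title, see the note on greedy matching further down the thread.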
To match a series that is separated from the title EITHER by a semicolon followed by a space OR by a dash surrounded by spaces, you could for instance use:
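The either/or separator can be expressed with | as ;\s|\s-\s. This Python sketch uses it to split off the part before the first separator. The exact pattern is an illustration built from the elements listed earlier, not necessarily the one from the original reply.

```python
import re

# Either "; " or " - " counts as the separator between series and title.
sep = re.compile(r";\s|\s-\s")

for t in ["Castle 3; Castle Kidnapped", "Castle lost - a castle novella"]:
    # maxsplit=1: split only at the first separator found.
    print(sep.split(t, maxsplit=1))
```

This prints ['Castle 3', 'Castle Kidnapped'] and ['Castle lost', 'a castle novella'].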
To match everything before a separator like a semicolon followed by a space, you could use this:
.*;\s
When doing search and replace with calibre, it is good practice to use parentheses to group matches, to match the whole field, and then to specify the grouped match you want to keep. That reduces the chances of nasty surprises.
So to remove everything in the title to the left of a semicolon like above, use this pattern:
.*;\s(.*)
And replace the title with the contents of the first grouped pattern, \1.
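The "match the whole title, keep one group" practice looks like this in Python. It uses the pattern .*;\s(.*) with \1 as the replacement; titles without "; " simply do not match and stay as they are. The first print shows what .*;\s alone matches, i.e. the part that gets dropped.

```python
import re

# What ".*;\s" alone matches: the part of the title that will be dropped.
print(repr(re.match(r".*;\s", "Castle 3; Castle Kidnapped").group(0)))  # 'Castle 3; '

# Match the WHOLE title; group 1 is the part to keep (everything after "; ").
keep_after_semicolon = re.compile(r".*;\s(.*)")

for t in ["Castle 3; Castle Kidnapped", "Castle lost"]:
    print(keep_after_semicolon.sub(r"\1", t))
```

The loop prints "Castle Kidnapped" and then the unchanged "Castle lost".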
To remove a specific string to the right of the title use:
(.*)\sa castle novella
And replace the title with grouped pattern \1.
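The question also asked whether a phrase can serve as the separator. This Python sketch uses both examples from the question. The first pattern is the one given above. The second is an assumed counterpart that keeps the phrase and drops what is to its left. re.IGNORECASE is added so the lower-case phrase still matches "Kidnapped", and the kept group retains the title's original capitalization.

```python
import re

# Phrase on the right: keep group 1 (the text before " a castle novella").
drop_right = re.compile(r"(.*)\sa castle novella")
print(drop_right.sub(r"\1", "Kidnapped and loving it a castle novella"))

# Phrase as anchor on the left: keep the phrase (group 1), drop what precedes it.
keep_phrase = re.compile(r".*?\s(kidnapped and loving it)", re.IGNORECASE)
print(keep_phrase.sub(r"\1", "Castle 3 Kidnapped and loving it"))
```

Both lines print "Kidnapped and loving it".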
And make sure to test carefully and to have a current backup. You WILL make mistakes that could otherwise cost you extra work.
Last edited by Adoby; 03-17-2013 at 04:43 AM. Reason: Typo
03-16-2013, 09:40 PM #3
Taking a break; Fed up
Join Date: Nov 2007
Device: Wife: Touch, Arc, Vox Me: Nexus 7, Glo
Also, matches are "greedy" by default, meaning they match as much as possible.
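To see why greediness matters, here is a Python sketch on a title built by combining two of the question's examples. The greedy .* runs to the last ")" and eats the real title. The non-greedy .*? (a standard Python regex feature, not mentioned in the thread) stops at the first ")" after each "(".

```python
import re

title = "(Castle 3) Castle Kidnapped (a castle novella)"

# Greedy: one match from the first "(" to the LAST ")", i.e. the whole title.
print(repr(re.sub(r"\(.*\)", "", title)))       # ''

# Non-greedy: each "(...)" is matched separately.
print(re.sub(r"\(.*?\)", "", title).strip())    # Castle Kidnapped
```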
03-18-2013, 06:42 PM #4
Join Date: Aug 2011
The basic expressions were in the manual too, but to understand that stuff I need examples.
Your post helped a lot.
06-16-2013, 02:11 PM #5
Join Date: Aug 2011
Here I am again ..
When I want to find all titles that are completely in upper case, how can I do that?
In normal search I can't use regex, right?
06-17-2013, 08:22 AM #6
Join Date: May 2009
Location: The Right Coast
Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda)
Correct. A normal search will interpret Castle\s0\d+ as a string of characters that should be found exactly as shown. It will never look for a white space, a zero, then one or more digits, etc. It will only look for the word Castle followed immediately by the characters \s, etc.
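The difference between a literal search and a regex search can be shown in a few lines of Python. This illustrates the concept only and says nothing about calibre's search box itself: a plain substring test looks for the backslashes literally, while re.search interprets them.

```python
import re

title = "Castle 03. Castle Kidnapped"
pattern = r"Castle\s0\d+"

# Plain substring search: looks for the characters "\s0\d+" literally.
print(pattern in title)                  # False
# Regular-expression search: \s, \d and + are interpreted.
print(bool(re.search(pattern, title)))   # True
```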
As to finding titles in uppercase... it's not something I've ever tried via regex. [A-Z]+ comes to mind, but I don't think it's that easy. Wrangling regular expressions to do exactly what you want, and nothing more, can be very tough. (Be very careful with the greedy quantifiers + and *, especially combined with . as in .+ and .*.)
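The respondent's doubt about [A-Z]+ is justified: an unanchored search finds the "C" in "Castle". Here is a sketch of an anchored, case-sensitive pattern in Python's re. It assumes ASCII titles and requires at least one capital and no lower-case letter anywhere. Whether calibre's search accepts such a pattern, and with what case handling, is not covered here. For ASCII titles, Python's str.isupper() performs the same test without a regex.

```python
import re

# Unanchored [A-Z]+ matches part of any title that has a capital letter.
print(bool(re.search(r"[A-Z]+", "Castle Kidnapped")))   # True

# ^ and $ force the pattern to cover the whole title (ASCII letters only).
all_caps = re.compile(r"^[^a-z]*[A-Z][^a-z]*$")

for t in ["CASTLE KIDNAPPED", "Castle Kidnapped", "CASTLE 3: EVER AFTER", "1984"]:
    print(t, bool(all_caps.match(t)))
```

Only "CASTLE KIDNAPPED" and "CASTLE 3: EVER AFTER" print True; "1984" fails because it contains no capital letter at all.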
Last edited by Sabardeyn; 06-17-2013 at 08:29 AM. Reason: typo
|Thread||Thread Starter||Forum||Replies||Last Post|
|Problems with Bulk Metadata editing||minorum||Library Management||4||11-12-2012 05:08 PM|
|Bulk Edit of comments using regex||PeterT||Library Management||2||07-25-2012 08:10 AM|
|Bulk Metadata Editing not working||jvik||Calibre||5||01-04-2011 09:34 AM|
|Editing Metadata in Bulk||ballast||Calibre||5||08-15-2010 03:14 PM|
|Editing Metadata in Bulk Question||lwpack||Calibre||10||07-19-2009 11:40 PM|