Old 03-16-2013, 05:56 PM   #1
manawydan
Connoisseur
 
Posts: 95
Karma: 5854
Join Date: Aug 2011
Device: none
Regex help pls for bulk editing metadata

I have tried to make sense of the regex help in the manual but can't do it on my own.

I need help with two things.
Both are for bulk editing metadata (title).
Sorry if I mix terms here, I have very little knowledge about this.

Calibre version is 0.9.15.

1.
I can remove parts of a title with the replace-function if the string is always the same.
E.g.
Castle 03. Castle Kidnapped
Castle 04 - Ever after
I can remove Castle 0

However this would leave different numbers as well as characters (. - space).
E.g.
3. Castle Kidnapped
4 - Ever after

What is the regular expression to remove varying numbers too and could this be combined in one regex (remove "castle" + any number with leading zero)?
Dots etc. could be removed with search-replace if needed.

2.
Sometimes I want to bulk remove part of a title.
E.g. everything in brackets
(Castle 3) Castle Kidnapped
Castle lost (a castle novella)

How to do this (the text within brackets is not the same)?

What if the string you want to remove is not enclosed by the same character?

E.g.
Castle 3; Castle Kidnapped
Castle lost - a castle novella

You would have to use an expression like "remove everything left from ;" or "remove everything after -", correct? But how?

Can you use a phrase as "separator"?
If a title is like this

Castle 3 Kidnapped and loving it
Kidnapped and loving it a castle novella

Is it possible to use "kidnapped and loving it" as string and remove text left or right of it?

Thanks in advance.

Last edited by manawydan; 03-16-2013 at 05:58 PM.
Old 03-16-2013, 08:56 PM   #2
Adoby
Handy Elephant
Posts: 1,737
Karma: 26785684
Join Date: Dec 2009
Location: Southern Sweden, far out in the quiet woods
Device: Samsung Galaxy Tab S8 Ultra
Some parts of the RE language that everyone who uses it should learn:

Use \d to specify any digit.
Use \s to specify any white space character.
Use . to specify any character.
+ specifies one or more of the pattern before it.
? specifies zero or one of the pattern before it.
* specifies zero or more of the pattern before it.

Use \d+ to specify one or more digits. Use .+ to specify any text consisting of one or more characters. Use .* to specify any text consisting of zero or more characters.

Use parentheses to group matches into larger patterns. These grouped patterns can be retrieved in order using \1, \2 and so on.

Escape special characters with a backslash to match them literally.
Use | to match either the pattern before or after it.

(Read more for instance here: http://docs.python.org/2/howto/regex.html, there are a lot of RE tutorials, including in the calibre forum here.)
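As a quick illustration of the tokens listed above, here is a minimal sketch in plain Python (the RE dialect the linked howto describes). The sample title comes from the question; the variable names are only for illustration.

```python
import re

title = "Castle 03. Castle Kidnapped"

digits = re.findall(r"\d+", title)               # \d+ : runs of one or more digits
words = re.split(r"\s+", title)                  # \s+ : split on runs of white space
has_dot = re.search(r"\.", title) is not None    # \.  : an escaped, literal dot
either = re.findall(r"Castle|Kidnapped", title)  # |   : either alternative

print(digits)   # ['03']
print(words)    # ['Castle', '03.', 'Castle', 'Kidnapped']
print(has_dot)  # True
print(either)   # ['Castle', 'Castle', 'Kidnapped']
```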

1. So to match the series name "Castle" with series number as you ask for, you could use:

Castle\s0\d+

(The string "Castle" followed by a white space, followed by a zero, followed by one or more digits.)
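To see what this pattern removes from the two sample titles, here is a small sketch in plain Python. The second pattern is an assumed extension, not something from the post: it anchors the match at the start of the title with ^ and also swallows any white space, dots and dashes that follow the number.

```python
import re

titles = ["Castle 03. Castle Kidnapped", "Castle 04 - Ever after"]

# Pattern from the post: removes "Castle 0" plus the digits that follow.
basic = [re.sub(r"Castle\s0\d+", "", t) for t in titles]

# Assumed extension: anchor at the start and also remove the
# separator characters (white space, dot, dash) after the number.
extended = [re.sub(r"^Castle\s0\d+[\s.-]*", "", t) for t in titles]

print(basic)     # ['. Castle Kidnapped', ' - Ever after']
print(extended)  # ['Castle Kidnapped', 'Ever after']
```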

Or to more generally match any series name and number separated from the title with a dash with spaces on both sides of it:

.*\s\d+\s-\s

(Any text, including nothing, followed by a white space character, followed by one or more digits, followed by a dash surrounded with white space characters.)

Using parentheses and search and replace you could even use this match to populate the series fields:

(.*)\s(\d+)\s-\s

The first grouped match, \1 contains the series name.
The second grouped match, \2 contains the series number.
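A minimal sketch of how the groups come out, in plain Python. The third group, which captures the rest of the title, is an addition for illustration and is not part of the pattern above.

```python
import re

# Pattern from the post plus an assumed third group for the remaining title.
pattern = re.compile(r"(.*)\s(\d+)\s-\s(.*)")

m = pattern.match("Castle 04 - Ever after")
series, index, title = m.group(1), m.group(2), m.group(3)
print(series, index, title)

# In a replacement string, \3 refers to the third group.
new_title = pattern.sub(r"\3", "Castle 04 - Ever after")
print(new_title)
```

Here series is "Castle", index is "04", and both title and new_title are "Ever after".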

2. To match anything (including nothing) inside parentheses you use this pattern:

\(.*\)
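Applied to the two titles from the question, a rough sketch in plain Python looks like this. The strip() call is a Python convenience for trimming the leftover space; inside a search-and-replace dialog you would probably add \s* to the pattern instead.

```python
import re

titles = ["(Castle 3) Castle Kidnapped", "Castle lost (a castle novella)"]

# Remove the bracketed part, then trim the space left behind.
cleaned = [re.sub(r"\(.*\)", "", t).strip() for t in titles]

print(cleaned)  # ['Castle Kidnapped', 'Castle lost']
```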

To match a series that is separated from the title EITHER by a semicolon followed by a space, or a dash surrounded by space, you could use for instance:

.*\s\d+(;\s|\s-\s)
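Where the parentheses go matters here, because | splits the whole pattern unless both alternatives sit inside one group. A small sketch in plain Python comparing the two placements on titles from this thread:

```python
import re

titles = ["Castle 3; Castle Kidnapped", "Castle 04 - Ever after"]

grouped = r".*\s\d+(;\s|\s-\s)"      # both separators inside one group
ungrouped = r".*\s\d+(;\s)|(\s-\s)"  # | splits the pattern in two halves

with_group = [re.sub(grouped, "", t) for t in titles]
without_group = [re.sub(ungrouped, "", t) for t in titles]

print(with_group)     # ['Castle Kidnapped', 'Ever after']
print(without_group)  # ['Castle Kidnapped', 'Castle 04Ever after']
```

In the ungrouped form the second half matches only the " - " separator on its own, so the series name and number are left behind.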

To match everything before a separator like a semicolon followed by a space you could use this:

.*;\s

When doing search and replace in calibre, it is good practice to write a pattern that matches the entire field, group the parts with parentheses, and specify the grouped match you want to keep. That reduces the chances of nasty surprises.

So to remove everything in the title that is to the left of a semicolon like above, use this pattern:

.*;\s(.*)

And replace title with the contents of the first grouped pattern \1.
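The same replacement sketched in plain Python, where the replacement string \1 plays the role of the "replace with" field. A title without a semicolon does not match and is left alone.

```python
import re

pattern = r".*;\s(.*)"

kept = re.sub(pattern, r"\1", "Castle 3; Castle Kidnapped")
untouched = re.sub(pattern, r"\1", "Ever after")

print(kept)       # Castle Kidnapped
print(untouched)  # Ever after
```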

To remove a specific string to the right of the title use:

(.*)\sa castle novella

And replace title with grouped pattern \1.
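A sketch of this in plain Python, using a title from the question. The second part is an assumption about how the "phrase as separator" question could be handled: it keeps only the phrase and drops whatever stands to its left or right. The IGNORECASE flag is used because the question writes the phrase in lower case.

```python
import re

# Remove a fixed phrase to the right of the title.
trimmed = re.sub(r"(.*)\sa castle novella", r"\1",
                 "Kidnapped and loving it a castle novella")

# Assumed variant: keep only the phrase, whatever surrounds it.
phrase = re.escape("kidnapped and loving it")
keep_phrase = re.compile(r".*(" + phrase + r").*", flags=re.IGNORECASE)

titles = ["Castle 3 Kidnapped and loving it",
          "Kidnapped and loving it a castle novella"]
only_phrase = [keep_phrase.sub(r"\1", t) for t in titles]

print(trimmed)      # Kidnapped and loving it
print(only_phrase)  # ['Kidnapped and loving it', 'Kidnapped and loving it']
```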

And make sure to test carefully and to have a current backup. You WILL make mistakes that otherwise can cause you extra work.

Last edited by Adoby; 03-17-2013 at 04:43 AM. Reason: Typo
Old 03-16-2013, 09:40 PM   #3
PeterT
Grand Sorcerer
 
Posts: 13,381
Karma: 78877538
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
Also, matches are "greedy" by default, meaning they match as much as possible.
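A short sketch of what that means in practice, in plain Python. The combined title is made up from the two examples in the question. Adding ? after the * makes the match "lazy", so it stops at the first closing parenthesis.

```python
import re

title = "(Castle 3) Castle lost (a castle novella)"

# Greedy: runs from the first "(" to the last ")".
greedy = re.sub(r"\(.*\)", "", title)

# Lazy: each match stops at the nearest ")".
lazy = re.sub(r"\(.*?\)", "", title)

print(repr(greedy))  # ''
print(repr(lazy))    # ' Castle lost '
```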
Old 03-18-2013, 06:42 PM   #4
manawydan
Thank you.
The basic expressions were in the manual too but to understand that stuff I need examples.
Your post helped a lot.
Old 06-16-2013, 02:11 PM   #5
manawydan
Here I am again ..

When I want to find all titles that are completely in upper case how can I do that?
In normal search I can't use regex, right?
Old 06-17-2013, 08:22 AM   #6
Sabardeyn
Guru
Posts: 644
Karma: 1242364
Join Date: May 2009
Location: The Right Coast
Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda)
Correct. A normal search will interpret Castle\s0\d+ as a string of characters that should be found exactly as shown. It will never look for a white space, a zero, then one or more digits, etc. It will only look for the word Castle followed immediately by the characters \s, and so on.

As to finding titles in uppercase... it's not something I've ever tried via regex. [A-Z]+ comes to mind, but I don't think it's that easy. Wrangling regular expressions to do exactly what you want - and nothing more - can be very tough. (Be very careful with the greedy operators * and +, especially after a . wildcard.)
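One possible approach, sketched in plain Python only: require at least one capital letter and forbid any lower-case letter anywhere in the title. This is an assumption rather than a tested calibre recipe. It covers ASCII letters only, the helper name is_all_caps is made up, and the match has to be case-sensitive to work, which is worth verifying in whatever search box it is used in.

```python
import re

# No lower-case letter anywhere, and at least one upper-case letter.
ALL_CAPS = re.compile(r"[^a-z]*[A-Z][^a-z]*")

def is_all_caps(title):
    """Return True if the whole title matches the all-caps pattern."""
    return ALL_CAPS.fullmatch(title) is not None

titles = ["CASTLE KIDNAPPED", "Castle Kidnapped", "CASTLE 03. EVER AFTER", "1984"]
flagged = [t for t in titles if is_all_caps(t)]

print(flagged)  # ['CASTLE KIDNAPPED', 'CASTLE 03. EVER AFTER']
```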

Last edited by Sabardeyn; 06-17-2013 at 08:29 AM. Reason: typo
All times are GMT -4.