Old 03-16-2013, 08:56 PM   #2
Adoby
Some parts of the RE (regular expression) language that everyone who uses it should learn:

Use \d to specify any digit.
Use \s to specify any white space character.
Use . to specify any character (by default, any character except a newline).
+ specifies one or more of the pattern before it.
? specifies zero or one of the pattern before it.
* specifies zero or more of the pattern before it.

Use \d+ to specify one or more digits. Use .+ to specify any text consisting of one or more characters. Use .* to specify any text consisting of zero or more characters.
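
If you want to try these building blocks outside calibre, here is a minimal sketch using Python's re module, which is the regex flavor the Python howto describes. The sample title and the "colou?r" pattern are made up for illustration.

```python
import re

# Hypothetical sample title, made up for illustration.
title = "Castle 03 - Heat Rises"

digits = re.search(r"\d+", title).group()   # first run of one or more digits: "03"
spaces = re.findall(r"\s", title)           # every single white space character
anything = re.match(r".*", "").group()      # .* also matches the empty string
nonempty = re.match(r".+", "")              # .+ needs at least one character: None
optional = re.findall(r"colou?r", "color colour")  # ? makes the "u" optional

print(digits, len(spaces), repr(anything), nonempty, optional)
```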

Use parentheses to group parts of a pattern. The text matched by each group can be retrieved, in order, using \1, \2 and so on.

Escape special characters with a backslash to match them literally.
Use | to match either the pattern before or after it.
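
A small sketch of groups, escaping and alternation together, again in plain Python rather than in calibre itself. The titles are hypothetical examples, not taken from the question.

```python
import re

# Hypothetical title, made up for illustration.
title = "Heat Wave (Castle 1.5)"

# \( \) and \. are escaped, so they match literal characters.
# The unescaped ( ) create two groups: series name and series number.
m = re.search(r"\((\w+)\s(\d+\.\d+)\)", title)
series, number = m.group(1), m.group(2)   # "Castle", "1.5"

# | matches either the pattern before it or the pattern after it.
sep = re.search(r";|-", "Castle 03 - Heat Rises").group()   # "-"

# In a replacement string, \1 and \2 refer back to the groups in order.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "Heat Wave")     # "Wave Heat"

print(series, number, sep, swapped)
```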

(Read more, for instance, here: http://docs.python.org/2/howto/regex.html. There are a lot of RE tutorials, including here in the calibre forum.)

1. So to match the series name "Castle" with series number as you ask for, you could use:

Castle\s0\d+

(The string "Castle" followed by a white space, followed by a zero, followed by one or more digits.)
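
To see which titles this pattern accepts, you could test it in Python first. The three titles below are invented test cases. Note that the literal 0 means a number like 12 is not matched.

```python
import re

pattern = re.compile(r"Castle\s0\d+")

# Hypothetical titles, made up for illustration.
titles = [
    "Castle 03 - Heat Rises",   # matches: space, a zero, then at least one digit
    "Castle 12 - Deadly Heat",  # no match: the number does not start with 0
    "Castle03 - No Space",      # no match: \s requires a white space character
]

matches = [bool(pattern.search(t)) for t in titles]
print(matches)
```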

Or, more generally, to match any series name and number separated from the title by a dash with spaces on both sides:

.*\s\d+\s-\s

(Any text, including nothing, followed by a white space character, followed by one or more digits, followed by a dash surrounded with white space characters.)

Using parentheses and search and replace you could even use this match to populate the series fields:

(.*)\s(\d+)\s-\s

The first grouped match, \1, contains the series name.
The second grouped match, \2, contains the series number.
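
As a rough sketch of what the two groups capture, here is the same pattern in plain Python. The variable names series_name, series_index and rest are mine, chosen for illustration; they are not calibre field names.

```python
import re

pattern = re.compile(r"(.*)\s(\d+)\s-\s")

# Hypothetical title, made up for illustration.
title = "Castle 03 - Heat Rises"

m = pattern.match(title)
series_name = m.group(1)        # what \1 would contain: "Castle"
series_index = int(m.group(2))  # what \2 would contain, as a number: 3
rest = title[m.end():]          # whatever follows the match: "Heat Rises"

print(series_name, series_index, rest)
```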

2. To match anything (including nothing) inside parentheses you use this pattern:

\(.*\)
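
One thing worth knowing is that .* is greedy, so with two pairs of parentheses in a title this pattern runs from the first ( to the last ). A short Python sketch with made-up titles shows the difference. The non-greedy form .*? is standard in Python's re module, but it is my addition here.

```python
import re

# Hypothetical titles, made up for illustration.
title = "Heat Wave (Castle) (novella)"

greedy = re.search(r"\(.*\)", title).group()     # "(Castle) (novella)"
lazy = re.search(r"\(.*?\)", title).group()      # "(Castle)"
empty = re.search(r"\(.*\)", "Title ()").group() # "()": nothing inside also matches

print(greedy, lazy, empty)
```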

To match a series that is separated from the title either by a semicolon followed by a space or by a dash surrounded by spaces, you could use, for instance:

.*\s\d+(;\s|\s-\s)
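
Where the parentheses go matters here, because | has the lowest precedence of all the operators. The sketch below, in plain Python with invented titles, compares a version where both separators sit inside one group with a version where each separator has its own group. In the second version the | splits the whole pattern in two, so a bare " - " matches on its own.

```python
import re

grouped = re.compile(r".*\s\d+(;\s|\s-\s)")     # series, then either separator
ungrouped = re.compile(r".*\s\d+(;\s)|(\s-\s)") # series + "; "  OR  just " - "

# Hypothetical titles, made up for illustration.
a = "Castle 03; Heat Rises"
b = "Castle 03 - Heat Rises"
c = "Heat - Rises"  # no series number at all

grouped_a = grouped.match(a).group()     # "Castle 03; "
grouped_b = grouped.match(b).group()     # "Castle 03 - "
grouped_c = grouped.search(c)            # None: there is no number before the dash
ungrouped_b = ungrouped.search(b).group()  # " - " only, the series part is lost
ungrouped_c = ungrouped.search(c).group()  # " - " matches even without a series

print(repr(grouped_a), repr(grouped_b), grouped_c, repr(ungrouped_b), repr(ungrouped_c))
```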

To match everything up to and including a separator like a semicolon followed by a space, you could use this:

.*;\s

When doing search and replace with calibre, it is good practice to write a pattern that matches everything, use parentheses to group the parts you care about, and specify the grouped match you want to keep. That reduces the chance of nasty surprises.

So to remove everything in the title that is to the left of a semicolon like above, use this pattern:

.*;\s(.*)

And replace the title with the contents of the first grouped match, \1.
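
In plain Python the same search and replace could be sketched with re.sub. This assumes calibre's search and replace treats the pattern and \1 the way Python's re module does. The titles are made up.

```python
import re

pattern = r".*;\s(.*)"

# Hypothetical titles, made up for illustration.
new_title = re.sub(pattern, r"\1", "Castle 03; Heat Rises")  # "Heat Rises"

# A title without "; " does not match, so it is left untouched.
unchanged = re.sub(pattern, r"\1", "Heat Rises")             # "Heat Rises"

print(new_title, unchanged)
```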

To remove a specific string at the end of the title, use:

(.*)\sa castle novella

And replace the title with the grouped match \1.
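
A quick Python sketch of this one as well, with made-up titles. In Python's re module matching is case sensitive by default, so "A Castle Novella" is only matched if you pass re.IGNORECASE. How calibre's dialog handles case is something to check there, not something this sketch shows.

```python
import re

pattern = r"(.*)\sa castle novella"

# Hypothetical titles, made up for illustration.
new_title = re.sub(pattern, r"\1", "Heat Rises a castle novella")  # "Heat Rises"

# Different capitalization: no match by default, so nothing is removed.
missed = re.sub(pattern, r"\1", "Heat Rises A Castle Novella")

# With re.IGNORECASE the capitalized variant is matched too.
caught = re.sub(pattern, r"\1", "Heat Rises A Castle Novella", flags=re.IGNORECASE)

print(new_title, missed, caught, sep=" | ")
```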

And make sure to test carefully and to keep a current backup. You WILL make mistakes, and without a backup they can cause you a lot of extra work.

Last edited by Adoby; 03-17-2013 at 04:43 AM. Reason: Typo