Old 06-12-2011, 07:39 PM   #1
penguinaka
Regex: File Renaming Pre-Import & Importing

**Note: If this thread interests you, keep checking back; I am going to keep updating and adding to it. The latest addition will be in green, red marks incomplete entries, and blue marks donated entries. Last addition: 6/14/2011 03:31:47 (UTC)

Here is some very useful regex for use with Flash Renamer before importing to Calibre, followed by Calibre import scripts to be used after file cleanup. Flash Renamer is the ultimate in renaming files quickly in bulk. These can be used with other bulk renaming software as well, but you might have to tweak them. I am going to continue to add to and edit this thread until it consists of a fairly large library of scripts for file renaming and Calibre, to assist people trying to import or convert large collections of books in Calibre.

RegEx for use in Flash Renamer or other Bulk File Renamer

INFO====>Swap Lastname, Firstname if in front of a title or series preceding a dash(-)
EXAMPLE=>Martin, George RR - Ice & Fire 01 - Game of Thrones.epub
RESULT==>George RR Martin - Ice & Fire 01 - Game of Thrones.epub
FIND====>^(\w+), *([\w \.]+)[ ]+-[ ]*(.*)
REPLACE=>$2 $1 - $3
OR======>Depending on what program you're using
REPLACE=>\2 \1 - \3
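
For anyone who wants to try this swap outside a renamer first, here is a minimal Python sketch of the same FIND/REPLACE pair. The function name `swap_last_first` is made up for illustration, and the input is assumed to be the file name without its extension, the way Flash Renamer treats it.

```python
import re

# Same FIND pattern as above; Python writes back-references as \1, \2, \3.
LAST_FIRST = re.compile(r"^(\w+), *([\w \.]+)[ ]+-[ ]*(.*)")

def swap_last_first(stem: str) -> str:
    """Turn 'Last, First - rest' into 'First Last - rest'."""
    return LAST_FIRST.sub(r"\2 \1 - \3", stem, count=1)

print(swap_last_first("Martin, George RR - Ice & Fire 01 - Game of Thrones"))
# George RR Martin - Ice & Fire 01 - Game of Thrones
```

Names without a leading "Lastname," are left untouched because the pattern simply does not match them.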

INFO====>Swap Lastname, Firstname & Lastname, Firstname when in front of a dash (-)
EXAMPLE=>Martin, George RR & Kirk, James T - Ice & Fire 01 - Game of Thrones.epub
RESULT==>George RR Martin & James T Kirk - Ice & Fire 01 - Game of Thrones.epub
FIND====>([^,]*), ([^&]*)& ([^,]*),([^-]*)-
REPLACE=>$2$1 &$4$3 -

INFO====>Swap Lastname, Firstname and Lastname, Firstname when in front of a dash (-)
EXAMPLE=>Martin, George RR and Kirk, James T - Ice & Fire 01 - Game of Thrones.epub
RESULT==>George RR Martin & James T Kirk - Ice & Fire 01 - Game of Thrones.epub
FIND====>([^,]*), ([^&]*)and ([^,]*),([^-]*)-
REPLACE=>$2$1 &$4$3 -
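
The two double-author entries above can be checked together with a short Python sketch. It tries the "&" pattern first and falls back to the "and" pattern; the helper name `swap_two_authors` is hypothetical, and only the first match is replaced, which is an assumption about how a renamer would apply it.

```python
import re

# The two FIND patterns from the entries above, unchanged.
AMP = re.compile(r"([^,]*), ([^&]*)& ([^,]*),([^-]*)-")
AND = re.compile(r"([^,]*), ([^&]*)and ([^,]*),([^-]*)-")

def swap_two_authors(stem: str) -> str:
    """Rewrite 'L1, F1 & L2, F2 -' (or '... and ...') as 'F1 L1 & F2 L2 -'."""
    for pattern in (AMP, AND):
        new_stem, hits = pattern.subn(r"\2\1 &\4\3 -", stem, count=1)
        if hits:
            return new_stem
    return stem  # neither pattern matched, leave the name alone

print(swap_two_authors("Martin, George RR & Kirk, James T - Ice & Fire 01 - Game of Thrones"))
print(swap_two_authors("Martin, George RR and Kirk, James T - Ice & Fire 01 - Game of Thrones"))
# Both print: George RR Martin & James T Kirk - Ice & Fire 01 - Game of Thrones
```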

INFO====>Title - Author Swap
EXAMPLE=>Game of Thrones - George RR Martin.epub
RESULT==>George RR Martin - Game of Thrones.epub
FIND====>(.*) - (.*)
REPLACE=>$2 - $1
NOTE===>Simple swap. If tried on a 3 part name (1-2-3) it will change to 3-1-2; if run again it will go to 2-3-1, and once more back to 1-2-3. It also carries anything else in brackets along with part 3.
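
The cycling described in the note comes from the first (.*) being greedy: it swallows everything up to the last " - ". A small Python sketch (the name `swap_once` is just for illustration) shows the three runs:

```python
import re

def swap_once(stem: str) -> str:
    # The greedy first group takes everything before the LAST " - ".
    return re.sub(r"(.*) - (.*)", r"\2 - \1", stem, count=1)

name = "1 - 2 - 3"
history = []
for _ in range(3):
    name = swap_once(name)
    history.append(name)

print(history)
# ['3 - 1 - 2', '2 - 3 - 1', '1 - 2 - 3']
```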

INFO====>Title - Author Swap w/ Series Info in the center
EXAMPLE=>Game of Thrones - Ice & Fire 01 - George RR Martin.epub
RESULT==>George RR Martin - Ice & Fire 01 - Game of Thrones.epub
FIND====>(.*) - (.*) - (.*)
REPLACE=>$3 - $2 - $1
NOTE===>This swap will carry any brackets in position 3 along with it. To handle other variations, change the order of the numbers in the replace string. Other orders will cycle. For example, if you change the replace to $3 - $1 - $2, then A-S-T changes to T-A-S; run again it changes to S-T-A, and then back to A-S-T. So to reach any order you really need an A-T swap plus another that cycles. With a few runs you can then get the correct order.
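
To see how the order of the numbers in the replace string decides the result, here is a hedged Python sketch. `reorder`, `REVERSE`, and `CYCLE` are names invented for this example; the FIND pattern is the one from the entry above.

```python
import re

THREE_PARTS = re.compile(r"(.*) - (.*) - (.*)")

REVERSE = r"\3 - \2 - \1"  # swaps the outer parts, middle stays put
CYCLE = r"\3 - \1 - \2"    # rotates all three parts by one position

def reorder(stem: str, template: str) -> str:
    """Apply one replace template to a three-part name."""
    return THREE_PARTS.sub(template, stem, count=1)

print(reorder("Game of Thrones - Ice & Fire 01 - George RR Martin", REVERSE))
# George RR Martin - Ice & Fire 01 - Game of Thrones

name = "A - S - T"
for _ in range(3):
    name = reorder(name, CYCLE)
    print(name)
# T - A - S
# S - T - A
# A - S - T
```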

INFO====>Title - Author Swap w/ Series Info, Version, and any other info in brackets left in position. File can be 2 part (T - A) or 3 part (T - S - A), with or without extra bracketed info.
EXAMPLE=>Game of Thrones - [Ice & Fire 01] - (unabridged) George RR Martin (v1.01) (epub).epub
RESULT==>George RR Martin - [Ice & Fire 01] - (unabridged) Game of Thrones (v1.01) (epub).epub
FIND====>([^-]*) - ([^-]* - )?([^(]*)( \(.*\))?
REPLACE=>$3 - $2$1 $4
NOTE===>This is the be-all and end-all of Author/Title swaps. It can be used for files with just Author and Title or for files with Author, Series, and Title, in both cases with or without other bracketed information, which is left in place. Because of this I had to allow enough spacing for 2 part swaps, which means there is an extra space in the 3 part swap. So you will need to do a Trim Spaces afterwards, or you can save this one for 2 part swaps, then adjust the space in the replace and save another for the 3 part swaps.

** You will need to TRIM SPACES afterwards

For other software that doesn't disregard the extension, use:
FIND====>([^-]*) - ([^-]* - )?([^(]*)( \(.*\))?(\.[^(.]*)
REPLACE=> \3 - \2\1\4\5


INFO====>Change Dots into Spaces unless in Numbers
EXAMPLE=>George.R.R.Martin.-.Ice.&.Fire.01.-.Game.of.Thrones.(v1.01).epub
RESULT==>George R R Martin - Ice & Fire 01 - Game of Thrones (v1.01).epub
FIND====>Under Construction
REPLACE=>Under Construction
NOTE===>Haven't worked it out yet for Flash Renamer

Other tools:

Find=====>(.*)(([^0-9])\.|\.([^0-9]))(.*\..*)#
REPLACE==>\1\3 \4\5
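
Since the Flash Renamer version is still under construction, here is one possible way to do the dots-to-spaces step in Python. This is a sketch, not a Flash Renamer recipe: it relies on lookarounds, which not every renamer supports, and the name `dots_to_spaces` is made up. A dot is kept only when it has a digit on both sides, and the extension is split off first so its dot survives.

```python
import os
import re

# Match a dot unless it sits between two digits (as in "v1.01").
DOT_NOT_BETWEEN_DIGITS = re.compile(r"(?<!\d)\.|\.(?!\d)")

def dots_to_spaces(filename: str) -> str:
    stem, ext = os.path.splitext(filename)  # keep ".epub" out of the replace
    return DOT_NOT_BETWEEN_DIGITS.sub(" ", stem) + ext

print(dots_to_spaces("George.R.R.Martin.-.Ice.&.Fire.01.-.Game.of.Thrones.(v1.01).epub"))
# George R R Martin - Ice & Fire 01 - Game of Thrones (v1.01).epub
```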

IMPORT SCRIPTS FOR CALIBRE

IMPORT==>Import "Author - Series Series_Index - Title" while removing brackets in series & deleting anything in brackets in the title
CALIBRE=>^(?P<author>[^-]+)(\s*-\s*(\[?(?P<series>[^-0-9]+)\s*(?P<series_index>[0-9.]+)?]?)?)?.*?-\s*(?P<title>[^\]{[()]+\w)
BEFORE==>George RR Martin - [Ice & Fire 01] - Game of Thrones [unabridged](v5.0)(epub).epub
AFTER===>George RR Martin - Ice & Fire 01 - Game of Thrones.epub
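
Calibre's import expressions are Python regular expressions, so you can preview what the named groups capture before running an import. The sketch below is only a preview helper (`parse` is a hypothetical name): it assumes the expression is applied to the file name without its extension and strips the surrounding spaces itself.

```python
import re

# The CALIBRE expression from the entry above, split over two lines for width.
IMPORT_PATTERN = re.compile(
    r"^(?P<author>[^-]+)(\s*-\s*(\[?(?P<series>[^-0-9]+)\s*(?P<series_index>[0-9.]+)?]?)?)?"
    r".*?-\s*(?P<title>[^\]{[()]+\w)"
)

def parse(stem: str):
    """Return the named groups with surrounding spaces stripped, or None."""
    match = IMPORT_PATTERN.search(stem)
    if match is None:
        return None
    return {key: (value or "").strip() for key, value in match.groupdict().items()}

print(parse("George RR Martin - [Ice & Fire 01] - Game of Thrones [unabridged](v5.0)(epub)"))
# {'author': 'George RR Martin', 'series': 'Ice & Fire', 'series_index': '01', 'title': 'Game of Thrones'}
```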


Donated by Debby. *I haven't finished testing these myself, so I can't fill in the output or say what they do with bracketed info. Will test soon. Many thanks to Debby for contributing the spreadsheet she sent over.

IMPORT==>LN, FN - [Series #] - Title (swaps lastname, firstname)
CALIBRE=>^(?P<author>[^-]+)(\s*-\s*(\[?(?P<series>[^-0-9]+)\s*(?P<series_index>[0-9.]+)?]?)?)?.*?-\s*(?P<title>[^\]{[()]+\w)
INPUT===>Clancy, Tom Smith - [Jack Ryan Universe 01] - Without Remorse.pdf
OUTPUT==>Tom Smith Clancy - [Jack Ryan Universe 01]] - Without Remorse.pdf

IMPORT==>FN LN (Series #) Title
CALIBRE=>(?P<author>([^\-_\[\(]+))\((?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\)(?P<title>([^_\[\(]+))
INPUT===>Laura Wilder (01) Little house.pdf
OUTPUT==>

IMPORT==>Series Name - Title , Title # - Author
CALIBRE=>^((?P<series>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<title>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?(?P<author>[^\-_0-9]+)
INPUT===>Jack Ryan Universe - Without Remorse 01 - Tom Clancy.lit
OUTPUT==>

IMPORT==>Series # - Title - LN _ FN (Swaps Lastname, First Name)
CALIBRE=>^((?P<nada>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?(?P<title>[^\-_0-9]+)- (?P<author>[^-]+)
INPUT===>Alex Cross 1 - Along Came a Spider - Patterson,James.epub
OUTPUT==>

IMPORT==>Author name - Book title, Series #.format
CALIBRE=>(?P<author>[^_]+) - (?P<series>.+)( |(, Book #))(?P<series_index>[0-9]+) - (?P<title>.+)
INPUT===>Tom Clancy - Jack Ryan Universe #01 - Without Remorse.pdf
OUTPUT==>

IMPORT==>Author name - Book title.format
CALIBRE=>^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+)(\s*-\s*)?( |(, Book #))(?P<series_index>[0-9.]+)\s*-\s*)?(?P<title>[^\-_0-9]+)
INPUT===>Tom Clancy - Without Remorse.txt
OUTPUT==>

IMPORT==>Title (Series #) - FN LN
CALIBRE=>(?P<title>.+) \((?P<series>[^_0-9-]*) (?P<series_index>[0-9]*)\) - (?P<author>.+)
INPUT===>Ark Angel (Alex Rider 06) - Anthony Horowitz.lrf
OUTPUT==>

IMPORT==>Author - Title - Series 01
CALIBRE=>(?P<author>[^_-]+) -?\s*(?P<title>[^_0-9-\[\(]*)[\[\(]?(?P<series_index>[0-9]*)[\]\)]?\s*-\s*(?P<series>[^_].+) ?
INPUT===>Tom Clancy - Without Remorse - Jack Ryan Universe 01.pdf
OUTPUT==>
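
Until the OUTPUT lines are filled in, a quick way to see what one of these expressions captures is to run it in Python against the sample name. The sketch below does that for two of the donated entries only; `CASES` and `captured` are names invented here, and the sample names are used without their extensions, which is an assumption about how the expression gets applied.

```python
import re

FIELDS = ("author", "series", "series_index", "title")

# label -> (expression copied from the entry, sample name without extension)
CASES = {
    "FN LN (Series #) Title": (
        r"(?P<author>([^\-_\[\(]+))\((?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\)(?P<title>([^_\[\(]+))",
        "Laura Wilder (01) Little house",
    ),
    "Title (Series #) - FN LN": (
        r"(?P<title>.+) \((?P<series>[^_0-9-]*) (?P<series_index>[0-9]*)\) - (?P<author>.+)",
        "Ark Angel (Alex Rider 06) - Anthony Horowitz",
    ),
}

def captured(pattern: str, stem: str):
    """Return the four metadata groups (stripped), or None when nothing matches."""
    match = re.search(pattern, stem)
    if match is None:
        return None
    return {field: (match.group(field) or "").strip() for field in FIELDS}

for label, (pattern, stem) in CASES.items():
    print(label, "=>", captured(pattern, stem))
```

In this sketch the first case yields an empty series with index 01, and the second yields series "Alex Rider" with index 06. A result of None means the expression did not match the sample at all.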

PREFERENCES=>Remove Page Numbers
CALIBRE=====>Page\s+\w+
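
One thing to watch with this expression: \w+ matches any word, not just digits, so "Page" followed by ordinary text is removed too. A small Python check shows both cases; the stricter `PAGE_DIGITS` pattern is my own suggestion, not part of the original list.

```python
import re

PAGE_ANY = re.compile(r"Page\s+\w+")     # expression as listed above
PAGE_DIGITS = re.compile(r"Page\s+\d+")  # hypothetical stricter variant

print(repr(PAGE_ANY.sub("", "The end. Page 12")))        # 'The end. '
print(repr(PAGE_ANY.sub("", "a real Page turner")))      # 'a real '
print(repr(PAGE_DIGITS.sub("", "a real Page turner")))   # 'a real Page turner'
```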

PREFERENCES=>Remove Amber Lit
CALIBRE=====>(<A name=\d+>\s*</a>)?\s*(<[biu][^>]*>)?\s*Generated\s+by\s+(ABC)?\s+Amber[^<]*(<a\shref=.*?processtext.*?>)?\s*(.*?processtext.*?</a>)?(</[ibu]>)?\s*(<br>\s*)?


IMPORT==>Import "Author - Series Series_Index - Title" while removing brackets in series & leaving anything in brackets in the title like version info
CALIBRE=>^(?P<author>[^-]+)(\s*-\s*(\[?(?P<series>[^-0-9]+)\s*(?P<series_index>[0-9.]+)?]?)?)?.*?-\s*(?P<title>[^-]+)
BEFORE==>George RR Martin - [Ice & Fire 01] - Game of Thrones [unabridged](v5.0)(epub).epub
AFTER===>George RR Martin - Ice & Fire 01 - Game of Thrones [unabridged](v5.0)(epub).epub
NOTE====>I would clean the file name of anything you don't want in the title section before import so you only get what you want there; the next script takes care of that.
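
A quick Python preview of this variant shows why cleaning first matters: the title group is [^-]+, so it keeps every bracket after the last dash. As before, this is only a sketch run against the name without its extension, and `KEEP_BRACKETS` is a made-up name.

```python
import re

KEEP_BRACKETS = re.compile(
    r"^(?P<author>[^-]+)(\s*-\s*(\[?(?P<series>[^-0-9]+)\s*(?P<series_index>[0-9.]+)?]?)?)?"
    r".*?-\s*(?P<title>[^-]+)"
)

stem = "George RR Martin - [Ice & Fire 01] - Game of Thrones [unabridged](v5.0)(epub)"
match = KEEP_BRACKETS.search(stem)
title = match.group("title").strip()
series = match.group("series").strip()

print(title)   # Game of Thrones [unabridged](v5.0)(epub)
print(series)  # Ice & Fire
```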

This is for cleaning files of all bracketed information other than version information before import to Calibre. These commands can be strung together in a preset batch command and run all at once in Flash Renamer rather than separately, although that's how I have them listed. This is meant for running against many files at the same time, like huge collections.

*Note: I am going to use the tilde "~" to represent a white space, so you will have to replace it with a space.

**This changes all brackets to rounded**
FIND====>[
REPLACE=>(
Run Rename

FIND====>]
REPLACE=>)
Run Rename

FIND====>{
REPLACE=>(
Run Rename

FIND====>}
REPLACE=>)

Trim Spaces (a simple button press in flash renamer)

**This removes brackets from around the series without deleting the series - remember to replace the tildes, which represent spaces**
FIND====>-~(
REPLACE=>-~
Run Rename

FIND====>)~-
REPLACE=>~-
Run Rename

**Check the Regex box - this changes the brackets around version info to square brackets and leaves all others rounded**
FIND====>(.+)\((v.+?)\)(.*)
REPLACE=>$1[$2]$3

**Uncheck the Regex box - this is a wildcard; it deletes all rounded brackets and whatever they contain**
FIND====>(*)
REPLACE=>

Trim Spaces

**This changes the square brackets back to rounded in version info**
FIND====>(.+)\[(.+)](.*)
REPLACE=>$1($2)$3

Trim Spaces

Now the file that looked like this:
George RR Martin - [Ice & Fire 01] - Game of Thrones [unabridged](v5.0)(epub).epub

will look like this:
George RR Martin - Ice & Fire 01 - Game of Thrones(v5.0).epub
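
The whole cleanup chain can also be approximated in Python if you want to dry-run it on a list of names. This is a rough sketch under stated assumptions, not Flash Renamer's actual behavior: "Trim Spaces" is modeled as collapsing repeated spaces and stripping both ends, the (*) wildcard is modeled as the regex \([^()]*\), and the names `trim_spaces` and `clean` are invented for the example.

```python
import re

def trim_spaces(name: str) -> str:
    # Assumption: Trim Spaces collapses runs of spaces and strips both ends.
    return re.sub(r" {2,}", " ", name).strip()

def clean(stem: str) -> str:
    """Approximate the cleanup steps on a file name without its extension."""
    # 1. All bracket styles become rounded.
    stem = trim_spaces(stem.translate(str.maketrans("[]{}", "()()")))
    # 2. Unwrap the series: "- (" becomes "- " and ") -" becomes " -".
    stem = stem.replace("- (", "- ").replace(") -", " -")
    # 3. Protect the version info by moving it into square brackets.
    stem = re.sub(r"(.+)\((v.+?)\)(.*)", r"\1[\2]\3", stem, count=1)
    # 4. Stand-in for the (*) wildcard: drop every remaining (...) group.
    stem = trim_spaces(re.sub(r"\([^()]*\)", "", stem))
    # 5. Put the version info back into rounded brackets, keeping any trailing text.
    stem = re.sub(r"(.+)\[(.+)](.*)", r"\1(\2)\3", stem, count=1)
    return trim_spaces(stem)

print(clean("George RR Martin - [Ice & Fire 01] - Game of Thrones [unabridged](v5.0)(epub)"))
# George RR Martin - Ice & Fire 01 - Game of Thrones (v5.0)
```

In this sketch the space in front of (v5.0) survives; whether a renamer keeps it depends on how its wildcard and trim functions treat the space left behind by the deleted brackets.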

You can then use this calibre import and keep version info in title:

IMPORT==>Import "Author - Series Series_Index - Title" while removing brackets in series & leaving anything in brackets in the title like version info
CALIBRE=>^(?P<author>[^-]+)(\s*-\s*(\[?(?P<series>[^-0-9]+)\s*(?P<series_index>[0-9.]+)?]?)?)?.*?-\s*(?P<title>[^-]+)

(to be continued)
(anyone else wanting to add useful scripts to this thread, please do so)

Last edited by penguinaka; 06-14-2011 at 11:39 PM.