08-25-2012, 03:11 AM | #1 |
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2012
Device: Kindle
|
RegEx for title - author - series?
I have tried to adapt other regular expressions to work when adding books into Calibre, but with no luck.
Books in a series are named like this: "To The King A Daughter - Andre Norton & Sasha Miller - Oak, Yew, Ash & Rowan 01.mobi". Books without a series are named like this: "Between the Dark - Algis Budrys.mobi". Any ideas on how to make the series optional at the end of the string? Each time I try moving pieces of another RegEx around (e.g. kiwidude's sample in the QuickRef plugin), it only works for books with a series, not for those without one. I have read the Python regular expression documentation, but I can't seem to find a solution. Any help appreciated! |
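For anyone following along, here is a small Python sketch for trying candidate patterns against both filename shapes outside calibre before pasting one in. `try_pattern` and `SAMPLES` are hypothetical names. Stripping the extension and using `re.search` is an assumption about how calibre applies the pattern, not something confirmed in this thread.

```python
import os
import re

# The four filenames quoted in this thread: two with a series, two without.
SAMPLES = [
    "To The King A Daughter - Andre Norton & Sasha Miller - Oak, Yew, Ash & Rowan 01.mobi",
    "Between the Dark - Algis Budrys.mobi",
    "Pilgrimage - Zenna Henderson - People 1.mobi",
    "Holding Wonder - Zenna Henderson.mobi",
]


def try_pattern(pattern, filenames=SAMPLES):
    """Hypothetical helper: map each filename to the named groups the pattern
    captures, or to None when the pattern does not match at all.

    The extension is stripped and re.search is used. This approximates, but is
    not guaranteed to mirror, how calibre applies its filename pattern.
    """
    regex = re.compile(pattern)
    results = {}
    for filename in filenames:
        stem, _ext = os.path.splitext(filename)
        match = regex.search(stem)
        results[filename] = match.groupdict() if match else None
    return results


if __name__ == "__main__":
    for name, groups in try_pattern(r"(?P<title>.+?) - (?P<author>.+)").items():
        print(name, "->", groups)
```

A pattern that answers the question should give four non-None results, with `series` filled in only for the first and third files.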
08-25-2012, 11:35 AM | #2 |
Newbie Nerd
Posts: 114
Karma: 1000354
Join Date: Feb 2012
Location: Perth, Western Australia
Device: iPad 3 64Gb Black
|
It's 23:30 so just a quick answer....
You put a (....)?* around the "series" extraction logic. Quickly:
Code:
(?P<author>.+?) - (?P<title>.+)( - (?P<series>.+))?*
Sorry, it's been a while since I've done anything with regex and it's late, but that should get you on the right track. Also sorry, I have title and author swapped around relative to your question. |
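A quick check in Python's standard `re` module shows why the pattern in post #2 misbehaves. The final `LAZY` pattern is one possible repair written for this illustration, not something posted in the thread. It uses the question's field order (Title - Author - Series N).

```python
import re

# Post #2's pattern exactly as posted: "?" followed by "*" stacks two quantifiers.
POSTED = r"(?P<author>.+?) - (?P<title>.+)( - (?P<series>.+))?*"

try:
    re.compile(POSTED)
    posted_compiles = True
except re.error as exc:
    posted_compiles = False
    print("posted pattern rejected:", exc)  # Python reports "multiple repeat ..."

# Without the "*" it compiles, but the greedy (.+) for the second field runs to
# the end of the name, so the optional series group only ever matches nothing.
NO_STAR = re.compile(r"(?P<author>.+?) - (?P<title>.+)( - (?P<series>.+))?")
greedy = NO_STAR.search("Pilgrimage - Zenna Henderson - People 1")
print(greedy.group("title"), "|", greedy.group("series"))
# prints: Zenna Henderson - People 1 | None

# Sketch of a repair: lazy fields plus a "$" anchor. As the lazy author field
# grows one character at a time, the optional series group is tried at each
# step, and "$" forces the whole name to be consumed.
LAZY = re.compile(
    r"(?P<title>.+?) - (?P<author>.+?)"
    r"(?: - (?P<series>.+?)(?: (?P<series_index>[0-9.]+))?)?$"
)
for stem in ("Pilgrimage - Zenna Henderson - People 1",
             "Holding Wonder - Zenna Henderson"):
    print(LAZY.search(stem).groupdict())
```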
08-26-2012, 03:01 AM | #3 |
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2012
Device: Kindle
|
Thanks for the effort, louwin, but it didn't work.
I tried it on several examples, like the one above, "Pilgrimage - Zenna Henderson - People 1.mobi" and "Holding Wonder - Zenna Henderson.mobi", and it didn't work on any of them. I had previously tried playing with parentheses, question marks, etc., but to no avail. |
08-26-2012, 03:24 AM | #4 | |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
Quote:
For Title - Author - Series I use this regex:
Code:
^((?P<title>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<author>[^\-_0-9]+)\s*-\s*)?(?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)
For Title - Author I use:
Code:
(?P<title>.+) - (?P<author>[^_]+)
I currently have 5 different regexes I switch between as needed. |
|
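In the first pattern of post #4, the final `series_index` group is not optional, so it cannot match a name without a number. The second pattern matches anything containing " - ", but its greedy title absorbs everything up to the last " - " on series names. The sketch below automates "switching as needed" by trying the series pattern first. `parse` is a hypothetical helper; the two patterns are copied verbatim, only split across lines for width.

```python
import re

# The two patterns from post #4, copied verbatim.
TITLE_AUTHOR_SERIES = re.compile(
    r"^((?P<title>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?"
    r"((?P<author>[^\-_0-9]+)\s*-\s*)?(?P<series>[^0-9\-]+)(\s*-\s*)?"
    r"(?P<series_index>[0-9.]+)"
)
TITLE_AUTHOR = re.compile(r"(?P<title>.+) - (?P<author>[^_]+)")


def parse(stem):
    """Hypothetical helper: try the series pattern first, fall back to the
    title/author one, and return stripped, non-empty fields (or None)."""
    for pattern in (TITLE_AUTHOR_SERIES, TITLE_AUTHOR):
        match = pattern.search(stem)
        if match:
            return {k: v.strip() for k, v in match.groupdict().items() if v}
    return None


print(parse("Pilgrimage - Zenna Henderson - People 1"))
print(parse("Between the Dark - Algis Budrys"))
```

The order matters: run alone on a series name, `TITLE_AUTHOR` would report the title as "Pilgrimage - Zenna Henderson" and the author as "People 1".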
08-26-2012, 03:41 AM | #5 |
Newbie Nerd
Posts: 114
Karma: 1000354
Join Date: Feb 2012
Location: Perth, Western Australia
Device: iPad 3 64Gb Black
|
Again quickly (off to play cards with a couple of mates)
My regex, which worked... I haven't played with it in 4 months, though. It does things in a different sequence, but it worked. It is more complex as it also handles series_index:
Code:
(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+)
Both series and series_index are optional, but author and title are mandatory. My first answer had an asterisk on the end that was not required. Good luck... |
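A quick Python check of the pattern in post #5 shows what "optional" means here. When there is no series, `series` and `series_index` come back as empty strings rather than `None`. Because the pattern expects Author - Series N - Title, names in the question's title-first order get their fields swapped. `fields` is a hypothetical helper.

```python
import re

# Post #5's pattern as posted; it expects Author - Series N - Title.
AUTHOR_SERIES_TITLE = re.compile(
    r"(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)"
    r"\s*-\s*(?P<title>[^_].+)"
)


def fields(stem):
    """Hypothetical helper: captured groups with surrounding spaces removed."""
    match = AUTHOR_SERIES_TITLE.search(stem)
    return {k: v.strip() for k, v in match.groupdict().items()} if match else None


print(fields("Zenna Henderson - People 1 - Pilgrimage"))
print(fields("Zenna Henderson - Holding Wonder"))
print(fields("Holding Wonder - Zenna Henderson"))  # question's order: fields swap
```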
08-26-2012, 05:35 AM | #6 |
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2012
Device: Kindle
|
Thanks, dwanthy, but your RegEx only works if the series information is present. If a book isn't part of a series, it fails.
Same with your expression, louwin. I removed the asterisk, but no luck. Thanks for trying, guys! I have a ton of old books in the format above to load in. |
08-26-2012, 06:17 AM | #7 | |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
Quote:
Author - Series - Title
Code:
^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?(?P<title>[^\-_0-9]+)
Title - Series - Author
Code:
^((?P<title>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?(?P<author>[^\-_0-9]+)
Series - Title - Author
Code:
^((?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?((?P<title>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?(?P<author>[^\-_0-9]+)
Last edited by DoctorOhh; 08-26-2012 at 06:23 AM. |
|
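In the patterns of post #7, the whole series block, including its trailing " - ", sits inside one optional `(...)?` group, so a single pattern handles names with and without a series. The Python check below uses the Author - Series - Title pattern, copied verbatim. It expects the author first, not the question's title-first order. `named_fields` is a hypothetical helper.

```python
import re

# Post #7's "Author - Series - Title" pattern, copied verbatim.
AUTHOR_SERIES_TITLE = re.compile(
    r"^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?"
    r"((?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?"
    r"(?P<title>[^\-_0-9]+)"
)


def named_fields(stem):
    """Hypothetical helper: stripped named groups, None where a group did not
    take part in the match."""
    match = AUTHOR_SERIES_TITLE.search(stem)
    if match is None:
        return None
    return {k: (v.strip() if v is not None else None)
            for k, v in match.groupdict().items()}


print(named_fields("Zenna Henderson - People 1 - Pilgrimage"))
print(named_fields("Zenna Henderson - Holding Wonder"))
```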
08-26-2012, 07:51 AM | #8 |
Enthusiast
Posts: 30
Karma: 752
Join Date: Nov 2010
Device: PB360
|
How about this:
Code:
(?P<title>[^-]+)\s*-\s*(?P<author>[^-]+)(\s*-\s*(?P<series>[^0-9-]+)(\s+|$)(?P<series_index>[0-9]+)?|$)
I'm not sure why, but using a '?' at the end of the expression to make the series part optional didn't work for me, so the expression tests for either a series part or the end of the string. Additionally, the expression tries to avoid adding a space to the end of the series name if there is a series index present.
EDIT: With a more recent calibre version, using '?' for the optional series seems to work. A somewhat refined version which doesn't add a space at the end of the title:
Code:
(?P<title>[^-]+[^\s])\s*-\s*(?P<author>[^-]+)(\s*-\s*(?P<series>[^0-9-]+)($|(\s+(?P<series_index>[0-9]+))))?
Last edited by JustForFun; 08-26-2012 at 10:05 AM. Reason: Refinement |
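To confirm the refined pattern from post #8 outside calibre, here is a Python check against the four filenames from the question. `parse_filename` is a hypothetical helper. Stripping the extension and trailing spaces is an assumption about what calibre does before and after matching.

```python
import os
import re

# Post #8's refined pattern, copied verbatim (split only for width).
TITLE_AUTHOR_OPT_SERIES = re.compile(
    r"(?P<title>[^-]+[^\s])\s*-\s*(?P<author>[^-]+)"
    r"(\s*-\s*(?P<series>[^0-9-]+)($|(\s+(?P<series_index>[0-9]+))))?"
)


def parse_filename(filename):
    """Hypothetical helper: named groups that took part in the match, with
    surrounding spaces stripped (the author keeps a trailing space otherwise)."""
    stem, _ext = os.path.splitext(filename)
    match = TITLE_AUTHOR_OPT_SERIES.search(stem)
    if match is None:
        return None
    return {k: v.strip() for k, v in match.groupdict().items() if v is not None}


if __name__ == "__main__":
    for name in (
        "To The King A Daughter - Andre Norton & Sasha Miller - Oak, Yew, Ash & Rowan 01.mobi",
        "Between the Dark - Algis Budrys.mobi",
        "Pilgrimage - Zenna Henderson - People 1.mobi",
        "Holding Wonder - Zenna Henderson.mobi",
    ):
        print(parse_filename(name))
```

Because every field is built from `[^-]`, any hyphen inside a title or author name (a hyphenated surname, say) splits the fields in the wrong place.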
08-26-2012, 08:59 AM | #9 |
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2012
Device: Kindle
|
May your favourite chocolate be ever in abundance!
Thanks, JustForFun! Worked as advertised! |
08-29-2012, 12:39 PM | #10 |
Connoisseur
Posts: 90
Karma: 618
Join Date: Oct 2007
Location: Ottawa
Device: PocketBook Pro 902, EB-1150, PRS505, PRS700, Jetbook, Hanlin V3, Kobo
|
Book naming conventions
I'm not sure if this is related to this topic, but I believe it is.
I have a HUGE (50K+) collection of ebooks in a variety of formats. Many of them are probably duplicates, but locating the duplicates can be a very slow, tedious process, even with good bulk renaming tools and such. Most of my family and several friends are readers who use ereaders, and I'd like to use Calibre's content server to give them access to my library. I'd even be open to giving this forum's members access to the same library.

Almost everyone out there seems to have some unique format they prefer for listing the books in their libraries. Some prefer: Author First name, last name - title, series. Others prefer: title - author first name, last name, series; or author last name, first name - title; etc. Heck, despite the file extension indicating the file type, some people actually put things like "(epub)" or "(mobi)" into the titles. Because of this, I have to handraulically rename large portions of my collection to see where the duplicates are and which formats I want to keep. I've been working on it sporadically for the past several years, and I'm still only working through the first half dozen letters of the alphabet. In addition, there are all the special characters people use: dashes, colons, semicolons, brackets, braces, etc.

Personally, I prefer the old-fashioned way of cataloging: Author Last Name, Author First Name, Title, Series. What I'd like is a script (or scripts) that would let me look at the different elements in the filenames and swap/delete them appropriately, in bulk. Even if I have to go through the file listings a page or two at a time and select individual titles, it would be quicker than what I'm doing now. Unfortunately, I'm not a programmer or software whiz, so I don't understand the regex conventions well enough to do this myself.

Any help would be really appreciated.
Thanks,
Dan |
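A minimal sketch of the kind of bulk swap described in post #10, assuming one input layout: the "Title - Author - Series N.ext" layout from this thread, parsed with the refined pattern from post #8. Other layouts in a mixed collection would each need their own pattern. `last_first`, `reorder_name` and `plan_renames` are hypothetical names. The output keeps " - " between fields, so the comma in "Last, First" stays unambiguous.

```python
import os
import re

# Post #8's refined pattern; assumes the "Title - Author - Series N.ext" layout.
PATTERN = re.compile(
    r"(?P<title>[^-]+[^\s])\s*-\s*(?P<author>[^-]+)"
    r"(\s*-\s*(?P<series>[^0-9-]+)($|(\s+(?P<series_index>[0-9]+))))?"
)


def last_first(name):
    """'Zenna Henderson' -> 'Henderson, Zenna'. Naive: the last word is taken
    as the surname, so names like 'Ursula K. Le Guin' come out wrong."""
    parts = name.split()
    if len(parts) < 2:
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def reorder_name(filename):
    """Hypothetical helper: return 'Last, First - Title - Series N.ext',
    or None if the filename does not fit the expected layout."""
    stem, ext = os.path.splitext(filename)
    match = PATTERN.search(stem)
    if match is None:
        return None
    fields = {k: (v or "").strip() for k, v in match.groupdict().items()}
    # Multiple authors are assumed to be joined with "&", as in this thread.
    authors = " & ".join(last_first(a.strip()) for a in fields["author"].split("&"))
    pieces = [authors, fields["title"]]
    if fields["series"]:
        pieces.append(" ".join(p for p in (fields["series"], fields["series_index"]) if p))
    return " - ".join(pieces) + ext


def plan_renames(folder):
    """Dry run: list (old, new) pairs for one folder without renaming anything."""
    plan = []
    for filename in sorted(os.listdir(folder)):
        new_name = reorder_name(filename)
        if new_name and new_name != filename:
            plan.append((filename, new_name))
    return plan
```

Print the result of `plan_renames(...)` first. Once the list looks right, each pair can be applied with `os.rename(os.path.join(folder, old), os.path.join(folder, new))`. The sketch does not recognise names it has already converted, so run it once per folder: a second pass would parse "Henderson, Zenna" as a title.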
08-29-2012, 11:11 PM | #11 |
null operator (he/him)
Posts: 20,568
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@dpayment - have you tried the Find Duplicates plugin with fuzzy logic?
https://www.mobileread.com/forums/sho...d.php?t=131017
Preferences->Plugins->User Interface Action->Find Duplicates
BR |
Tags |
adding books, regular expression |
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Plugboard "Metadata: Show series [series index] - title as title (Kindle)" | Deep Cover | Library Management | 6 | 11-30-2012 05:17 PM |
PRS-650 [calibre] How do I set it up so I can sort by author - Series - Title?? | BelgarionNL | Sony Reader | 49 | 06-22-2012 10:04 PM |
[Old Thread] Sorting folders by author/series/title | goodreader16 | Library Management | 15 | 05-06-2011 01:18 AM |
Calibre doesnt remember (Title.Author,Series,Metadata) changes?! | Rafaelo4 | Calibre | 9 | 08-19-2010 07:23 AM |
libprs500 - title/author matching regex | Megatron-UK | Calibre | 15 | 04-01-2008 04:39 PM |