Old 08-25-2012, 03:11 AM   #1
mattam
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2012
Device: Kindle
RegEx for title - author - series?

Have tried to convert other Regular Expressions to work when adding books into Calibre, but no luck.

Books in series are like this one: "To The King A Daughter - Andre Norton & Sasha Miller - Oak, Yew, Ash & Rowan 01.mobi".

Books without a series are like this: "Between the Dark - Algis Budrys.mobi".

Any ideas as to how to make series optional at the end of the string? Each time I try moving pieces of another RegEx around -- e.g. kiwidude's sample in QuickRef plug-in -- it only works for books with a series, but not for those without one.

Have tried the Python Regular Expression documents, but don't seem to be able to find a solution.

Any help appreciated!
Old 08-25-2012, 11:35 AM   #2
louwin
Newbie Nerd
Posts: 114
Karma: 1000354
Join Date: Feb 2012
Location: Perth, Western Australia
Device: iPad 3 64Gb Black
It's 23:30 so just a quick answer....

You put a (....)?* around the "series" extraction logic.

Quickly....

(?P<author>.+?) - (?P<title>.+)( - (?P<series>.+))?*

Sorry, it's a while since I have done anything with Regex and it's late but that should get you on the right track....

Also sorry, I have title and author swapped around compared with your question....
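
For anyone who wants to try this outside calibre, here is a minimal Python sketch of how the pattern above behaves. It assumes calibre passes the pattern to Python's `re` module and that the file extension has already been stripped; the helper name `compiles` is made up for the illustration. The trailing `?*` stacks two quantifiers, which `re` rejects, and once the asterisk is removed the greedy group after the first " - " takes the rest of the name, so the optional series group stays empty.

```python
import re

# Pattern as posted above, including the trailing "?*".
AS_POSTED = r"(?P<author>.+?) - (?P<title>.+)( - (?P<series>.+))?*"
# Same pattern with only the final asterisk dropped.
WITHOUT_ASTERISK = AS_POSTED[:-1]


def compiles(pattern):
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(AS_POSTED))         # False: "?*" is two quantifiers in a row
print(compiles(WITHOUT_ASTERISK))  # True

# Filename from the thread, assumed to arrive without its extension.
match = re.search(WITHOUT_ASTERISK, "Pilgrimage - Zenna Henderson - People 1")
fields = match.groupdict()
print(fields["author"])  # Pilgrimage
print(fields["title"])   # Zenna Henderson - People 1
print(fields["series"])  # None
```

So making the series group optional is not enough on its own: the group in front of it also has to be kept from running past the next " - " separator.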
Old 08-26-2012, 03:01 AM   #3
mattam
Thanks for the effort, louwin, but it didn't work.
I tried it on several examples, like the one above or "Pilgrimage - Zenna Henderson - People 1.mobi" and "Holding Wonder - Zenna Henderson.mobi". It didn't work.
I had previously tried playing with parentheses, question marks, etc., but to no avail.
Old 08-26-2012, 03:24 AM   #4
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by mattam
Thanks for the effort, louwin, but it didn't work.
I tried it on several examples, like the one above or "Pilgrimage - Zenna Henderson - People 1.mobi" and "Holding Wonder - Zenna Henderson.mobi". It didn't work.
I had previously tried playing with parentheses, question marks, etc., but to no avail.
The following work for your examples. I didn't write them so don't ask me to explain them.

For Title - Author - Series I use this regex

Code:
^((?P<title>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<author>[^\-_0-9]+)\s*-\s*)?(?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)
For Title - Author I use this regex

Code:
(?P<title>.+) - (?P<author>[^_]+)
I use the Quick Preferences plugin to quickly switch between them as needed.

I currently have 5 different regex I switch between as needed.
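
The two patterns above can also be tried together outside calibre. This is a rough Python sketch, assuming the patterns are used exactly as posted and that the name has already lost its extension. The function name `parse_stem` and the "series pattern first, then fall back" order are assumptions for illustration only; calibre itself takes a single pattern at a time.

```python
import re

# "Title - Author - Series" pattern, copied from the post above.
WITH_SERIES = re.compile(
    r"^((?P<title>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?"
    r"((?P<author>[^\-_0-9]+)\s*-\s*)?(?P<series>[^0-9\-]+)(\s*-\s*)?"
    r"(?P<series_index>[0-9.]+)"
)
# "Title - Author" pattern, copied from the post above.
TITLE_AUTHOR = re.compile(r"(?P<title>.+) - (?P<author>[^_]+)")


def parse_stem(stem):
    """Hypothetical helper: try the series pattern, then the plain one."""
    for pattern in (WITH_SERIES, TITLE_AUTHOR):
        match = pattern.search(stem)
        if match:
            # Keep only the groups that matched and trim stray spaces.
            return {k: v.strip() for k, v in match.groupdict().items() if v}
    return {}


print(parse_stem("Pilgrimage - Zenna Henderson - People 1"))
print(parse_stem("Holding Wonder - Zenna Henderson"))
```

The series pattern requires a series index, so a name with no digits falls through to the second pattern. That is the same decision the Quick Preferences switch makes by hand.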
Old 08-26-2012, 03:41 AM   #5
louwin
Again quickly (off to play cards with a couple of mates)

My Regex which worked.... Haven't played with it in 4 months though

It does things in different sequence but it worked....

It is more complex as it also handled series_index

(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?

Both series and series_index are optional but Author and Title are mandatory....

My first answer had an asterisk on the end that was not required.

Good luck....
Old 08-26-2012, 05:35 AM   #6
mattam
Thanks, dwanthy, but your RegEx only works if the series information is present. If a book isn't part of a series, it fails.

Same with your expression louwin. I removed the asterisk, but no luck.

Thanks for trying, guys! I have a ton of old books in the format above to load in.
Old 08-26-2012, 06:17 AM   #7
DoctorOhh
Quote:
Originally Posted by mattam
Thanks, dwanthy, but your RegEx only works if the series information is present. If a book isn't part of a series, it fails.
I knew that was the case. That is why I gave you two regexes, one with series and one without, and suggested you use the Quick Preferences plugin to switch between the two as needed. I also have regexes for

Author - Series - Title

Code:
^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?(?P<title>[^\-_0-9]+)
Title - Series - Author and

Code:
^((?P<title>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?(?P<author>[^\-_0-9]+)
Series - Title - Author too

Code:
^((?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?((?P<title>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?(?P<author>[^\-_0-9]+)
in my Quick preferences plugin. It is a simple matter of quickly switching between regex before adding any group of books as needed.

Last edited by DoctorOhh; 08-26-2012 at 06:23 AM.
Old 08-26-2012, 07:51 AM   #8
JustForFun
Enthusiast
Posts: 30
Karma: 752
Join Date: Nov 2010
Device: PB360
How about this:
Code:
(?P<title>[^-]+)\s*-\s*(?P<author>[^-]+)(\s*-\s*(?P<series>[^0-9-]+)(\s+|$)(?P<series_index>[0-9]+)?|$)
It matches both given examples in the test of the preferences dialog. It also matches a series without a series index.

I'm not sure why, but using a '?' at the end of the expression to make the series part optional didn't work for me, so the expression tests for an optional series or the end of the string. Additionally, the expression tries to avoid adding a space to the end of the series name if a series index is present.

EDIT:
With a more recent calibre version using '?' for the optional series seems to work. A somewhat refined version which doesn't add a space at the end of the title:
Code:
(?P<title>[^-]+[^\s])\s*-\s*(?P<author>[^-]+)(\s*-\s*(?P<series>[^0-9-]+)($|(\s+(?P<series_index>[0-9]+))))?
Interestingly, the author field seems to be processed further, as it does not end with the space that the expression would leave there.
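
For checking the refined expression outside calibre, here is a small Python sketch that runs it over the filenames mentioned in this thread. The function name `parse_filename` is made up for the illustration. Dropping the extension before matching and trimming each field afterwards are assumptions, the trimming being a stand-in for the extra author processing noted above.

```python
import re
from pathlib import PurePath

# Refined pattern from the EDIT above, copied unchanged.
PATTERN = re.compile(
    r"(?P<title>[^-]+[^\s])\s*-\s*(?P<author>[^-]+)"
    r"(\s*-\s*(?P<series>[^0-9-]+)($|(\s+(?P<series_index>[0-9]+))))?"
)


def parse_filename(filename):
    """Hypothetical helper: split 'Title - Author[ - Series NN].ext'."""
    stem = PurePath(filename).stem  # assumption: extension is not matched
    match = PATTERN.search(stem)
    if match is None:
        return None
    # Trim surrounding spaces; groups that did not match stay None.
    return {k: (v.strip() if v else None) for k, v in match.groupdict().items()}


for name in (
    "To The King A Daughter - Andre Norton & Sasha Miller - Oak, Yew, Ash & Rowan 01.mobi",
    "Between the Dark - Algis Budrys.mobi",
    "Pilgrimage - Zenna Henderson - People 1.mobi",
    "Holding Wonder - Zenna Henderson.mobi",
):
    print(parse_filename(name))
```

One limitation follows directly from the `[^-]+` groups: a title or author that itself contains a hyphen cannot be captured whole by this pattern.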

Last edited by JustForFun; 08-26-2012 at 10:05 AM. Reason: Refinement
Old 08-26-2012, 08:59 AM   #9
mattam
May your favourite chocolate be ever in abundance!

Thanks, JustForFun! Worked as advertised!
Old 08-29-2012, 12:39 PM   #10
dpayment
Connoisseur
Posts: 90
Karma: 618
Join Date: Oct 2007
Location: Ottawa
Device: PocketBook Pro 902, EB-1150, PRS505, PRS700, Jetbook, Hanlin V3, Kobo
Book naming conventions

I'm not sure if this is related to this topic, but I believe it is.

I have a HUGE (50K+) collection of ebooks in a variety of formats. Many of them are probably duplicates, but locating the duplicates can be a very slow, tedious process, even with good bulk renaming tools and such. Most of my family and several friends are readers who use ereaders, and I'd like to use Calibre's content server to allow them access to my library. I'd even be open to allowing the same library access to this forum's members.

Almost everyone out there seems to have some unique format they prefer for listing their books in their libraries. Some prefer: Author First name, last name - title, series. Others prefer: title - author first name, last name, series, or, author last name, first name - title, etc. Heck, despite the file extension indicating the file type, some people actually put things like "(epub)" or "(mobi)" into the titles. Because of this, I have to handraulically rename large portions of my collection to try to see where the duplicates are, and which formats I want to keep. I've been working on it sporadically for the past several years, and I'm still only working in the first half dozen letters of the alphabet. In addition, there are all the special characters people use, dashes, colons, semi-colons, brackets, braces, etc.

Personally, I prefer the old fashioned way of cataloging: Author Last Name, Author First Name, Title, Series. What I'd like is a script (or scripts) that would allow me to look at the different elements in the filenames and swap/delete them appropriately, in bulk. Even if I have to go through the file listings a page or two at a time and select individual titles, it would be quicker than what I'm doing now.

Unfortunately, I'm not a programmer or software whiz, so I don't understand the Regex conventions well enough to do this myself. Any help would be really appreciated,
Thanks,
Dan
Old 08-29-2012, 11:11 PM   #11
BetterRed
null operator (he/him)
Posts: 20,568
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@dpayment - have you tried the Find Duplicates plugin with fuzzy logic?

https://www.mobileread.com/forums/sho...d.php?t=131017

Preferences->Plugins->User Interface Action->Find Duplicates

BR