#1 |
Junior Member
Posts: 4
Karma: 114
Join Date: Apr 2012
Device: iPhone
|
Hi all.
I got a fairly large ebook collection from an overseas friend of mine the other day. The problem is that I can't figure out how to add the books to calibre in bulk and get the metadata to come out right.

The books are all in txt format and come with names in one of 3 formats:
Author - Title.txt
Author - Series - Title.txt
Author - Series - Series No. - Title.txt

The Author parts are in these formats (A or B means initials):
Last, First
Last, A B
Last, First B

I know about as much about regular expressions as most people know about the dark side of Europa, so...

Regards,
Shadewing

edit: I can change the separators in bulk fairly easily if that helps, i.e.
Author $ Series # Series No. - Title.txt

Last edited by Shadewing; 04-02-2012 at 03:34 AM.
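All three author layouts listed above put the surname first, followed by a comma. As a rough sketch only (the helper name `swap_author` is made up for illustration and is not part of calibre), flipping such names to "First Last" order in Python could look like this:

```python
def swap_author(name: str) -> str:
    """Turn 'Last, First' into 'First Last'; leave other names unchanged."""
    last, sep, rest = name.partition(",")
    if not sep or not rest.strip():
        # No comma (or nothing after it): assume the name is already in order.
        return name.strip()
    return f"{rest.strip()} {last.strip()}"


for sample in ("Last, First", "Last, A B", "Last, First B"):
    print(sample, "->", swap_author(sample))
```

This only handles a single author per file name; names with more than one comma would need extra rules.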
#2 | |
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
|
Quote:
(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?
Maybe not but works for me.
Helen
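To see what this expression does with each naming convention, it can be tried outside calibre with Python's `re` module. This is a minimal sketch, assuming the file extension is dropped before matching and that surrounding spaces are trimmed afterwards; the helper name `parse` is hypothetical, and the trailing ` ?` is kept exactly as posted.

```python
import re

# The expression exactly as posted above (including the trailing " ?").
PATTERN = re.compile(
    r"(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)"
    r"(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?"
)


def parse(filename: str) -> dict:
    """Return the named groups for a file name, with whitespace trimmed."""
    stem = filename.rsplit(".", 1)[0]  # assumption: extension is dropped first
    match = PATTERN.search(stem)
    if match is None:
        return {}
    return {key: (value or "").strip() for key, value in match.groupdict().items()}


for name in (
    "Last, First - Title.txt",
    "Author - Series - Title.txt",
    "Author - Series 3 - Title.txt",
    "Author - Series - 3 - Title.txt",
):
    print(name, "->", parse(name))
```

Series and series_index are both allowed to match nothing, so a plain "Author - Title" name still parses. A " - " between the series and its number is not handled: in the last sample the number ends up at the front of the title.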
|
#3 |
Newbie Nerd
Posts: 114
Karma: 1000354
Join Date: Feb 2012
Location: Perth, Western Australia
Device: iPad 3 64Gb Black
|
I'm curious too :?
Your regex has 4 parts (I think). How will Calibre work out which part(s) of the file name aren't there? As I see it, series AND series number can be absent, so how does Calibre know that parts 2 & 3 aren't there? Or that only part 3 is missing?
I suppose I can play around with a test library and a collection of various test books.

I am currently happy with the stock standard:
(?P<title>.+) - (?P<author>[^_]+)

My current format, which I hope to process manually, doesn't have a separator between series and number:
Series 1 - Title - A N Author.pdf
Series 2 - Title - A N Author.pdf
Series 3 - Title - A N Author.pdf
etc.

I currently change this to:
Series 1, Title - A N Author.pdf
Series 2, Title - A N Author.pdf
Series 3, Title - A N Author.pdf
etc.
so that the series info is part of the title and I can manually process the series info later.

So much to learn, so little time....
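For anyone who wants to experiment outside calibre, here is a small Python sketch of the stock expression quoted above, next to one possible expression for the "Series 1 - Title - A N Author" layout. The second pattern (`SERIES_FIRST`) is an assumption made up for this example, not something posted in the thread or shipped with calibre.

```python
import re

# The stock expression quoted in the post: everything before the LAST " - "
# becomes the title, the rest becomes the author.
STOCK = re.compile(r"(?P<title>.+) - (?P<author>[^_]+)")

# Hypothetical pattern for "Series 1 - Title - A N Author" (not from the thread).
SERIES_FIRST = re.compile(
    r"(?P<series>[^_0-9-]+?)\s*(?P<series_index>[0-9]+(?:\.[0-9]+)?)\s*-\s*"
    r"(?P<title>.+)\s+-\s+(?P<author>[^_]+)"
)

name = "Series 1 - Title - A N Author"  # file name without its extension
print(STOCK.search(name).groupdict())
print(SERIES_FIRST.search(name).groupdict())
```

With the stock expression the series text simply stays at the front of the title, whether or not the first hyphen is swapped for a comma. The hypothetical pattern assumes the series name itself contains no digits or hyphens.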
#4 |
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
|
Before posting I actually tried that expression on a text file. It is not mine; some kind user posted it or something, and it works on more than one type of naming convention, if I recall correctly.
But I do not understand it. I am not a regex person. Wish that I were. I am happy if I get title/author correct on import in 95% of cases, and generally I do. Then I do the download metadata thing, and this is reasonable in most cases.

An answer from chaley to another user today pointed me to a much easier way to modify tags and some other inconsistencies, like the author initials thing you mentioned earlier in this thread, I think. I am way too nitpicky about consistency, but the inconsistencies themselves are so inconsistent that it would almost take an AI to catch them all, or a database of all known authors and the 12 different ways some of their names are used and punctuated by themselves and/or publishers. I recently found a B author listed under I, as his name was something like John B. Butterworth IV. I mean, really.

Oh well. Start a new thread and ask for the correct regex for the convention you are using. Someone will answer, I am sure. And not important, but in my experience most metadata sources use A. N. Author. Not as higgledy-piggledy as two years ago.

Good luck and happy importing.
Helen
#5 |
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
|
Something you may find helpful?
Under Preferences > Look and feel > Book details tab I have the default author link set as
http://www.fantasticfiction.co.uk/search/?searchfor=author&keywords={author}
Clicking on the author's name in the book details pane takes you to the page for that author, and with Fantastic Fiction it is very easy to see the most common author spelling and the most common series name, and to get the ISBN, pseudonym, etc.
Helen
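To illustrate how a link template like the one above turns into a concrete URL, here is a tiny Python sketch. It is an assumption for illustration only: exactly how calibre encodes the author name is not stated in this thread, so plain URL encoding via `quote_plus` is used, and the helper name `author_url` is made up.

```python
from urllib.parse import quote_plus

# The link template from the post above.
AUTHOR_LINK = "http://www.fantasticfiction.co.uk/search/?searchfor=author&keywords={author}"


def author_url(author: str) -> str:
    """Fill the {author} placeholder with a URL-encoded author name."""
    # Assumption: spaces become "+", as in an ordinary search query string.
    return AUTHOR_LINK.replace("{author}", quote_plus(author))


print(author_url("A. N. Author"))
```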
#6 |
Newbie Nerd
Posts: 114
Karma: 1000354
Join Date: Feb 2012
Location: Perth, Western Australia
Device: iPad 3 64Gb Black
|
JFYI @speakingtohe
The regex you supplied works with
Author - Series Series No - Title
but NOT with
Author - Series - Series No - Title
It doesn't seem to process the " - " between the series and the series_index. In the second format it puts the series no in the title.
No biggie, JFYI
#7 |
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
|
Odd, it worked for me. Maybe a typo on my part or a glass of wine too many.
I will check again but not today.
#8 |
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
|
No hyphen between series and series number, if I recall correctly. Maybe I typed it wrong in the other thread.
Sorry
Helen

Author LN, Author FN - series xx -title
was what I had posted in one spot anyway.

Last edited by speakingtohe; 04-04-2012 at 12:44 AM. Reason: added
#9 | |
Newbie Nerd
Posts: 114
Karma: 1000354
Join Date: Feb 2012
Location: Perth, Western Australia
Device: iPad 3 64Gb Black
|
@speakingtohe I confirmed your regex, but the series no CANNOT have any non-numeric character in it (it cannot be 1.5, for instance) and cannot be separated from the series by a "-". If you aren't interested in something.something numbers then your regex is okay.
JFI I mucked about with it and got this one:
(?P<author>[^_-]+)\s*-\s*(?P<series>[^_0-9-]*)\s*-\s*(?P<series_index>[0-9]*(.?[0-9]*))?\s*-\s*(?P<title>[^_].+)
It insists on 4 parts (" - " between series and series_index) and will accept any numeric series number (even 1234.56789).
So strictly speaking, NO regex (I think) will do what was asked in the first place.
Quote:
Both regexes will transfer the Author EXACTLY as input - LN, FN or FN LN or ????
I am SLOWLY beginning to understand regexes.
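The "insists on 4 parts" behaviour of the expression in this post can be checked with a few lines of Python. This is only a sketch under the assumption that the name is matched without its extension and that surrounding spaces are trimmed afterwards; the helper name `parse` is hypothetical.

```python
import re

# The four-part expression exactly as posted above.
FOUR_PART = re.compile(
    r"(?P<author>[^_-]+)\s*-\s*(?P<series>[^_0-9-]*)\s*-\s*"
    r"(?P<series_index>[0-9]*(.?[0-9]*))?\s*-\s*(?P<title>[^_].+)"
)


def parse(stem: str):
    """Return trimmed named groups, or None when the name does not match."""
    match = FOUR_PART.search(stem)
    if match is None:
        return None
    return {k: (v or "").strip() for k, v in match.groupdict().items()}


print(parse("Author - Series - 1.5 - Title"))  # all four parts present
print(parse("Author - Series - Title"))        # only three parts
print(parse("Author - Title"))                 # only two parts
```

The expression contains three mandatory hyphens, so any name with fewer than three of them cannot match at all, which is why the shorter formats from the first post are rejected.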
|
#10 | |
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
|
Quote:
First: I don't understand the regex and don't pretend to. Never implied it was mine. It was just a suggestion for a starting point, and I was fine that you ignored it when it was first mentioned.
Second: your previous regex is selectable from the dropdown list, so switching is not that hard, but if you are manually renaming all of your books first it shouldn't be an issue.
Third: Maybe 1.5 doesn't work, no idea. Maybe 01.5 will work. Probably not. Kind of a special case in my experience, as most series books I read are numbered in a straightforward manner, and I adjust the others if I have to, but usually I don't, as I am not big into romances currently.
Glad you are getting the hang of regexes, and wish that I would expend some effort to do the same. But feeling a bit annoyed at being browbeaten and belittled at the moment.
|
#11 |
Newbie Nerd
Posts: 114
Karma: 1000354
Join Date: Feb 2012
Location: Perth, Western Australia
Device: iPad 3 64Gb Black
|
None of my statements were intended to "browbeat" or "belittle".
All my statements were informational, hence the JFYI and JFI. Yesterday I knew NOTHING about regexes; today I know a little. My first "Add" got a 40% error rate and my second 0.2%. So much to learn, so little time. Certainly no time for browbeating or belittling.
I am also renaming about 2000 files of differing formats. I wish I had known about series a couple of days ago, as I have done HOURS of work with the aim of manually editing the series information in. I NOW know how to do it in the file name.
#12 | |
Newbie Nerd
Posts: 114
Karma: 1000354
Join Date: Feb 2012
Location: Perth, Western Australia
Device: iPad 3 64Gb Black
|
Quote:
This regex WILL do exactly what you want.
(?P<author>[^_-]+)\s*-\s*((?P<series>[^_0-9-]*)\s*-\s*)?((?P<series_index>[0-9]*(.?[0-9]*))?\s*-\s*)?(?P<title>[^_].+)
Author AND Title MUST be there! Series and/or Series no may be there or not, or one or the other.
The series no can have a decimal in it. I think 2.5 is a book later added between 2 and 3?
If only one of series or series no is present, I suspect Calibre knows which is which from the fact that the series no is numeric.
The author is transferred as entered, but Calibre has an option to swap firstname and lastname.
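One detail worth knowing about the expression above: the `.` in front of the decimal digits is not escaped, so it matches any character, not only a decimal point. The Python sketch below compares the posted expression with a variant where the dot is escaped. The `ESCAPED` variant is an assumed tightening made up for this example, not something from the thread.

```python
import re

# As posted: the "." before the decimal digits is unescaped, so it matches ANY character.
POSTED = re.compile(
    r"(?P<author>[^_-]+)\s*-\s*((?P<series>[^_0-9-]*)\s*-\s*)?"
    r"((?P<series_index>[0-9]*(.?[0-9]*))?\s*-\s*)?(?P<title>[^_].+)"
)

# Assumed tightening (not from the thread): escape the dot so only "2.5"-style numbers match.
ESCAPED = re.compile(
    r"(?P<author>[^_-]+)\s*-\s*((?P<series>[^_0-9-]*)\s*-\s*)?"
    r"((?P<series_index>[0-9]*(\.?[0-9]*))?\s*-\s*)?(?P<title>[^_].+)"
)

odd = "Author - Series - 2x5 - Title"
print(repr(POSTED.search(odd).group("series_index")))   # accepted as an index
print(repr(ESCAPED.search(odd).group("series_index")))  # no index captured
print(repr(ESCAPED.search("Author - Series - 2.5 - Title").group("series_index")))
```

With the posted form, a whole-number index such as "3" is also captured together with the space that follows it, so the value may need trimming. For ordinary names such as "Series - 2.5 - Title" both forms capture the same number.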
|
#13 |
Newbie Nerd
Posts: 114
Karma: 1000354
Join Date: Feb 2012
Location: Perth, Western Australia
Device: iPad 3 64Gb Black
|
Oops, please check before you commit. The regex seems to work sometimes, not all the time.
#14 |
Newbie Nerd
Posts: 114
Karma: 1000354
Join Date: Feb 2012
Location: Perth, Western Australia
Device: iPad 3 64Gb Black
|
Panic over!
Author - Title.txt works
Author - Series - Title.txt works
Author - Series - Series_Index - Title.txt works
Author - Series_Index - Title.txt also works, but I don't know the effect on Calibre of an index without an associated series.
Author - Series Series_Index - Title.txt DOES NOT WORK (no hyphen between Series and Series_Index). This puts the series and the series_index in with the title.
So this regex does ALL that Shadewing asked for.
Phewww!
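The five cases listed above can be reproduced outside calibre with Python's `re` module. This is a sketch under two assumptions: the extension is removed before matching, and surrounding spaces are trimmed from each captured part. The helper name `parse` is hypothetical, and the expression is the one from post #12.

```python
import re

# The expression from post #12, exactly as posted.
PATTERN = re.compile(
    r"(?P<author>[^_-]+)\s*-\s*((?P<series>[^_0-9-]*)\s*-\s*)?"
    r"((?P<series_index>[0-9]*(.?[0-9]*))?\s*-\s*)?(?P<title>[^_].+)"
)


def parse(filename: str):
    """Return trimmed named groups ('' when a part is absent), or None."""
    stem = filename.rsplit(".", 1)[0]  # assumption: extension is dropped first
    match = PATTERN.search(stem)
    if match is None:
        return None
    return {k: (v or "").strip() for k, v in match.groupdict().items()}


CASES = [
    "Author - Title.txt",
    "Author - Series - Title.txt",
    "Author - Series - 3 - Title.txt",
    "Author - 3 - Title.txt",
    "Author - Series 3 - Title.txt",  # no hyphen between series and index
]
for name in CASES:
    print(name, "->", parse(name))
```

The last case still matches, but neither optional group applies, so "Series 3 - Title" lands in the title, as described in the post.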
Tags |
adding ebooks, help please, regular expressions |
|