MobileRead Forums > E-Book Software > Calibre > Library Management

04-02-2012, 03:29 AM   #1
Shadewing
Junior Member
Posts: 4
Karma: 114
Join Date: Apr 2012
Device: iPhone
Adding books, Regular expression help please

Hi all.

I got a fairly large ebook collection from an overseas friend of mine the other day. The problem is that I can't figure out how to add the books to Calibre in bulk and get the metadata right.

The books are all in txt format and come with names in one of 3 formats:

Author - Title.txt
Author - Series - Title.txt
Author - Series - Series No. - Title.txt

The Author parts come in these formats (A or B means an initial):

Last, First
Last, A B
Last, First B


I know about as much about regular expressions as most people know about the dark side of Europa, so please help.

Regards,

Shadewing

Edit: I can change the separators in bulk fairly easily if that helps, e.g.:
Author $ Series # Series No. - Title.txt
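As an illustration of why distinct separators could help, here is a minimal Python sketch (Calibre's filename patterns use Python-style named groups such as `(?P<author>...)`, so Python's `re` module is a convenient place to try one). The pattern `FOUR_PART` and the helper `parse_four_part` are hypothetical, not from this thread; the sketch assumes the extension is dropped before matching and only handles names that contain both `$` and `#`.

```python
import os
import re

# Hypothetical pattern for "Author $ Series # Series No. - Title".
# "$" and "#" only ever appear as separators, so each field is simply
# "everything up to the next separator". "$" must be escaped outside [...].
FOUR_PART = (r"(?P<author>[^$]+)\$(?P<series>[^#]+)#"
             r"(?P<series_index>[^-]+)-(?P<title>.+)")


def parse_four_part(filename):
    """Return the named fields with surrounding spaces trimmed, or None."""
    stem, _ext = os.path.splitext(filename)  # drop ".txt"
    m = re.search(FOUR_PART, stem)
    if m is None:
        return None
    return {key: value.strip() for key, value in m.groupdict().items()}


print(parse_four_part("Smith, John B $ Some Series # 3 - Some Title.txt"))
```

A name such as `Smith, John - Title.txt` contains no `$`, so it returns `None`; the two- and three-part names would still need their own pattern.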

Last edited by Shadewing; 04-02-2012 at 03:34 AM.
04-02-2012, 03:59 PM   #2
speakingtohe
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
Quote:
Author - Series - Series No. - Title.txt

This works for me, and they should all be OK with it:
(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?

Maybe not, but it works for me.

Helen
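Calibre's filename patterns use Python-style named groups, so the expression above can be tried with Python's `re` module. A minimal sketch, assuming (as an approximation of what Calibre does) that the extension is dropped and each captured field is trimmed; `parse_filename` is a hypothetical helper, not a Calibre function, and it leaves empty fields out of the result:

```python
import os
import re

# The expression from the post above, unchanged (split only for line length).
HELEN = (r"(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)"
         r"(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?")


def parse_filename(pattern, filename):
    """Hypothetical helper: drop the extension, search the rest of the
    name, and return the non-empty named groups with spaces trimmed."""
    stem, _ext = os.path.splitext(filename)
    m = re.search(pattern, stem)
    if m is None:
        return None
    return {k: v.strip() for k, v in m.groupdict().items() if v and v.strip()}


for name in ["Smith, John - Title.txt",
             "Smith, John - Series - Title.txt",
             "Smith, John - Series 3 - Title.txt"]:
    print(name, "->", parse_filename(HELEN, name))
```

The third name gets both a series and a number because there is no hyphen between `Series` and `3`.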
04-03-2012, 11:14 PM   #3
louwin
Newbie Nerd
Posts: 114
Karma: 1000354
Join Date: Feb 2012
Location: Perth, Western Australia
Device: iPad 3 64Gb Black
I'm curious too :?

Your regex has 4 parts (I think)

How will Calibre work out what part(s) (of the file name?) aren't there?

As I see it, the series AND the series number can be absent, so how does Calibre know that parts 2 & 3 aren't there? Or that only part 3 is missing?

I suppose I can play around with a test library and a collection of various test books.

I am currently happy with the stock standard:

(?P<title>.+) - (?P<author>[^_]+)

My current format, which I hope to process manually, doesn't have a separator between series and number:-

Series 1 - Title - A N Author.pdf
Series 2 - Title - A N Author.pdf
Series 3 - Title - A N Author.pdf etc

I currently change this to:-

Series 1, Title - A N Author.pdf
Series 2, Title - A N Author.pdf
Series 3, Title - A N Author.pdf etc

so that the series info is part of the title and I can process it manually later.
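Because `.+` is greedy, the stock pattern splits at the *last* ` - ` in the name, so with either naming the series stays inside the title. A quick check in Python, assuming the extension is dropped before matching; `parse_filename` is a hypothetical helper, not part of Calibre:

```python
import os
import re

# The stock "title - author" pattern quoted above.
STOCK = r"(?P<title>.+) - (?P<author>[^_]+)"


def parse_filename(pattern, filename):
    """Hypothetical helper: drop the extension, search the rest of the
    name, and return the non-empty named groups with spaces trimmed."""
    stem, _ext = os.path.splitext(filename)
    m = re.search(pattern, stem)
    if m is None:
        return None
    return {k: v.strip() for k, v in m.groupdict().items() if v and v.strip()}


for name in ["Series 1 - Title - A N Author.pdf",
             "Series 1, Title - A N Author.pdf"]:
    print(parse_filename(STOCK, name))
```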

So much to learn, so little time....
04-03-2012, 11:44 PM   #4
speakingtohe
Wizard
Before posting, I actually tried that expression on a text file. It is not mine (some kind user posted it), and it works on more than one type of naming convention, if I recall correctly.

But I do not understand it. I am not a regex person. Wish that I were.

I am happy if I get title/author correct on import in 95% of cases, and generally I do. Then I do the download metadata thing, and the results are reasonable in most cases.

An answer from chaley to another user today pointed me to a much easier way to modify tags and fix some other inconsistencies, like the author initials thing you mentioned earlier in this thread, I think. I am way too nitpicky about consistency, but the inconsistencies themselves are so inconsistent that it would almost take an AI to catch them all, or a database of all known authors and the 12 different ways some of their names are used and punctuated by themselves and/or publishers. I recently found a B author listed under I because his name was something like John B. Butterworth IV. I mean, really.

Oh well. Start a new thread and ask for the correct regex for the convention you are using. Someone will answer, I am sure.

And not important, but in my experience most metadata sources use A. N. Author. It is not as higgledy-piggledy as it was two years ago.

Good luck and happy importing.

Helen
04-03-2012, 11:55 PM   #5
speakingtohe
Wizard
Something you may find helpful?

Under Preferences > Look and Feel > Book Details tab, I have the default author link set as
http://www.fantasticfiction.co.uk/search/?searchfor=author&keywords={author}

Clicking the author's name in the Book Details pane takes you to that author's page, and with Fantastic Fiction it is very easy to see the most common spelling of the author's name and the most common series name, and to get ISBNs, pseudonyms, etc.

Helen
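Calibre fills in `{author}` itself when you click the link. Purely to illustrate what the template expands to, here is a Python sketch that substitutes a percent-encoded name; `author_link` is hypothetical, and whether Calibre encodes the name exactly like `quote_plus` is an assumption:

```python
from urllib.parse import quote_plus

# Author-link template from the post above.
TEMPLATE = ("http://www.fantasticfiction.co.uk/search/"
            "?searchfor=author&keywords={author}")


def author_link(author):
    """Hypothetical: replace {author} with a URL-encoded author name
    (spaces become '+', a comma becomes '%2C')."""
    return TEMPLATE.replace("{author}", quote_plus(author))


print(author_link("John B. Butterworth"))
```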
04-04-2012, 12:28 AM   #6
louwin
Newbie Nerd
JFYI @speakingtohe

The regex you supplied works with

Author - Series Series No - Title

but NOT with

Author - Series - Series No - Title

It doesn't seem to process the " - " between the series and the series_index

In the second format it puts the series no in the title.

No biggie, JFYI
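The behaviour described above can be reproduced in Python. In the expression, `(?P<series_index>[0-9]*)` follows the series directly, so when a ` - ` sits between the series and its number, `series_index` stays empty and the number is swept into the title. A sketch with a hypothetical `parse_filename` helper, assuming the extension is dropped and the fields are trimmed:

```python
import os
import re

# speakingtohe's expression from post #2, unchanged.
HELEN = (r"(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)"
         r"(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?")


def parse_filename(pattern, filename):
    """Hypothetical helper: drop the extension, search the rest of the
    name, and return the non-empty named groups with spaces trimmed."""
    stem, _ext = os.path.splitext(filename)
    m = re.search(pattern, stem)
    if m is None:
        return None
    return {k: v.strip() for k, v in m.groupdict().items() if v and v.strip()}


print(parse_filename(HELEN, "Smith, John - Series 3 - Title.txt"))    # index found
print(parse_filename(HELEN, "Smith, John - Series - 3 - Title.txt"))  # "3 - Title"
```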
04-04-2012, 12:38 AM   #7
speakingtohe
Wizard
Odd, it worked for me. Maybe a typo on my part, or a glass of wine too many.

I will check again but not today.
04-04-2012, 12:41 AM   #8
speakingtohe
Wizard
No hyphen between the series and the series number, if I recall correctly. Maybe I typed it wrong in the other thread.

Sorry
Helen

Author LN, Author FN - series xx -title

was what I had posted in one spot anyway.

Last edited by speakingtohe; 04-04-2012 at 12:44 AM. Reason: added
04-04-2012, 02:22 AM   #9
louwin
Newbie Nerd
@speakingtohe I confirmed your regex, but the series no. CANNOT have any non-numeric character in it (it cannot be 1.5, for instance) and cannot be separated from the series by a "-". If you aren't interested in something.something numbers, then your regex is okay.
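The decimal case can be checked the same way. With `[0-9]*` the index can only be digits, and for a name like `Series 2.5` the match falls back to a path where `series` and `series_index` stay empty, so everything after the author lands in the title. A Python sketch with a hypothetical `parse_filename` helper (extension dropped and fields trimmed, as an approximation of Calibre):

```python
import os
import re

# speakingtohe's expression from post #2, unchanged.
HELEN = (r"(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)"
         r"(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?")


def parse_filename(pattern, filename):
    """Hypothetical helper: drop the extension, search the rest of the
    name, and return the non-empty named groups with spaces trimmed."""
    stem, _ext = os.path.splitext(filename)
    m = re.search(pattern, stem)
    if m is None:
        return None
    return {k: v.strip() for k, v in m.groupdict().items() if v and v.strip()}


print(parse_filename(HELEN, "Smith, John - Series 2 - Title.txt"))
print(parse_filename(HELEN, "Smith, John - Series 2.5 - Title.txt"))
```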

JFI I mucked about with it and got this one:-

(?P<author>[^_-]+)\s*-\s*(?P<series>[^_0-9-]*)\s*-\s*(?P<series_index>[0-9]*(.?[0-9]*))?\s*-\s*(?P<title>[^_].+)

It insists on 4 parts (" - " between series and series_index) and will accept any numeric series number (even 1234.56789). But it MUST have three "-", each preceded and followed by at least one space, thus "A N Author - Whatever Series - 8.5 - Just Any Title.txt"
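A Python check of this 4-part expression, using a hypothetical `parse_filename` helper (extension dropped, fields trimmed). Two details show up: `\s*` means *zero* or more spaces, so the spaces around each `-` are in fact optional; and the `.` in `(.?[0-9]*)` is unescaped, so it matches any character, not only a decimal point. Escaping it as `\.` is a suggested tweak, not something from this thread:

```python
import os
import re

# The 4-part expression from the post above, unchanged.
LOUWIN_4 = (r"(?P<author>[^_-]+)\s*-\s*(?P<series>[^_0-9-]*)\s*-\s*"
            r"(?P<series_index>[0-9]*(.?[0-9]*))?\s*-\s*(?P<title>[^_].+)")

# Suggested variant (not from the thread): escape the dot so that only a
# decimal point, not any character, may follow the leading digits.
ESCAPED = LOUWIN_4.replace("(.?", r"(\.?")


def parse_filename(pattern, filename):
    """Hypothetical helper: drop the extension, search the rest of the
    name, and return the non-empty named groups with spaces trimmed."""
    stem, _ext = os.path.splitext(filename)
    m = re.search(pattern, stem)
    if m is None:
        return None
    return {k: v.strip() for k, v in m.groupdict().items() if v and v.strip()}


print(parse_filename(LOUWIN_4, "A N Author - Whatever Series - 8.5 - Just Any Title.txt"))
print(parse_filename(LOUWIN_4, "A-S-8-Title.txt"))          # no spaces needed
print(parse_filename(LOUWIN_4, "A - S - 8x5 - Title.txt"))  # "8x5" accepted
print(parse_filename(ESCAPED, "A - S - 8x5 - Title.txt"))   # None
```

Note also that the title group `[^_].+` needs at least two characters.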

So strictly speaking, NO regex (I think) will do what was asked in the first place.

Quote:
Originally Posted by Shadewing
The books are all in txt format and come with names in one of 3 formats:

Author - Title.txt
Author - Series - Title.txt
Author - Series - Series No. - Title.txt

You have to choose ONE format and stick to it. You can "Add books" in one format then change the regex and "Add" more books in the second format

Both regexes will transfer the Author EXACTLY as input - LN, FN or FN LN or ????

I am SLOWLY beginning to understand regexes.
04-04-2012, 02:51 AM   #10
speakingtohe
Wizard
Quote:
Originally Posted by louwin
@speakingtohe I confirmed your regex but the series no CANNOT have any non numeric in it (cannot be 1.5 for instance) and cannot be separated from the series by a "-". If you aren't interested in something . something then your regex is okay.


First: I don't understand the regex and don't pretend to. I never implied it was mine. It was just a suggestion for a starting point, and I was fine that you ignored it when I first mentioned it.

Second: your previous regex is selectable from the dropdown list, so switching is not that hard, but if you are manually renaming all of your books first, it shouldn't be an issue.

Third: Maybe 1.5 doesn't work, no idea. Maybe 01.5 will work. Probably not. It's kind of a special case in my experience, as most series books I read are numbered in a straightforward manner, and I adjust the others if I have to, but usually I don't as I am not big into romances currently.

Glad you are getting the hang of regexes, and I wish that I would expend some effort to do the same. But I am feeling a bit annoyed at being browbeaten and belittled at the moment.
04-04-2012, 03:43 AM   #11
louwin
Newbie Nerd
None of my statements were intended to "browbeat" nor "belittle".

All my statements were informational hence the JFYI and JFI

Yesterday I knew NOTHING about regexes; today I know a little.

My first "Add" had a 40% error rate and my second 0.2%.

So much to learn, so little time. Certainly no time for browbeating or belittling

I am also renaming about 2000 files of differing formats.

I wish I had known about series a couple of days ago, as I have done HOURS of work with the aim of editing the series information in manually.

I NOW know how to do it in the file name.
04-04-2012, 04:00 AM   #12
louwin
Newbie Nerd
Quote:
Originally Posted by Shadewing
The books are all in txt format and come with names in one of 3 formats:

Author - Title.txt
Author - Series - Title.txt
Author - Series - Series No. - Title.txt

Yeahhhhh!

This regex WILL do exactly what you want.

(?P<author>[^_-]+)\s*-\s*((?P<series>[^_0-9-]*)\s*-\s*)?((?P<series_index>[0-9]*(.?[0-9]*))?\s*-\s*)?(?P<title>[^_].+)

Author AND Title MUST be there!

Series and/or Series no. may or may not be there, or only one of the two may be.

The series no can have a decimal in it. I think 2.5 is a book later added between 2 and 3?

If only one of series or series no is present, I suspect Calibre knows which is which from the fact that the series no is numeric.
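Telling the parts apart comes down to the character classes in the expression: `series` is `[^_0-9-]*`, which cannot contain a digit, while `series_index` begins with `[0-9]*`, so a purely numeric middle part can only land in `series_index`. A Python sketch with a hypothetical `parse_filename` helper (extension dropped, fields trimmed, absent parts left out):

```python
import os
import re

# The optional-parts expression from the post above, unchanged.
LOUWIN_ALL = (r"(?P<author>[^_-]+)\s*-\s*((?P<series>[^_0-9-]*)\s*-\s*)?"
              r"((?P<series_index>[0-9]*(.?[0-9]*))?\s*-\s*)?(?P<title>[^_].+)")


def parse_filename(pattern, filename):
    """Hypothetical helper: drop the extension, search the rest of the
    name, and return the non-empty named groups with spaces trimmed
    (groups that did not take part in the match are None and skipped)."""
    stem, _ext = os.path.splitext(filename)
    m = re.search(pattern, stem)
    if m is None:
        return None
    return {k: v.strip() for k, v in m.groupdict().items() if v and v.strip()}


for name in ["Smith, John - Title.txt",
             "Smith, John - Series - Title.txt",
             "Smith, John - Series - 2.5 - Title.txt",
             "Smith, John - 3 - Title.txt"]:
    print(name, "->", parse_filename(LOUWIN_ALL, name))
```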

The author is transferred as entered, but Calibre has an option to swap first name and last name.
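For the three author layouts in the first post (`Last, First`, `Last, A B`, `Last, First B`), the swap itself is simple. The sketch below only illustrates the idea and is not Calibre's implementation of the option mentioned above; `swap_author` is hypothetical, and it assumes a single comma between last and first names (a suffix such as "IV" after a second comma would need more handling):

```python
def swap_author(name):
    """Hypothetical helper: turn "Last, First" into "First Last".

    Splits on the first comma only; names without a comma (or with
    nothing after it) are returned trimmed but otherwise unchanged.
    """
    last, sep, first = name.partition(",")
    if not sep or not first.strip():
        return name.strip()
    return f"{first.strip()} {last.strip()}"


for author in ["Smith, John", "Smith, A B", "Smith, John B", "John Smith"]:
    print(author, "->", swap_author(author))
```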
04-04-2012, 04:52 AM   #13
louwin
Newbie Nerd
Oops, please check before you commit. The regex seems to work sometimes, not all the time.
04-04-2012, 05:31 AM   #14
louwin
Newbie Nerd
Panic over!

Author - Title.txt works
Author - Series - Title.txt works
Author - Series - Series_Index - Title.txt works

Author - Series_Index - Title.txt also works but I don't know the effect of an index without an associated series on Calibre

Author - Series Series_Index - Title.txt DOES NOT WORK (no hyphen between Series and Series_Index). This puts the series and the series_index in with the title.
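The no-hyphen case can be compared against the expression from post #2, which expects no hyphen between series and number. A Python sketch with a hypothetical `parse_filename` helper (extension dropped, fields trimmed): each expression handles one layout and not the other, which fits the advice to stick to one naming format per batch of "Add books".

```python
import os
import re

# speakingtohe's expression (post #2) and louwin's optional-parts expression.
HELEN = (r"(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)"
         r"(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?")
LOUWIN_ALL = (r"(?P<author>[^_-]+)\s*-\s*((?P<series>[^_0-9-]*)\s*-\s*)?"
              r"((?P<series_index>[0-9]*(.?[0-9]*))?\s*-\s*)?(?P<title>[^_].+)")


def parse_filename(pattern, filename):
    """Hypothetical helper: drop the extension, search the rest of the
    name, and return the non-empty named groups with spaces trimmed."""
    stem, _ext = os.path.splitext(filename)
    m = re.search(pattern, stem)
    if m is None:
        return None
    return {k: v.strip() for k, v in m.groupdict().items() if v and v.strip()}


NAME = "Smith, John - Series 3 - Title.txt"
print(parse_filename(LOUWIN_ALL, NAME))  # series and number end up in the title
print(parse_filename(HELEN, NAME))       # series and number split out
```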

So this regex does ALL that Shadewing asked for.

Phewww!

Tags
adding ebooks, help please, regular expressions



Similar Threads
Thread Thread Starter Forum Replies Last Post
Regular Expression - Adding metadata from filename LMF Calibre 1 03-20-2012 06:46 PM
Help in adding books using "regular expression" DM399 Calibre 2 07-08-2011 06:38 AM
Adding Books, regular expression smarties86 Calibre 4 12-19-2010 08:18 AM
Regular Expression on adding books. Lokro Calibre 4 11-06-2010 11:05 AM
Regular Expression For Adding Books jhart711 Calibre 3 09-27-2010 06:51 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.