Old 08-04-2011, 07:00 AM   #1
tweebee
Enthusiast
Posts: 46
Karma: 182
Join Date: Aug 2011
Location: Boynton Beach, Florida
Device: Kindle Oasis 2, Kindle Paperwhite 3
Adding books - need help with regular expressions

Hello,

I'm new to calibre and I am looking to add my books into the library by parsing the filenames. They are in this format:

ISBN_Title_[Publisher].ext

I would like to parse the ISBN, Title, and Publisher into their appropriate fields while ignoring the underscores and brackets. I have read the manual and searched this forum and I still can't find any info on how to parse the underscores.

Thanks for your help!

Twee
Old 08-04-2011, 11:11 AM   #2
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by tweebee View Post
They are in this format:

ISBN_Title_[Publisher].ext

I would like to parse the ISBN, Title, and Publisher into their appropriate fields while ignoring the underscores and brackets.
Try this:
Code:
(?P<isbn>.*?)[_ ](?P<title>.*)[_ ]\[(?P<publisher>.*)\]
Underscores are treated specially and are converted to spaces, so they tend to disappear automatically. I made the regex a bit less dense so you can see what it's doing.
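To try the pattern outside calibre, here is a minimal Python sketch. It assumes the file extension is dropped and underscores are turned into spaces before the match, as described above. The `parse_filename` helper and the sample file name are made up for illustration and are not part of calibre.

```python
import os
import re

# Pattern from the post above: ISBN, title, then publisher in square brackets.
PATTERN = re.compile(
    r"(?P<isbn>.*?)[_ ](?P<title>.*)[_ ]\[(?P<publisher>.*)\]"
)

def parse_filename(filename):
    """Hypothetical helper: mimic the assumed pre-processing, then match."""
    stem = os.path.splitext(filename)[0]  # assumption: extension is ignored
    stem = stem.replace("_", " ")         # assumption: underscores become spaces
    match = PATTERN.search(stem)
    return match.groupdict() if match else None

# Made-up file name in the ISBN_Title_[Publisher].ext format.
result = parse_filename("0123456789_Some_Book_Title_[Example Press].epub")
print(result)
```

Each named group (`isbn`, `title`, `publisher`) ends up as one key in the returned dictionary, which mirrors how the groups map to metadata fields.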
Old 08-04-2011, 11:37 AM   #3
tweebee
That works beautifully, thank you so much! I had the brackets parsed right, just didn't know how to do the underscores.

Thank you again!
Old 08-04-2011, 11:40 AM   #4
Starson17
Quote:
Originally Posted by tweebee View Post
That works beautifully, thank you so much! I had the brackets parsed right, just didn't know how to do the underscores.

Thank you again!
You're welcome.
If you understand the regex, you'll note that you can get rid of these: "[_ ]" and replace them with a simple space. I just put that in to be a bit clearer. Also, note the non-greedy selector in the ISBN.
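A small Python sketch of both points, assuming the underscores have already become spaces. The sample name is made up. The first pattern uses plain spaces and the non-greedy `.*?` for the ISBN; the second makes the ISBN group greedy to show what goes wrong.

```python
import re

# Made-up name, with underscores already converted to spaces.
name = "0123456789 Some Book Title [Example Press]"

# Plain spaces instead of [_ ], non-greedy ISBN (stops at the first space).
lazy = re.search(r"(?P<isbn>.*?) (?P<title>.*) \[(?P<publisher>.*)\]", name)

# Same pattern with a greedy ISBN: it swallows most of the title.
greedy = re.search(r"(?P<isbn>.*) (?P<title>.*) \[(?P<publisher>.*)\]", name)

lazy_fields = lazy.groupdict()
greedy_fields = greedy.groupdict()
print(lazy_fields)
print(greedy_fields)
```

With the greedy version the ISBN group grabs as much as it can while still leaving something for the title, so only the last word of the title lands in the title field.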
Old 08-04-2011, 11:47 AM   #5
tweebee
Quote:
Originally Posted by Starson17 View Post
You're welcome.
If you understand the regex, you'll note that you can get rid of these: "[_ ]" and replace them with a simple space. I just put that in to be a bit clearer. Also, note the non-greedy selector in the ISBN.
Oh I didn't even think of simply replacing those with the spaces! I was so hung up on putting something there to parse those underscores!

Thanks for that reminder. I'll know better next time.
Old 08-05-2011, 08:59 AM   #6
Daigomi
Junior Member
Posts: 5
Karma: 10
Join Date: Aug 2011
Device: Kindle
I didn't want to create a new thread, so I hope no-one minds me asking my question here. I've got a very similar situation to tweebee except my format is:

ISBN.Publisher.Title.Date.ext

The problem is that every space is replaced by a full stop, so in practice it looks something like:

0154879871.Bigbook.Publishers.Being.Truly.Happy.2007.November.pdf

I understand that there is no way for the script to know when the publisher name ends and the book title begins, but I have my books sorted by publisher, so I could change the script for each publisher. Basically, I am looking for a way to sort the books as follows:

{isbn}.{Publisher}.{Still Publisher}.{Title}.{Ignore last two}

If someone could help me with a template for this, I'm sure I could figure out how to adapt it to other similar formats.

Last edited by Daigomi; 08-05-2011 at 09:06 AM.
Old 08-05-2011, 09:18 AM   #7
Starson17
Quote:
Originally Posted by Daigomi View Post
I am looking for a way to sort the books as follows:

{isbn}.{Publisher}.{Still Publisher}.{Title}.{Ignore last two}.pdf

If someone could help me with a template for this, I'm sure I could figure out how to adapt it to other similar formats.
Try this:
Code:
(?P<isbn>.*?)\.(?P<publisher>.*?\..*?)\.(?P<title>.*)\..*?\..*?
It parses this:
12345X.Publisher1.Publisher2.Title1.Title2.Title3.Ignore1.Ignore2.pdf

When requesting these, you really should give several examples of the actual file names, not just what you consider to be the pattern. It's important to see the spaces, hyphens, periods, brackets, etc. in the names. For example, it's not clear if your actual files have the name Title, or {Title} in them. I assumed they didn't have the curly brackets, but if they did, other options would have been available.
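Here is a rough Python sketch of how that pattern splits the sample name. It assumes the extension is removed before matching. That matters here, because with ".pdf" still attached the greedy title group would also take in the first of the two ignored parts. The `parse_dotted` helper is hypothetical.

```python
import os
import re

# Pattern from the post above: ISBN, two-word publisher, title,
# then two trailing dot-separated parts that are ignored.
PATTERN = re.compile(
    r"(?P<isbn>.*?)\.(?P<publisher>.*?\..*?)\.(?P<title>.*)\..*?\..*?"
)

def parse_dotted(filename):
    """Hypothetical helper: strip the extension, then apply the pattern."""
    stem = os.path.splitext(filename)[0]  # assumption: extension is ignored
    match = PATTERN.search(stem)
    return match.groupdict() if match else None

sample = "12345X.Publisher1.Publisher2.Title1.Title2.Title3.Ignore1.Ignore2.pdf"
fields = parse_dotted(sample)
print(fields)
```

For a publisher name with a different number of words, the publisher group needs one `.*?` per word, joined by `\.`.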
Old 08-05-2011, 11:13 AM   #8
theducks
Well trained by Cats
Posts: 29,800
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by Daigomi

0154879871.Bigbook.Publishers.Being.Truly.Happy.2007.November.pdf

I understand that there is no way for the script to know when the publisher name ends and the book title begins, but I have my books sorted by publisher so I could change the script for each publisher.
A slight side note:
An ISBN contains the publisher's NUMBER (usually that of the parent company).

Language digit-Publisher-Book_Number-check_digit
Unfortunately you (and many others) have stripped the hyphens.
In a 10-digit ISBN, the language digit, publisher number and book number together take up 9 digits; the check digit is the tenth.
Small presses get long publisher numbers and short book numbers.
A huge company gets a short publisher number and lots of book numbers.
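Because the hyphens are gone, the publisher part can't be split off just by looking at the string, but the check digit can still be verified. A minimal Python sketch of the standard ISBN-10 check follows: each character is weighted 10 down to 1, and the weighted sum must be divisible by 11, with "X" standing for 10 in the last position. The function name is made up; the two valid sample numbers are ISBNs that appear in file names in this thread.

```python
def is_valid_isbn10(isbn):
    """Return True if the string is a 10-character ISBN-10 with a correct check digit."""
    isbn = isbn.replace("-", "")  # tolerate hyphenated input
    if len(isbn) != 10:
        return False
    total = 0
    for position, char in enumerate(isbn):
        if char == "X" and position == 9:
            value = 10                    # "X" is only allowed as the check digit
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += (10 - position) * value  # weights run 10, 9, ..., 1
    return total % 11 == 0

print(is_valid_isbn10("0192840975"))
print(is_valid_isbn10("041528919X"))
```

A check like this can flag file names where the ISBN was mistyped or truncated before the metadata is imported.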

Last edited by theducks; 08-07-2011 at 10:51 AM. Reason: fixe munged up partial quote
Old 08-05-2011, 03:32 PM   #9
Daigomi
Quote:
Originally Posted by Starson17 View Post
Try this:
Code:
(?P<isbn>.*?)\.(?P<publisher>.*?\..*?)\.(?P<title>.*)\..*?\..*?
It parses this:
12345X.Publisher1.Publisher2.Title1.Title2.Title3.Ignore1.Ignore2.pdf

When requesting these, you really should give several examples of the actual file names, not just what you consider to be the pattern. It's important to see the spaces, hyphens, periods, brackets, etc. in the names. For example, it's not clear if your actual files have the name Title, or {Title} in them. I assumed they didn't have the curly brackets, but if they did, other options would have been available.
Sorry, I gave a general example but I'll give a few actual examples as well:

0192840975.Oxford.University.Press.USA.Global.Warming.A.Very.Short.Introduction.Jan.2005.pdf
041528919X.Routledge.Roman.Berytus.Beirut.in.Late.Antiquity.Apr.2004.pdf
069100899X.Princeton.University.Press.Racism.A.Short.History.May.2002.pdf

It seems to work almost perfectly, thanks. The one slight annoyance is that the title and publisher have full stops in them (title: Racism.A.Short.History), but I can probably fix that with search and replace.

Thank you very much!
Old 08-05-2011, 03:58 PM   #10
Starson17
Quote:
Originally Posted by Daigomi View Post
It seems to work almost perfectly, thanks. The one slight annoyance is that the title and publisher have full stops in them (title: Racism.A.Short.History)
The input regex can't "change" any part of the filename during the Add Book operation. It can only select a part of the filename character string and insert that substring into the desired title, author or publisher field. Thus, there's no way to change the periods/dots/stops into spaces during import. As you have planned, you have to do that with Search and Replace in a second operation.
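The two steps can be sketched in Python like this. It is only an illustration of select-then-replace, not how calibre does it internally. The pattern is a variant adapted for a three-word publisher, and the name is one of the Princeton examples with the extension already removed.

```python
import re

# Variant pattern, adapted for illustration to a three-word publisher.
PATTERN = re.compile(
    r"(?P<isbn>.*?)\.(?P<publisher>.*?\..*?\..*?)\.(?P<title>.*)\..*?\..*?"
)

stem = "069100899X.Princeton.University.Press.Racism.A.Short.History.May.2002"

# Step 1: the regex only selects substrings, so the dots stay in place.
fields = PATTERN.search(stem).groupdict()
print(fields)

# Step 2: a separate clean-up pass turns the dots into spaces.
cleaned = {key: value.replace(".", " ") for key, value in fields.items()}
print(cleaned)
```

The first step only decides where each field starts and ends; any change to the text inside a field has to happen afterwards.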
Old 08-05-2011, 08:58 PM   #11
Daigomi
Quote:
Originally Posted by Starson17 View Post
The input regex can't "change" any part of the filename during the Add Book operation. It can only select a part of the filename character string and insert that substring into the desired title, author or publisher field. Thus, there's no way to change the periods/dots/stops into spaces during import. As you have planned, you have to do that with Search and Replace in a second operation.
Ahh, I didn't know that. Thanks!
All times are GMT -4.

