#1
Enthusiast
Posts: 46
Karma: 182
Join Date: Aug 2011
Location: Boynton Beach, Florida
Device: Kindle Oasis 2, Kindle Paperwhite 3
Adding books - need help with regular expressions
Hello,
I'm new to calibre and I am looking to add my books into the library by parsing the filenames. They are in this format:
ISBN_Title_[Publisher].ext
I would like to parse the ISBN, Title, and Publisher into their appropriate fields while ignoring the underscores and brackets. I have read the manual and searched this forum and I still can't find any info on how to parse the underscores.
Thanks for your help!
Twee
#2
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Code:
(?P<isbn>.*?)[_ ](?P<title>.*)[_ ]\[(?P<publisher>.*)\]
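To try the pattern outside calibre, it can be run through Python's `re` module, which uses the same `(?P<name>...)` syntax. This is only a minimal sketch: the filename is made up for illustration, and the extension is stripped by hand before matching.

```python
import re

# Pattern from the post above: ISBN, title, then publisher in square brackets.
PATTERN = re.compile(r"(?P<isbn>.*?)[_ ](?P<title>.*)[_ ]\[(?P<publisher>.*)\]")

# Hypothetical filename in the ISBN_Title_[Publisher].ext format.
filename = "0123456789_Some_Book_Title_[Example Press].epub"
stem = filename.rpartition(".")[0]  # drop the extension before matching

match = PATTERN.search(stem)
fields = match.groupdict() if match else {}
print(fields)
```

In this plain-Python check the title keeps its underscores (`Some_Book_Title`), because the regex only selects text and never rewrites it.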
#3
Enthusiast
Posts: 46
Karma: 182
Join Date: Aug 2011
Location: Boynton Beach, Florida
Device: Kindle Oasis 2, Kindle Paperwhite 3
That works beautifully, thank you so much! I had the brackets parsed right, just didn't know how to do the underscores.
Thank you again!
#4
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
If you understand the regex, you'll note that you can get rid of each "[_ ]" and replace it with a simple space. I just put that in to be a bit clearer. Also, note the non-greedy selector (.*?) in the ISBN group.
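The effect of the non-greedy selector can be seen by swapping it for a greedy `.*` in the same position. A small Python sketch, using a made-up filename stem; the greedy variant exists only for contrast and is not a suggested pattern.

```python
import re

stem = "0123456789_Some_Book_Title_[Example Press]"  # hypothetical example

# Original pattern: the ISBN group stops at the first separator.
non_greedy = re.compile(r"(?P<isbn>.*?)[_ ](?P<title>.*)[_ ]\[(?P<publisher>.*)\]")
# Contrast: a greedy ISBN group swallows as much as it can.
greedy = re.compile(r"(?P<isbn>.*)[_ ](?P<title>.*)[_ ]\[(?P<publisher>.*)\]")

isbn_non_greedy = non_greedy.search(stem).group("isbn")
isbn_greedy = greedy.search(stem).group("isbn")
title_greedy = greedy.search(stem).group("title")

print(isbn_non_greedy)  # ISBN only
print(isbn_greedy)      # ISBN plus most of the title
print(title_greedy)     # only the last word of the title is left
```

With the greedy version, most of the title ends up in the ISBN field, which is why the `?` matters here.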
#5
Enthusiast
Posts: 46
Karma: 182
Join Date: Aug 2011
Location: Boynton Beach, Florida
Device: Kindle Oasis 2, Kindle Paperwhite 3
Thanks for that reminder. I'll know better next time.
#6
Junior Member
Posts: 5
Karma: 10
Join Date: Aug 2011
Device: Kindle
I didn't want to create a new thread, so I hope no one minds me asking my question here. I've got a very similar situation to tweebee, except my format is:
ISBN.Publisher.Title.Date.ext
The problem is that every space is replaced by a full stop, so in practice it looks something like:
0154879871.Bigbook.Publishers.Being.Truly.Happy.2007.November.pdf
I understand that there is no way for the script to know where the publisher name ends and the book title begins, but I have my books sorted by publisher, so I could change the script for each publisher. Basically, I am looking for a way to parse the filenames as follows:
{isbn}.{Publisher}.{Still Publisher}.{Title}.{Ignore last two}
If someone could help me with a template for this, I'm sure I could figure out how to adapt it to other similar formats.
Last edited by Daigomi; 08-05-2011 at 09:06 AM.
#7
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Code:
(?P<isbn>.*?)\.(?P<publisher>.*?\..*?)\.(?P<title>.*)\..*?\..*?
12345X.Publisher1.Publisher2.Title1.Title2.Title3.Ignore1.Ignore2.pdf
When requesting these, you really should give several examples of the actual file names, not just what you consider to be the pattern. It's important to see the spaces, hyphens, periods, brackets, etc. in the names. For example, it's not clear if your actual files have the name Title, or {Title} in them. I assumed they didn't have the curly brackets, but if they did, other options would have been available.
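As a quick check, the pattern can be run against the sample name with Python's `re` module. This is a sketch under one assumption: the extension is removed before matching (done by hand here), so that the two trailing `.*?` pieces line up with the two ignored words.

```python
import re

# Pattern from the post above, for dot-separated names with a two-word publisher.
PATTERN = re.compile(
    r"(?P<isbn>.*?)\.(?P<publisher>.*?\..*?)\.(?P<title>.*)\..*?\..*?"
)

filename = "12345X.Publisher1.Publisher2.Title1.Title2.Title3.Ignore1.Ignore2.pdf"
stem = filename.rpartition(".")[0]  # assumption: extension stripped first

fields = PATTERN.search(stem).groupdict()
print(fields)
```

The greedy title group grabs everything it can and then backs off just far enough to leave two dot-separated words at the end.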
#8
Well trained by Cats
Posts: 30,903
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
An ISBN contains the publisher NUMBER (usually for the parent company):
Language digit-Publisher-Book_Number-check_digit
Unfortunately you (and many others) have stripped the hyphens. The publisher and book numbers share a fixed number of digits within the 10-digit ISBN: small presses get long publisher numbers and short book numbers, while a huge company gets a short publisher number and lots of book numbers.
Last edited by theducks; 08-07-2011 at 10:51 AM. Reason: fixed munged up partial quote
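The check digit at the end can at least confirm that a hyphen-free ISBN-10 was parsed intact. A minimal sketch of the standard ISBN-10 check (weights 10 down to 1, sum divisible by 11, with X standing for 10); `isbn10_is_valid` is a hypothetical helper name, and the sample values are ISBNs taken from filenames posted in this thread.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Return True if a hyphen-free ISBN-10 has a correct check digit."""
    if len(isbn) != 10:
        return False
    total = 0
    for position, char in enumerate(isbn):
        if char in "xX" and position == 9:
            value = 10  # 'X' stands for 10, only allowed as the check digit
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += (10 - position) * value  # weights run 10, 9, ..., 1
    return total % 11 == 0


results = {
    isbn: isbn10_is_valid(isbn)
    for isbn in ("0192840975", "041528919X", "069100899X")
}
print(results)
```

This does not recover where the publisher number ends; without the hyphens that needs the registration agencies' range tables.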
#9
Junior Member
Posts: 5
Karma: 10
Join Date: Aug 2011
Device: Kindle
0192840975.Oxford.University.Press.USA.Global.Warming.A.Very.Short.Introduction.Jan.2005.pdf
041528919X.Routledge.Roman.Berytus.Beirut.in.Late.Antiquity.Apr.2004.pdf
069100899X.Princeton.University.Press.Racism.A.Short.History.May.2002.pdf
It seems to work almost perfectly, thanks. The one slight annoyance is that the title and publisher have full stops in them (title: Racism.A.Short.History), but I can probably fix that with search and replace. Thank you very much!
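The publisher names in these three files run from one to four words, so the two-word pattern has to be adjusted per publisher, as planned earlier in the thread. A sketch of how such variants might be generated in Python; `pattern_for` is a hypothetical helper (not part of calibre), and the extension is assumed to be removed already.

```python
import re


def pattern_for(publisher_words: int) -> re.Pattern:
    """Build the dot-separated pattern for a publisher name of N words."""
    # One non-greedy piece per publisher word, joined by literal dots.
    publisher = r"\.".join([".*?"] * publisher_words)
    return re.compile(
        r"(?P<isbn>.*?)\.(?P<publisher>" + publisher + r")\.(?P<title>.*)\..*?\..*?"
    )


# Filename stems from the post above, with the publisher word count for each.
examples = {
    "0192840975.Oxford.University.Press.USA.Global.Warming.A.Very.Short.Introduction.Jan.2005": 4,
    "041528919X.Routledge.Roman.Berytus.Beirut.in.Late.Antiquity.Apr.2004": 1,
    "069100899X.Princeton.University.Press.Racism.A.Short.History.May.2002": 3,
}

parsed = [pattern_for(n).search(stem).groupdict() for stem, n in examples.items()]
for fields in parsed:
    print(fields)
```

With the word count set to 2, `pattern_for` produces the same pattern as the one posted earlier.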
#10
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
The input regex can't "change" any part of the filename during the Add Book operation. It can only select a part of the filename character string and insert that substring into the desired title, author or publisher field. Thus, there's no way to change the periods/dots/stops into spaces during import. As you have planned, you have to do that with Search and Replace in a second operation.
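The two-step idea (select first, replace afterwards) can be illustrated in Python. This is only a sketch of the replacement logic, not of calibre's Search and Replace dialog: the search expression is a literal dot, the replacement is a single space, and the sample values are the fields as the import regex would have selected them.

```python
import re

# Values as the import regex would select them (dots still in place).
title = "Racism.A.Short.History"
publisher = "Princeton.University.Press"

# Second step: search for a literal dot and replace it with a space.
clean_title = re.sub(r"\.", " ", title)
clean_publisher = re.sub(r"\.", " ", publisher)

print(clean_title)
print(clean_publisher)
```

A blanket replacement like this also hits dots that belong in the name, such as abbreviations, so the results may need a look afterwards.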
#11
Junior Member
Posts: 5
Karma: 10
Join Date: Aug 2011
Device: Kindle