#1
Junior Member
Posts: 3
Karma: 10
Join Date: Jul 2011
Device: none
Need a regex for importing books
I imported a bunch of books into Calibre the normal way. Calibre read the metadata from most of the files correctly (they're PDF files), but in many cases it pretty much fubar'd the metadata. My idea is to clear all the fubar'd files out of Calibre and re-import them using a regex.
Unfortunately I'm not a regex guy, and I found no useful examples to help me out. I only got a little way toward a working regex. The file names are formatted like this:
isbn.publisher.title.date.pdf
Just to make things interesting, every word is followed by a period. The publisher is always the same three words, the title is a variable number of words, and the date is a three-letter month followed by a four-digit year. Examples:
012345678X.This.Is.Publisher.This.is.a.Title.Apr.2007.pdf
876543210x.This.Is.Publisher.A.Different.Title.That.is.Longer.Jan.1997.pdf
This is the best regex I could get, and it only gets the ISBN correct:
(?P<isbn>[0-9]+[A-Za-z])\.(?P<publisher>[A-Za-z]+\.[A-Za-z]+\.[A-Za-z]+)
When run on the second example:
isbn = 876543210x
publisher = This.Is.Publisher
and for some reason
title = 876543210x.This.Is.Publisher.A.Different.Title.That.is.Longer.Jan.1997
I have no idea how to remove the periods from the publisher, how to match variable-length titles, or how to get the dates. Anybody got good grep out there?
#2
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Try this:
Code:
(?P<isbn>\d+\w)\.(?P<publisher>\w+\.\w+\.\w+)\.(?P<title>.*)\.(?P<published>\w\w\w\.\d+)
Last edited by Starson17; 07-26-2011 at 04:29 PM.
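For anyone who wants to check the pattern outside Calibre, here is a minimal sketch that runs it against the second example file name from the first post using Python's `re` module. It assumes the `.pdf` extension has already been stripped and that the stray spaces in "2 007" and "Tha t" are forum line-wrapping artifacts. The helper name `parse_filename` is made up for this illustration.

```python
import re

# The pattern from the post above, unchanged (split across two string literals).
PATTERN = re.compile(
    r"(?P<isbn>\d+\w)\.(?P<publisher>\w+\.\w+\.\w+)\."
    r"(?P<title>.*)\.(?P<published>\w\w\w\.\d+)"
)

def parse_filename(stem):
    """Return the named groups as a dict, or None if the stem does not match."""
    match = PATTERN.match(stem)
    return match.groupdict() if match else None

fields = parse_filename(
    "876543210x.This.Is.Publisher.A.Different.Title.That.is.Longer.Jan.1997"
)
for name, value in fields.items():
    print(f"{name} = {value}")
```

The variable-length title works because the greedy `.*` first swallows the rest of the name and then backs off until the trailing month-and-year part can still match.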
#3
Junior Member
Posts: 3
Karma: 10
Join Date: Jul 2011
Device: none
Thanks Starson17. Works as advertised. As you said, it doesn't remove the periods from the title or publisher. Is that just not possible on file import? Is it possible with a bulk metadata search and replace?
My ultimate idea is to get all the info I can out of the file name and then let Calibre do an internet lookup and hope it finds matches. Again, thanks for the regex. I'm going to try to figure it out. I'm pretty impressed you could handle the variable title lengths. I'm not too surprised about the periods, though, as that's less of a grep and more of an edit. Hopefully the bulk metadata search and replace can do it.
#4
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
You're welcome.
Quote:
Quote:
Quote:
#5
Junior Member
Posts: 3
Karma: 10
Join Date: Jul 2011
Device: none
Starson17, just wanted to let you know that everything went very well. I'm truly amazed. Your regex worked flawlessly, and correcting the period problem in the Publisher and Title fields with the bulk metadata search and replace was super easy. Then I just did a bulk internet metadata search and everything went perfectly.
Just want to thank you one last time. I really appreciate your effort.
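The period cleanup mentioned here was done with Calibre's bulk search and replace, and the thread does not show the exact settings used. As a rough sketch of the same substitution in Python, assuming it amounts to replacing every literal period with a space (the function name `dots_to_spaces` is hypothetical):

```python
import re

def dots_to_spaces(value):
    """Replace each literal period with a single space and trim the ends."""
    return re.sub(r"\.", " ", value).strip()

print(dots_to_spaces("This.Is.Publisher"))
print(dots_to_spaces("A.Different.Title.That.is.Longer"))
```

A blanket replacement like this would also remove periods that genuinely belong in a title, such as those in abbreviations, so it only suits file names where the period is purely a word separator.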
#6
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
#7
Connoisseur
Posts: 54
Karma: 10
Join Date: Jun 2009
Device: Nook, Kindle 3
Hmmm - I'm still missing a few steps here. I have a similar issue: my titles contain the ISBN, followed by a period, then the publisher (followed by a period; btw, in my case the publisher can have from one to four words), the title (again any combination of words, each followed by a period) and the date (followed by a period).
All I'd like to do is extract the ISBN from the title and copy it to the identifier/isbn field, so that I can then do an automatic bulk download of metadata and covers. Can somebody please explain how to do that? Here's what I tried: Edit Metadata|Search and replace; search mode = regex; search field = title; search for = ?? [something like your regex expression, I presume]; replace with = [how do I get JUST the actual ISBN here?]; destination field = identifier|isbn...
Thanks a bunch in advance! -Stephan
#8
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
Provide examples of the titles/filenames/whatever you want to capture from.
Makes life a whole lot easier.
Last edited by Serpentine; 10-26-2011 at 02:38 AM. Reason: oops
#9
Zealot
Posts: 127
Karma: 744
Join Date: Oct 2011
Device: Sony PRS-T1
Hi,
I also have some problems with a regex for importing books. My books look like this:
"author - title.epub"
"author - series xx - title.epub"
That works fine UNLESS the author looks like "Brain, Master-Mind". The "-" within the author splits "Master-Mind" and turns "Mind..." into the series. BTW: I always use " - " as the separator. Please help me. I tried it on my own, but... no success...
#10
Well trained by Cats
Posts: 30,914
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Or do you use <space>-<space>? There is a difference (and it avoids hyphenated words).
#11
Zealot
Posts: 127
Karma: 744
Join Date: Oct 2011
Device: Sony PRS-T1
I'm using <space>-<space>.
Last edited by salines; 10-27-2011 at 07:50 AM. |
#12
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
It would help to have your base regex - it may have something special you use elsewhere.
If not, try this:
Code:
(?P<author>.+?) - (?:(?P<series>.+?) (?:(?P<series_index>\d+(?:\.\d+)?) - )?)?(?P<title>.+)
or with space as whitespace:
(?P<author>.+?)\s-\s(?:(?P<series>.+?)\s(?:(?P<series_index>\d+(?:\.\d+)?)\s-\s)?)?(?P<title>.+)
Last edited by Serpentine; 10-27-2011 at 06:26 PM. Reason: improved series index
#13
Zealot
Posts: 127
Karma: 744
Join Date: Oct 2011
Device: Sony PRS-T1
Quote:
Works fine. Thank you!
But for "Fielding, Joy-ebbes - Tanz, Püppchen, tanz.pdf" it doesn't work. The author is OK ("Fielding, Joy-ebbes"), but the series comes out as "Tanz," and the title as "Püppchen, tanz".
Another question: should I switch the regex depending on whether I'm adding books with a series or without one? How can I quickly switch the regex used for adding books?
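The misparse reported here follows from the pattern in post #12: the series index is optional inside the series group, so any title containing a space can be split into a one-word "series" and the remainder. One possible adjustment, offered as a sketch that has not been tried in Calibre and is not an answer given in the thread, is to accept a series only when it is followed by a numeric index and a second " - ". The sketch assumes the file extension is already stripped; `ORIGINAL`, `STRICT` and `parse` are names invented for the illustration.

```python
import re

# Pattern from post #12: the series index is optional, so "Tanz," becomes a series.
ORIGINAL = re.compile(
    r"(?P<author>.+?) - (?:(?P<series>.+?) "
    r"(?:(?P<series_index>\d+(?:\.\d+)?) - )?)?(?P<title>.+)"
)

# Variant: a series only counts when "<index> - " follows it.
STRICT = re.compile(
    r"(?P<author>.+?) - (?:(?P<series>.+?) "
    r"(?P<series_index>\d+(?:\.\d+)?) - )?(?P<title>.+)"
)

def parse(pattern, stem):
    """Return the named groups as a dict, or None if nothing matches."""
    match = pattern.match(stem)
    return match.groupdict() if match else None

name = "Fielding, Joy-ebbes - Tanz, Püppchen, tanz"
print(parse(ORIGINAL, name))  # reproduces the reported split
print(parse(STRICT, name))    # whole remainder stays in the title
print(parse(STRICT, "Brain, Master-Mind - Some Series 03 - The Title"))
```

Because the series part is optional as a whole, the stricter variant covers both "author - title" and "author - series xx - title" names, which may remove the need to switch patterns. A title that itself contains a number followed by " - " would still be read as a series.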
#14
Connoisseur
Posts: 54
Karma: 10
Join Date: Jun 2009
Device: Nook, Kindle 3
Here are some examples:
01505798756X.Silly Press.The.Strange.Professional.Title.Jul.1985
043165591X.Wharton.School.Publishing.The.Delight.of.Very.Silly.Titles.Hidden.Sep.2006
Thanks!
#15
Well trained by Cats
Posts: 30,914
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
No reasonable way to determine which words belong together with (spaces) and which are new fields.
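That ambiguity only affects splitting the publisher from the title. The request in post #7 is narrower: copy the leading ISBN into the identifier field. A minimal Python sketch of that extraction, assuming the ISBN is always the run of digits plus one trailing character before the first period (the function name `isbn_from_title` is hypothetical):

```python
import re

# Leading digits plus one trailing word character (e.g. X), up to the first period.
ISBN_PREFIX = re.compile(r"^(?P<isbn>\d+\w)\.")

def isbn_from_title(title):
    """Return the leading ISBN-like token, or None if the title has none."""
    match = ISBN_PREFIX.match(title)
    return match.group("isbn") if match else None

for title in (
    "01505798756X.Silly Press.The.Strange.Professional.Title.Jul.1985",
    "043165591X.Wharton.School.Publishing.The.Delight.of.Very.Silly.Titles.Hidden.Sep.2006",
):
    print(isbn_from_title(title))
```

In a regex search and replace, the same idea would be a search for `^(\d+\w)\..*` with `\1` as the replacement, assuming the tool follows Python's `re.sub` conventions. That has not been checked against Calibre's dialog here.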