Old 07-26-2011, 03:17 PM   #1
flinkdeldinky
Need a regex for importing books

I imported a bunch of books into Calibre the normal way. Calibre got the metadata from most of the book files okay (they're PDF files), but in many cases it pretty much fubar'd it. My idea is to clear all the fubar'd files out of Calibre and re-import them using a regex.

Unfortunately I'm not a regex guy, and I found no useful examples to help me out. I only got a little way toward figuring out a regex.

The file names are formatted like this:

isbn.publisher.title.date.pdf

Just to make things interesting, every word is followed by a period. The publisher is always the same three words, the title has a variable number of words, and the date is a three-letter month followed by a four-digit year.

Examples:
012345678X.This.Is.Publisher.This.is.a.Title.Apr.2007.pdf
876543210x.This.Is.Publisher.A.Different.Title.That.is.Longer.Jan.1997.pdf

This is the best regex I could come up with, and it only gets the isbn right:

(?P<isbn>[0-9]+[A-Za-z])\.(?P<publisher>[A-Za-z]+\.[A-Za-z]+\.[A-Za-z]+)

When run on the second example, it gives:

isbn = 876543210x
publisher = This.Is.Publisher
and, for some reason,
title = 876543210x.This.Is.Publisher.A.Different.Title.That.is.Longer.Jan.1997

I have no idea how to remove the periods from the publisher, how to match variable-length titles, or how to get the dates.
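
For reference, here is a rough sketch of one way those three gaps might be covered, written as standalone Python rather than as a tested Calibre import setting. It assumes the isbn is always nine digits plus a final digit or X, that the publisher is always exactly three words, and that the date is always the last two dot-separated pieces before .pdf. The group name `date` and the function `parse_filename` are made up for illustration, and whether Calibre accepts a group by that name is not something checked here. A regex only captures text, so the periods are swapped for spaces in a separate step afterwards.

```python
import re

# Sketch only: `date` is a hypothetical group name, not a confirmed Calibre field.
PATTERN = re.compile(
    r"(?P<isbn>[0-9]{9}[0-9Xx])\."                        # assumed 10-character ISBN
    r"(?P<publisher>[A-Za-z]+\.[A-Za-z]+\.[A-Za-z]+)\."   # always three words
    r"(?P<title>.+)\."                                    # greedy: everything up to the date
    r"(?P<date>[A-Z][a-z]{2}\.[0-9]{4})"                  # e.g. Jan.1997
    r"\.pdf$"
)


def parse_filename(name):
    """Return the captured fields with periods turned into spaces, or None."""
    match = PATTERN.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    # The capture still contains the periods; replace them after matching.
    for key in ("publisher", "title", "date"):
        fields[key] = fields[key].replace(".", " ")
    return fields


if __name__ == "__main__":
    example = "876543210x.This.Is.Publisher.A.Different.Title.That.is.Longer.Jan.1997.pdf"
    print(parse_filename(example))
```

The title group is deliberately greedy: because the date and .pdf are pinned to the end of the name, the title takes everything left over between the publisher and the date, however many words that is.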

Anybody got good grep out there?