Old 01-29-2010, 07:20 PM   #1
Dysonco
Junior Member
Dysonco began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Jan 2010
Device: Sony PRS505
Help with the regular expression

Hi All,

New member of the forum, so hello everyone! I thought I'd kick off with a conundrum that I'm having trouble solving.

I'm a great fan of Calibre; it's a great bit of software. I'm just having a little trouble configuring it to correctly identify the information from the filenames in my book collection.

All my books have their filenames in this format:

AuthorLastname, AuthorFirstnames - BookSeries SeriesNumber - BookTitle.FileExtension

I've tweaked the regular expression (mostly by trial and error, as I'm most definitely not a programmer) to this:

(?P<author>[^_]+) - (?P<series>[^_]+) - (?P<title>[^_]+)

Now this works fine on the filename example above, but unfortunately it fails when the BookSeries and SeriesNumber parts are missing (when it's a single book and not part of a series).

So for example:

Pratchett, Terry - Discworld 01 - The Colour Of Magic.pdf

Would work okay, but:

Pratchett, Terry - Strata.pdf

Wouldn't. I almost need a way to make the expression realise that if there are only two groups to recognise, they are Author and Title, and that it should ignore the series bit.
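
To make the failure concrete, here is a minimal Python sketch of the pattern above (Calibre's filename patterns use Python regular-expression syntax). The `parse` helper is made up for illustration, and passing the names without their extension is an assumption, not a statement about how Calibre calls the pattern.

```python
import re

# The pattern from this post: three groups separated by " - ".
PATTERN = re.compile(r"(?P<author>[^_]+) - (?P<series>[^_]+) - (?P<title>[^_]+)")

def parse(name):
    """Return the named groups as a dict, or None when the pattern does not match."""
    m = PATTERN.match(name)
    return m.groupdict() if m else None

three_part = parse("Pratchett, Terry - Discworld 01 - The Colour Of Magic")
two_part = parse("Pratchett, Terry - Strata")

print(three_part)  # series comes back as "Discworld 01", number included
print(two_part)    # None: the pattern insists on two " - " separators
```

The two-part name fails because nothing in the pattern is optional: both " - " separators must be present for any match at all.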

Any suggestions from the gurus?

Many thanks,

Mike
Old 01-30-2010, 09:34 AM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Dysonco View Post
AuthorLastname, AuthorFirstnames - BookSeries SeriesNumber - BookTitle.FileExtension

I've tweaked the regular expression (mostly by trial and error, as I'm most definitely not a programmer) to this:

(?P<author>[^_]+) - (?P<series>[^_]+) - (?P<title>[^_]+)

Now this works fine on the filename example above, but unfortunately it fails when the BookSeries and SeriesNumber parts are missing (when it's a single book and not part of a series).

So for example:

Pratchett, Terry - Discworld 01 - The Colour Of Magic.pdf

Would work okay, but:

Pratchett, Terry - Strata.pdf

Wouldn't.
This is what I'm using now:
Code:
^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\s*-\s*)?(?P<title>.+)
It will parse your two examples and many others. Yours doesn't get the series_index. There are several regex threads here if you want more options.
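
For anyone who wants to check it outside Calibre, here is a rough Python sketch that runs the pattern on the two example names. The `parse` helper and the stripping of whitespace are my own additions for illustration, and the names are passed without the extension as an assumption; otherwise ".pdf" would end up in the title group.

```python
import re

# The pattern from this post, copied verbatim and split over lines for readability.
PATTERN = re.compile(
    r"^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))"
    r"(\s*-\s*)?"
    r"((?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\s*-\s*)?"
    r"(?P<title>.+)"
)

FIELDS = ("author", "series", "series_index", "title")

def parse(name):
    """Return the four named groups as a dict, or None when nothing matches."""
    m = PATTERN.match(name)
    if m is None:
        return None
    # The author group can capture a trailing space, so strip non-empty values.
    return {k: (m.group(k).strip() if m.group(k) else m.group(k)) for k in FIELDS}

with_series = parse("Pratchett, Terry - Discworld 01 - The Colour Of Magic")
standalone = parse("Pratchett, Terry - Strata")

print(with_series)  # series "Discworld", series_index "01"
print(standalone)   # series and series_index are None
```

The whole series part sits inside one optional group, which is why the two-part name still matches.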
Old 01-30-2010, 07:46 PM   #3
Dysonco
Junior Member
Dysonco began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Jan 2010
Device: Sony PRS505
Hi Starson17,

Wow that works great!

Many thanks,

Mike
Old 02-22-2010, 10:39 AM   #4
Ozzy
Junior Member
Ozzy began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Feb 2010
Device: PC
@Starson17: That's a great expression, exactly what I was looking for. Thanks.

Quote:
Originally Posted by Starson17 View Post
This is what I'm using now:
Code:
^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\s*-\s*)?(?P<title>.+)
It will parse your two examples and many others. Yours doesn't get the series_index. There are several regex threads here if you want more options.
Old 02-23-2010, 05:39 AM   #5
kapaka
Member
kapaka began at the beginning.
 
Posts: 18
Karma: 10
Join Date: Dec 2008
Device: BQ
Quote:
Originally Posted by Starson17 View Post
This is what I'm using now:
Code:
^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\s*-\s*)?(?P<title>.+)
It will parse your two examples and many others. Yours doesn't get the series_index. There are several regex threads here if you want more options.
Thank You So Much!!!
I have been trying to figure out a regular expression for just this, and was about to quit and edit manually.
Old 03-20-2010, 01:57 AM   #6
qlfwyyd
Junior Member
qlfwyyd began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Apr 2009
Device: iPhone
This is awesome! Thanks so much.
Old 03-22-2010, 07:28 AM   #7
qlfwyyd
Junior Member
qlfwyyd began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Apr 2009
Device: iPhone
Actually - any chance this could be adapted so that it would still work if the series index is not included? At present, if there is no index, the series name is added to the title.
Old 03-22-2010, 07:41 AM   #8
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by qlfwyyd View Post
Actually - any chance this could be adapted so that it would still work if the series index is not included? At present, if there is no index, the series name is added to the title.
Yes, but that breaks other things. The numbers of the series_index are used to help identify the series name.
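
A small Python sketch of what happens with a name that has a series but no number, using the same pattern as earlier in the thread. The helper name `parse` is just for illustration, and the extension is left off as an assumption.

```python
import re

# Same pattern as posted earlier in this thread.
PATTERN = re.compile(
    r"^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))"
    r"(\s*-\s*)?"
    r"((?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\s*-\s*)?"
    r"(?P<title>.+)"
)

def parse(name):
    """Return author, series, series_index and title, stripped where present."""
    m = PATTERN.match(name)
    fields = ("author", "series", "series_index", "title")
    return {k: (m.group(k).strip() if m.group(k) else m.group(k)) for k in fields}

no_index = parse("Pratchett, Terry - Discworld - The Colour Of Magic")
print(no_index["series"])  # None
print(no_index["title"])   # Discworld - The Colour Of Magic
```

The optional series group only matches when digits follow the series name, so without them the group is skipped and everything after the author lands in the title.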
Old 03-22-2010, 07:57 PM   #9
qlfwyyd
Junior Member
qlfwyyd began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Apr 2009
Device: iPhone
Ah well, it's pretty damn good anyway. I'll just have to add in the series numbers by hand first. Thanks again!
Old 03-22-2010, 10:45 PM   #10
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by qlfwyyd View Post
Ah well, it's pretty damn good anyway. I'll just have to add in the series numbers by hand first. Thanks again!
Or you can change the regex when you have books to add that don't have the series number but have some other way to identify the series. If the series name is always in brackets, or is always after the first space-hyphen-space, then you can find it that way. I haven't found any single regex that will handle all the different file naming schemes I've seen.
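
As a rough illustration of the space-hyphen-space case, here is a Python sketch of one possible swap-in pattern. The pattern `NO_INDEX` and the helper `parse` are hypothetical examples, not something tested inside Calibre, and the names are passed without the extension as an assumption.

```python
import re

# Hypothetical pattern: the series is whatever sits between the first and
# second " - ". No series number is expected or extracted.
NO_INDEX = re.compile(r"^(?P<author>.+?) - (?:(?P<series>.+?) - )?(?P<title>.+)$")

def parse(name):
    """Return the named groups as a dict, or None when the pattern does not match."""
    m = NO_INDEX.match(name)
    return m.groupdict() if m else None

in_series = parse("Pratchett, Terry - Discworld - The Colour Of Magic")
standalone = parse("Pratchett, Terry - Strata")
numbered = parse("Pratchett, Terry - Discworld 01 - The Colour Of Magic")

print(in_series["series"])   # Discworld
print(standalone["series"])  # None
print(numbered["series"])    # Discworld 01 (the number stays in the series name)
```

The trade-off shows in the last line: this pattern never fills series_index, and any title that itself contains " - " would be split into a series and a title. That is why it suits a batch of index-less files rather than a whole mixed collection.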
All times are GMT -4. The time now is 02:03 PM.