Old 11-28-2010, 11:25 AM   #1
Dragonator
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2010
Device: none
A little help adding books and using regex.

Hello.

I love Calibre and use it to organize my ebook library. It's awesome.

I do have one issue though: the regular expression that deals with the name of the file being added. I'm sure this is a common issue with a simple solution that would make me feel like an idiot for not figuring it out. The thing is I have never gotten along too well with regular expressions so I need a little help. For the record I have looked up regular expressions as used by calibre and tried to figure them out, then tried to look for a solution. I really hope some helpful soul will lend me a hand.

The files I add are usually named in one of two ways, depending on whether they are part of a series:

author_name - series_name series_index - book_title.extension

or

author_name - book_title.extension

The regular expression I use is this:

(?P<author>[^_]+) - (?P<series>.+) (?P<series_index>.+) - (?P<title>.+)

which works well when adding series but not when adding individual books. For those, I edit the preferences and delete the middle part, resulting in this:

(?P<author>[^_]+) - (?P<title>.+)

I kept trying to make the deleted part optional so it would work in both cases automatically but I just can't get my head around it. Is it possible to come up with a regular expression that deals with both cases favorably?

Thank you.
Old 11-28-2010, 11:33 AM   #2
kiwidude
calibre/Sigil Developer
Posts: 4,230
Karma: 1345754
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
There's a bunch of threads on this around the forums.

The one I use is this:
Code:
^(?P<author>((?!\s-\s).)+)\s-\s(?:(?:\[\s*)?(?P<series>.+)\s(?P<series_index>[\d\.]+)(?:\s*\])?\s-\s)?(?P<title>[^(]+)(?:\(.*\))?
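One reading of why this expression copes with both cases, where simply wrapping the middle of the original expression in an optional group does not: a greedy author group can swallow the series. The `naive` pattern below is a hypothetical attempt, not something posted in this thread, and the sketch assumes calibre's file-name regexes behave like Python's `re` module.

```python
import re

# Hypothetical attempt: the original expression with the series part made optional.
naive = re.compile(
    r"(?P<author>[^_]+) - (?:(?P<series>.+) (?P<series_index>.+) - )?(?P<title>.+)"
)
# The expression from the post above, unchanged.
posted = re.compile(
    r"^(?P<author>((?!\s-\s).)+)\s-\s(?:(?:\[\s*)?(?P<series>.+)\s(?P<series_index>[\d\.]+)(?:\s*\])?\s-\s)?(?P<title>[^(]+)(?:\(.*\))?"
)

name = "Bloggs, Joe - Some Series 1 - My title"

# Greedy [^_]+ runs to the last " - ", so the series ends up inside the author.
naive_author = naive.match(name).group("author")
# ((?!\s-\s).)+ stops at the first " - ", leaving the series for its own group.
posted_author = posted.match(name).group("author")

print(naive_author)   # Bloggs, Joe - Some Series 1
print(posted_author)  # Bloggs, Joe
```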
Old 11-28-2010, 11:37 AM   #3
Dragonator
Thank you. I was certain there would be some threads on this but I just couldn't seem to be able to find them. Perhaps I should have tried harder.

Anyway, thank you very much. It works perfectly.
Old 12-15-2010, 09:43 AM   #4
lxrcab
Junior Member
Posts: 2
Karma: 10
Join Date: Dec 2010
Device: Palm Treo Pro (WM 6.1) and LG Optimus Smartphone (Android)
Quote:
Originally Posted by kiwidude
There's a bunch of threads on this around the forums.

The one I use is this:
Code:
^(?P<author>((?!\s-\s).)+)\s-\s(?:(?:\[\s*)?(?P<series>.+)\s(?P<series_index>[\d\.]+)(?:\s*\])?\s-\s)?(?P<title>[^(]+)(?:\(.*\))?
@kiwidude
Would you please give me a plaintext example of a complete "author series seriesindex title" that your regex digests with whatever punctuation is used?

I can't read the regex and figure out what's code and punctuation from the expression and what is data expected to be in the filename. I'm a new user that needs to edit a bunch of filenames and want to make sure they are correct.

Thanks!
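For readers in the same position, here is a sketch of the quoted expression spread over several lines with a comment on each piece. It assumes Python's `re` module and its `re.VERBOSE` flag; the comments are an interpretation of the pattern, not kiwidude's own notes. Everything written as `\s-\s` stands for the literal " - " expected in the filename.

```python
import re

# The expression exactly as quoted above.
posted = r"^(?P<author>((?!\s-\s).)+)\s-\s(?:(?:\[\s*)?(?P<series>.+)\s(?P<series_index>[\d\.]+)(?:\s*\])?\s-\s)?(?P<title>[^(]+)(?:\(.*\))?"

# The same expression, split up and commented (re.VERBOSE ignores the layout).
annotated = re.compile(r"""
    ^(?P<author>((?!\s-\s).)+)      # author: everything before the first " - "
    \s-\s                           # separator: space, hyphen, space
    (?:                             # optional series block
        (?:\[\s*)?                  #   optional opening "["
        (?P<series>.+)              #   series name
        \s                          #   one whitespace character
        (?P<series_index>[\d\.]+)   #   index: digits and dots, e.g. 1 or 1.5
        (?:\s*\])?                  #   optional closing "]"
        \s-\s                       #   separator before the title
    )?
    (?P<title>[^(]+)                # title: everything up to an opening "("
    (?:\(.*\))?                     # optional trailing "(...)", not captured
""", re.VERBOSE)

sample = "Bloggs, Joe - Some Series 1.5 - My title"
compact_fields = re.match(posted, sample).groupdict()
annotated_fields = annotated.match(sample).groupdict()
print(annotated_fields)
```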
Old 12-15-2010, 10:22 AM   #5
Manichean
Wizard
Posts: 3,130
Karma: 80520
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by lxrcab
I can't read the regex and figure out what's code and punctuation from the expression and what is data expected to be in the filename. I'm a new user that needs to edit a bunch of filenames and want to make sure they are correct.
The easiest way would be adapting the regex to the filename, not vice versa.
Old 12-15-2010, 10:37 AM   #6
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Manichean
The easiest way would be adapting the regex to the filename, not vice versa.
The regex posted by kiwidude is similar (identical?) to one that's been kicking around for ages. It's lengthy and complex because it has lots of lookahead and options to parse a wide variety of filenames. Basically, it will handle filenames that have metadata in the order: author, series, series_index, title where they are separated in any one of a wide variety of ways or where some parts are missing. Occasionally someone will get energetic and encrust the regex with some more options, which makes it flexible but dense to read.
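As a rough illustration of those options, the sketch below feeds kiwidude's expression two made-up filenames: one with the series in square brackets and a trailing parenthetical, one with no series at all. It assumes Python's `re` module; the filenames and the `parse` helper are hypothetical, and the helper trims the space that the title group picks up before a "(".

```python
import re

PATTERN = re.compile(
    r"^(?P<author>((?!\s-\s).)+)\s-\s(?:(?:\[\s*)?(?P<series>.+)\s(?P<series_index>[\d\.]+)(?:\s*\])?\s-\s)?(?P<title>[^(]+)(?:\(.*\))?"
)

def parse(name):
    """Return the named fields with surrounding whitespace trimmed, or None if no match."""
    match = PATTERN.match(name)
    if match is None:
        return None
    return {key: value.strip() if value else value
            for key, value in match.groupdict().items()}

# Hypothetical filenames, extension already removed.
bracketed = parse("Bloggs, Joe - [Some Series 2] - My title (retail)")
no_series = parse("Bloggs, Joe - My title (v2)")

print(bracketed)
print(no_series)
```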

Most people keep a couple of fancy regex expressions around for different order filenames, such as, title first, title last, etc.

I'd suggest testing your filenames against some of the available regex expressions. If none work, then I agree with Manichean that modifying the regex is the next step, but I wouldn't start from the complex encrusted regexes you find, as they can be intimidating if you haven't gotten a handle on the basics.

If he asks for help, he should post some of his sample filenames; I'm sure help will quickly be offered.
Old 12-17-2010, 06:27 PM   #7
lxrcab
Thanks all, I ran a few experiments and the regex works nicely.
Old 12-17-2010, 07:57 PM   #8
kiwidude
Quote:
Originally Posted by lxrcab
@kiwidude
Would you please give me a plaintext example of a complete "author series seriesindex title" that your regex digests with whatever punctuation is used?
Others have made good suggestions here, and it sounds like you have been sorted, but I will partially answer your original question.

I posted that regex because it would work for the OP's named request. Specifically it is designed to handle:
Author - Series # - Title
Author - Title

In my case it would handle filenames like these:
Bloggs, Joe - My title
Bloggs, Joe - Some Series 1 - My title
Bloggs, Joe - Some Series 1.5 - My title
Bloggs, Joe - Some Series 1.5 - My title with sub-title hyphen

One further comment. I actually find it faster to change the filenames into a standard format than to switch regexes, particularly if you have a bunch of books with randomly formatted names. I slice and dice the filenames to match my regex and then do a bulk import. I found it way too fiddly to keep changing the regex in Calibre, particularly as it keeps no history, so you have to store them all externally and keep pasting them in before an add operation.
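One way to support that workflow is to check, before a bulk import, which files the regex would not match, so that only those need renaming. The sketch below is an assumption about how one might do this, not a description of kiwidude's process: the filenames are made up, and it assumes the regex is applied to the name without its extension.

```python
import re
from pathlib import PurePath

PATTERN = re.compile(
    r"^(?P<author>((?!\s-\s).)+)\s-\s(?:(?:\[\s*)?(?P<series>.+)\s(?P<series_index>[\d\.]+)(?:\s*\])?\s-\s)?(?P<title>[^(]+)(?:\(.*\))?"
)

# Hypothetical filenames waiting to be imported.
names = [
    "Bloggs, Joe - Some Series 1.5 - My title.epub",
    "My title by Joe Bloggs.epub",
    "Bloggs, Joe - My title.mobi",
]

# PurePath(...).stem drops only the final extension, so the "1.5" index survives.
needs_rename = [n for n in names if PATTERN.match(PurePath(n).stem) is None]

print(needs_rename)  # ['My title by Joe Bloggs.epub']
```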