MobileRead Forums > E-Book Software > Calibre
04-05-2010, 02:40 PM   #1
Spiffy
Groupie
Spiffy has a complete set of Star Wars action figures.
Posts: 160
Karma: 416
Join Date: Apr 2010
Device: Astak EZ Reader Pro AND Sony PRS-505
Calibre book adding: Regular expression request...

Probably the five millionth such request since this forum was created, I suppose. :-)

Many of my files are kept in the following naming format:

L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz.lit

Please note that the square brackets are virtually always present in the files I've stored, and the <space><single dash><space> separator between the author and the series/series number, and between that and the book title, is also pretty consistent.

Assuming the vast majority of my books follow this format, does anyone have a good expression to add them with?

Ideally, the expression would recognize the square brackets as a tip-off that a book series and book number are being disclosed. Is such a thing even possible? I ask because if a book ISN'T part of a series, the existing file name is probably something more like this:

H. G. Wells - The Time Machine.epub

OPTIONALLY, it's at least possible (although I bet this is even harder to resolve) that some files look like this:

Jules Verne - Journey to the Center of the Earth (html).zip

Pie in the sky: if those ROUND brackets could be a tip-off to ignore something as NOT part of the book title, that would be ideal. Yeah, even ignorant as I am of how to build these expressions properly, I'm skeptical that's possible.

Does anyone out there have files following approximately these "rules", and what have you done to best ensure proper Calibre "importing"? ANY subset of the requirements above (handling series names in square brackets, ignoring stuff in round brackets, etc.) would be better than nothing, but I don't expect much.

Please note that I have zero scripting ability, so I'm really just asking what the best canned solution is. If it's "you're out of luck", I guess I'll figure something else out. If proper expressions to handle this already exist, great. If there are third-party tools outside of Calibre that can accurately mass-rename files FIRST in an acceptable way, I'd be willing to try those as well (although using a second tool first seems redundant if Calibre can be made to do it).

Thanks infinitely in advance for any possible suggestions!

Last edited by Spiffy; 04-05-2010 at 02:45 PM.
04-06-2010, 04:07 AM   #2
Dopedangel
Wizard
Dopedangel ought to be getting tired of karma fortunes by now.
Posts: 1,758
Karma: 30063305
Join Date: Dec 2006
Location: Singapore
Device: Boyue
I had the same problem, so I started using BookSorter to rename my files to

Author - Series # - Title.lit

http://iterati.org/ebookTools/BookSorter/Default.aspx

then used this regex when adding books:

(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?

I did see the regex you are looking for somewhere on the forum, but I couldn't find it.
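For anyone wanting to check patterns like this outside Calibre, here is a small Python sketch (Calibre itself is written in Python, and these patterns use Python's (?P<name>...) named-group syntax). The parse_filename helper is hypothetical, not Calibre's code; it only assumes, as the test results later in this thread suggest, that the extension is dropped before the pattern is applied. It runs Dopedangel's pattern, unchanged, on the plain and the bracketed naming styles.

```python
import os
import re

# Dopedangel's pattern, copied verbatim (split in two only for readability).
DOPEDANGEL = (
    r"(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)"
    r"\s*-\s*(?P<title>[^_].+) ?"
)


def parse_filename(pattern, filename):
    """Hypothetical helper, not Calibre's code.

    Drops the extension, searches the rest of the name, and returns the named
    groups with surrounding spaces stripped (None if nothing matches).
    """
    stem, _ext = os.path.splitext(filename)
    match = re.search(pattern, stem)
    if match is None:
        return None
    return {key: (value or "").strip() for key, value in match.groupdict().items()}


plain = parse_filename(
    DOPEDANGEL, "L. Frank Baum - Wizard of Oz 02 - The Marvelous Land of Oz.lit"
)
bracketed = parse_filename(
    DOPEDANGEL, "L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz.lit"
)
print(plain)
print(bracketed)
```

With the plain Author - Series # - Title form all four fields come out cleanly. With the bracketed form the pattern still matches, but the series comes back empty and "[Wizard of Oz 02] - " stays at the front of the title, so the brackets need handling of their own.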
04-06-2010, 09:39 AM   #3
Spiffy
Groupie
Spiffy has a complete set of Star Wars action figures.
Posts: 160
Karma: 416
Join Date: Apr 2010
Device: Astak EZ Reader Pro AND Sony PRS-505
Quote:
Originally Posted by Dopedangel
I had the same problem so I started using booksorter
to rename my files to

Author - Series # - Title.lit

http://iterati.org/ebookTools/BookSorter/Default.aspx

then used this for the add to

(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?

I did see somewhere on the forum the regex you are looking for but couldn't find it
Your standard naming includes no brackets, right?

Another way to do this DID occur to me this morning: doing it in discrete steps.

First, import the books that aren't part of a series on their own, using a fairly standard regex.

Then go back, CHANGE the regex to expect a series, and import THOSE books.

But I guess I would still have to deal with the square brackets: I'd need a way either to mass-remove them or to mass-ignore them during import.

As for the second problem, the format occasionally appearing at the end in round brackets, I suppose I just have to live with it (and erase it manually after the fact).
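The two-pass idea can be sketched like this. Both patterns and the split_batches helper are hypothetical (my own, not from the thread); the only assumption about the files is the " - [" marker described in the first post, which is used to decide which batch a file goes in before each batch is imported with its own pattern.

```python
import os
import re

# Hypothetical patterns for the two passes; neither comes from the thread.
NO_SERIES = re.compile(r"^(?P<author>.+?) - (?P<title>.+)$")
WITH_SERIES = re.compile(
    r"^(?P<author>.+?) - \[(?P<series>.+?) (?P<series_index>[0-9.]+)\] - (?P<title>.+)$"
)


def split_batches(filenames):
    """Send names containing ' - [' to the series batch, the rest to the plain batch."""
    series_batch, plain_batch = [], []
    for name in filenames:
        (series_batch if " - [" in name else plain_batch).append(name)
    return series_batch, plain_batch


def parse(pattern, filename):
    """Apply a compiled pattern to the name minus its extension; None if no match."""
    match = pattern.match(os.path.splitext(filename)[0])
    return match.groupdict() if match else None


files = [
    "L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz.lit",
    "H. G. Wells - The Time Machine.epub",
]
series_batch, plain_batch = split_batches(files)
for name in plain_batch:
    print("pass 1:", parse(NO_SERIES, name))
for name in series_batch:
    print("pass 2:", parse(WITH_SERIES, name))
```

Note that the first-pass pattern only separates author from title, so a trailing "(html)" would stay in the title.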
04-06-2010, 01:22 PM   #4
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Spiffy
Your standard naming includes no brackets, right?
I started to respond to this twice, but I'm not at home and can't test anything I post. It's pretty easy to make the brackets optional if everything else is right.
This is an optional opening bracket:

\[?

and this is an optional closing bracket:

\]?
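To see those two fragments in isolation, the hypothetical SERIES pattern below (mine, not Starson17's) wraps a series name and number in an optional opening and an optional closing bracket, so it accepts the series block with or without square brackets.

```python
import re

# Hypothetical fragment: an optional "[", the series name (no brackets or
# digits), a space, the series number, then an optional "]".
SERIES = re.compile(r"^\[?(?P<series>[^\[\]0-9]+?) (?P<series_index>[0-9.]+)\]?$")

for text in ["[Wizard of Oz 02]", "Wizard of Oz 02"]:
    match = SERIES.match(text)
    print(repr(text), "->", match.group("series"), match.group("series_index"))
```

Because each bracket is optional on its own, an unbalanced "[Wizard of Oz 02" is accepted too; for sorting file names that leniency is usually harmless.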

Try this (tested; the untested code I originally posted here was wrong):
Code:
^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?(\[?(?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\]?\s*-\s*)?(?P<title>.+)
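A quick way to sanity-check the tested pattern is to run it through Python's re module on the two naming styles from the first post. The parse_filename helper is hypothetical: it strips the extension with os.path.splitext and the spaces around each field, which is an assumption about what Calibre does rather than its actual code. Splitting the pattern across adjacent raw strings is only for readability; Python concatenates them into the same one-line pattern.

```python
import os
import re

# Starson17's tested pattern, copied verbatim; the adjacent raw strings
# concatenate back into the exact one-line original.
STARSON17 = (
    r"^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?"
    r"(\[?(?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\]?\s*-\s*)?"
    r"(?P<title>.+)"
)


def parse_filename(pattern, filename):
    """Hypothetical helper: drop the extension, match, strip spaces from each field.

    Optional groups that did not take part in the match (e.g. no series) stay None.
    """
    stem, _ext = os.path.splitext(filename)
    match = re.search(pattern, stem)
    if match is None:
        return None
    return {k: (v.strip() if v is not None else None) for k, v in match.groupdict().items()}


for name in [
    "L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz.lit",
    "H. G. Wells - The Time Machine.epub",
]:
    print(parse_filename(STARSON17, name))
```

The whole series block is wrapped in (...)? and needs a number before the next " - ", so for a name with no series it simply drops out and the series fields come back as None, which is how the title still ends up in the right place.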

Last edited by Starson17; 04-06-2010 at 06:24 PM.
04-06-2010, 06:24 PM   #5
Spiffy
Groupie
Spiffy has a complete set of Star Wars action figures.
Posts: 160
Karma: 416
Join Date: Apr 2010
Device: Astak EZ Reader Pro AND Sony PRS-505
Hmm. Good to know that.

The expression doesn't seem to work, unfortunately. But I appreciate the try.

When I use that string and run the test tool inside Calibre against this file name:

L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz.lit

The test results show the following:

Title: L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz
Authors: (nothing)
Series: (nothing)
Series index: (nothing)
04-06-2010, 06:25 PM   #6
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Spiffy
Hmm. Good to know that.

The expression doesn't seem to work, unfortunately.
Try again with the edited code above. I miscounted parentheses.
04-06-2010, 06:58 PM   #7
Spiffy
Groupie
Spiffy has a complete set of Star Wars action figures.
Posts: 160
Karma: 416
Join Date: Apr 2010
Device: Astak EZ Reader Pro AND Sony PRS-505
Genius work. Thank you, that's quite nifty. It even recognizes that if there are no brackets (or is it counting dashes?), there's no series, and that the title will therefore be in a different position.

I hate to push, but do you know a way to address the other main issue I had? The occasional optional file type sandwiched between ROUND brackets? Like so:

Jules Verne - Journey to the Center of the Earth (html).zip

Ideally, the file type would be dropped from the Title, round brackets and everything between them included. Inevitably, legitimate titles containing round brackets could be affected, I guess, but that's a small price to pay.

I won't be greedy though. You've already saved me a ton of potential headache.
04-06-2010, 07:42 PM   #8
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Spiffy
I hate to push
(staggering a bit) ..... try this:
Code:
^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?(\[?(?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\]?\s*-\s*)?(?P<title>[a-zA-Z1-9 ]+)(\(.*\))?$
It will have trouble with titles that have anything other than alphanumerics in the title.
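Running this version in Python on two round-bracket names shows both its strength and one quirk worth knowing. The title class is written [a-zA-Z1-9 ], which leaves out the digit 0, so a title such as the hypothetical file name "Around the World in 80 Days" fails to match at all. The ROUND_BRACKETS_WITH_ZERO variant, which simply changes 1-9 to 0-9, is my own suggestion and was not tested in the thread; parse_filename is a hypothetical helper that strips the extension before matching.

```python
import os
import re

# Starson17's round-bracket version, copied verbatim (split only for readability).
ROUND_BRACKETS = (
    r"^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?"
    r"(\[?(?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\]?\s*-\s*)?"
    r"(?P<title>[a-zA-Z1-9 ]+)(\(.*\))?$"
)
# Assumed fix (not from the thread): let the title class include the digit 0.
ROUND_BRACKETS_WITH_ZERO = ROUND_BRACKETS.replace("[a-zA-Z1-9 ]", "[a-zA-Z0-9 ]")


def parse_filename(pattern, filename):
    """Hypothetical helper: drop the extension, match, strip spaces (None if no match)."""
    stem, _ext = os.path.splitext(filename)
    match = re.search(pattern, stem)
    if match is None:
        return None
    return {k: (v.strip() if v is not None else None) for k, v in match.groupdict().items()}


for name in [
    "Jules Verne - Journey to the Center of the Earth (html).zip",
    "Jules Verne - Around the World in 80 Days (html).zip",  # hypothetical file name
]:
    print(name)
    print("  original:", parse_filename(ROUND_BRACKETS, name))
    print("  with 0:  ", parse_filename(ROUND_BRACKETS_WITH_ZERO, name))
```

A None result means no match at all; in Calibre's test tool that case showed up earlier in this thread as the whole name landing in Title.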
04-06-2010, 08:40 PM   #9
Spiffy
Groupie
Spiffy has a complete set of Star Wars action figures.
Posts: 160
Karma: 416
Join Date: Apr 2010
Device: Astak EZ Reader Pro AND Sony PRS-505
Quote:
Originally Posted by Starson17
(staggering a bit) ..... try this:
Code:
^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?(\[?(?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\]?\s*-\s*)?(?P<title>[a-zA-Z1-9 ]+)(\(.*\))?$
It will have trouble with titles that have anything other than alphanumerics in the title.
Heh. Yeah, it was a fairly big ask, I know. Staggering indeed!

No dice, though. It puts everything into the Title field again.
04-06-2010, 09:15 PM   #10
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Spiffy
It tosses everything into title field again.
It works for me. Try again, or show me what you're testing it on. It correctly parsed all of these:

L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz.lit
L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz(lit).lit
L. Frank Baum - Wizard of Oz 02 - The Marvelous Land of Oz(lit).lit
L. Frank Baum - Wizard of Oz 02 - The Marvelous Land of Oz.lit

Last edited by Starson17; 04-06-2010 at 09:17 PM.
04-06-2010, 10:19 PM   #11
Spiffy
Groupie
Spiffy has a complete set of Star Wars action figures.
Posts: 160
Karma: 416
Join Date: Apr 2010
Device: Astak EZ Reader Pro AND Sony PRS-505
Quote:
Originally Posted by Starson17
It works for me. Try again, or show me what you're testing it on. It correctly parsed all of these:

L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz.lit
L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz(lit).lit
L. Frank Baum - Wizard of Oz 02 - The Marvelous Land of Oz(lit).lit
L. Frank Baum - Wizard of Oz 02 - The Marvelous Land of Oz.lit
You are correct! Maybe I chopped off a character when I did the copy and paste last time.
04-09-2010, 06:54 PM   #12
tinear
Junior Member
tinear began at the beginning.
Posts: 3
Karma: 10
Join Date: Apr 2010
Device: none
Actually, there are a few cases where it doesn't manage to strip version numbers and formats after the title (although it is beautifully written).
Replacing (\(.*\))?$ with .+ seems to drop everything after the title.
04-11-2010, 07:00 PM   #13
Spiffy
Groupie
Spiffy has a complete set of Star Wars action figures.
Posts: 160
Karma: 416
Join Date: Apr 2010
Device: Astak EZ Reader Pro AND Sony PRS-505
Quote:
Originally Posted by Starson17
(staggering a bit) ..... try this:
Code:
^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?(\[?(?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\]?\s*-\s*)?(?P<title>[a-zA-Z1-9 ]+)(\(.*\))?$
It will have trouble with titles that have anything other than alphanumerics in the title.
Actually, I think I figured out one of the issues with this.

The regex works perfectly with all of these:

Quote:
L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz.lit
L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz(lit).lit
L. Frank Baum - Wizard of Oz 02 - The Marvelous Land of Oz(lit).lit
L. Frank Baum - Wizard of Oz 02 - The Marvelous Land of Oz.lit
But if THIS somehow makes its way into the parsing, disaster results:

Quote:
L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz[lit].lit
I think I had some unexpected files that were messed up by square brackets, rather than round ones, around the file type near the end. I probably just have to track those down "by hand".
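A possible alternative to tracking those files down by hand is to let the optional trailing group accept either kind of bracket. The EITHER_BRACKETS variant below is my own suggestion, checked only with Python's re module and not in Calibre; it swaps (\(.*\))?$ for (\(.*\)|\[.*\])?$ in Starson17's second pattern and otherwise leaves it alone. parse_filename is a hypothetical helper that strips the extension before matching and returns None when nothing matches.

```python
import os
import re

# Starson17's round-bracket version, copied verbatim (split only for readability).
ROUND_BRACKETS = (
    r"^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?"
    r"(\[?(?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\]?\s*-\s*)?"
    r"(?P<title>[a-zA-Z1-9 ]+)(\(.*\))?$"
)
# Assumed variant (mine): the trailing group accepts "(...)" or "[...]".
EITHER_BRACKETS = ROUND_BRACKETS.replace(r"(\(.*\))?$", r"(\(.*\)|\[.*\])?$")


def parse_filename(pattern, filename):
    """Hypothetical helper: drop the extension, match, strip spaces (None if no match)."""
    stem, _ext = os.path.splitext(filename)
    match = re.search(pattern, stem)
    if match is None:
        return None
    return {k: (v.strip() if v is not None else None) for k, v in match.groupdict().items()}


for name in [
    "L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz(lit).lit",
    "L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz[lit].lit",
]:
    print(name)
    print("  original:", parse_filename(ROUND_BRACKETS, name))
    print("  either:  ", parse_filename(EITHER_BRACKETS, name))
```

The unmodified pattern returns None for the [lit] name, which fits the symptom described above: no match, so the whole name ends up in Title.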
04-11-2010, 07:05 PM   #14
Spiffy
Groupie
Spiffy has a complete set of Star Wars action figures.
Posts: 160
Karma: 416
Join Date: Apr 2010
Device: Astak EZ Reader Pro AND Sony PRS-505
Quote:
Originally Posted by tinear
Actually, there are a few instances that it doesn't work to kill version numbers and formats after the title (although it is beautifully written).
Replacing (\(.*\))?$ with .+ seems to drop everything after the title.
In other words?
Quote:
^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?(\[?(?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\]?\s*-\s*)?(?P<title>[a-zA-Z1-9 ]+).+
Yes, I THINK that works. What I've tested so far (including "L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz[lit].lit") has parsed without problems.
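One thing worth checking with the .+ tail, if I have traced it correctly: .+ must match at least one character, so when nothing follows the title, the title group gives back its last letter to satisfy it. The sketch below compares that pattern with a variant ending in .* (my suggestion, not from the thread), using a hypothetical parse_filename helper that strips the extension before matching.

```python
import os
import re

# The combined pattern from the post above, copied verbatim (split for readability).
TRAILING_PLUS = (
    r"^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?"
    r"(\[?(?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\]?\s*-\s*)?"
    r"(?P<title>[a-zA-Z1-9 ]+).+"
)
# Suggested tweak (mine): replace the final ".+" with ".*" so the tail may be empty.
TRAILING_STAR = TRAILING_PLUS[:-2] + ".*"


def parse_filename(pattern, filename):
    """Hypothetical helper: drop the extension, match, strip spaces (None if no match)."""
    stem, _ext = os.path.splitext(filename)
    match = re.search(pattern, stem)
    if match is None:
        return None
    return {k: (v.strip() if v is not None else None) for k, v in match.groupdict().items()}


for name in [
    "L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz.lit",
    "L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz[lit].lit",
]:
    print(name)
    print("  .+ :", parse_filename(TRAILING_PLUS, name)["title"])
    print("  .* :", parse_filename(TRAILING_STAR, name)["title"])
```

With .* the tail may be empty, so plain names keep their full title while anything after it, such as [lit] or (html), is still discarded. The title character class is unchanged, so titles with punctuation or the digit 0 can still be cut short.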
04-13-2010, 05:24 PM   #15
MSWallack
Right, Except When Wrong
MSWallack ought to be getting tired of karma fortunes by now.
Posts: 351
Karma: 3968525
Join Date: Aug 2007
Location: Indianapolis
Device: Kindle Oasis 3 (sometimes iPad Mini).
This is so close to what I'm trying to accomplish that I thought it was worth posting my query, too. In my case, book titles are formatted like this:

Brown, Dan - The Lost Symbol [Robert Langdon #3].epub

AuthorLast, AuthorFirst - Title [Series #SeriesNum].format

It looks like the code that was provided is very close, but I'm not quite sure where the "delimiters" (not sure of the right term) are between the Author, Series, and Title sections of the RE.
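In these patterns the "delimiters" are just the literal text between the named groups: the dash with its surrounding spaces (written \s*-\s* or \s+-\s+) between fields, and \[ ... \] around the series. For the Last, First - Title [Series #N] layout, a minimal sketch might look like the one below. It is my own, tested only with Python's re module on the example name, and it assumes the author part contains no hyphen (a name like Jean-Paul would break it) and that the bracketed series block is optional. The author is captured exactly as written, e.g. "Brown, Dan".

```python
import os
import re

# Sketch for "AuthorLast, AuthorFirst - Title [Series #SeriesNum].format".
# Assumptions (mine): no hyphen in the author part, " - " after it,
# and an optional "[Series #N]" block at the very end.
LAST_FIRST = re.compile(
    r"^(?P<author>[^-]+?)\s+-\s+"  # "Brown, Dan" then the " - " delimiter
    r"(?P<title>.+?)"  # shortest title that still lets the rest match
    r"(?:\s*\[(?P<series>.+?)\s*#(?P<series_index>[0-9.]+)\])?$"  # optional "[Robert Langdon #3]"
)


def parse_filename(filename):
    """Hypothetical helper: strip the extension, then match LAST_FIRST (None if no match)."""
    match = LAST_FIRST.match(os.path.splitext(filename)[0])
    return match.groupdict() if match else None


print(parse_filename("Brown, Dan - The Lost Symbol [Robert Langdon #3].epub"))
print(parse_filename("Brown, Dan - The Lost Symbol.epub"))
```

The lazy title (.+?) stops as soon as the optional series block plus end-of-name can match, which is what keeps "[Robert Langdon #3]" out of the title.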

Thanks for any help you can provide.
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Regular Expression Help | Azhad | Calibre | 86 | 09-27-2011 02:37 PM
Custom Regular Expressions for adding book information | bigbot3 | Calibre | 1 | 12-25-2010 06:28 PM
Regular Expression Help | smartmart | Calibre | 5 | 10-17-2010 05:19 AM
Regular Expression For Adding Books | jhart711 | Calibre | 3 | 09-27-2010 06:51 AM
Help with the regular expression | Dysonco | Calibre | 9 | 03-22-2010 10:45 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.