Old 04-14-2010, 10:18 AM   #16
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by MSWallack
Brown, Dan - The Lost Symbol [Robert Langdon #3].epub

AuthorLast, AuthorFirst - Title [Series #SeriesNum].format
Try this:
Code:
^(?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b)(\s*-\s*)?(?P<title>([^\-_\[\(]+)) (\[(?P<series>[^0-9\-]+) (- )?\#?(?P<series_index>[0-9.]+)\])
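If you want to see what each named group captures without going through Calibre, here is a minimal sketch using Python's re module (the (?P<name>...) group syntax is Python's). It assumes the extension has already been removed from the filename before matching, and the variable names are only for illustration.

```python
import re

# The expression from the post above, unchanged (split over lines for reading).
PATTERN = re.compile(
    r'^(?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b)'
    r'(\s*-\s*)?(?P<title>([^\-_\[\(]+)) '
    r'(\[(?P<series>[^0-9\-]+) (- )?\#?(?P<series_index>[0-9.]+)\])'
)

# Assumption: the ".epub" extension is stripped before matching.
filename = 'Brown, Dan - The Lost Symbol [Robert Langdon #3]'

match = PATTERN.search(filename)
# Strip stray spaces around each captured field.
fields = {name: value.strip() for name, value in match.groupdict().items()}
print(fields)
```

The author group picks up a trailing space before the hyphen, which is why the sketch strips each field.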
Old 04-14-2010, 11:01 AM   #17
MSWallack
Right, Except When Wrong
Posts: 353
Karma: 3968525
Join Date: Aug 2007
Location: Indianapolis
Device: Kindle Oasis 3 (sometimes iPad Mini).
Brilliant! Perfect! Bravo!

Seriously, thank you very much. This worked perfectly. Now if I can just try to figure out what you've done so that I can try to edit my own expressions... I looked at the page that Calibre links to but it wasn't terribly comprehensible. Do you know of an easier, more straightforward guide to regular expressions?
Old 04-14-2010, 11:46 AM   #18
Starson17
Quote:
Originally Posted by MSWallack
Brilliant! Perfect! Bravo!

Seriously, thank you very much. This worked perfectly. Now if I can just try to figure out what you've done so that I can try to edit my own expressions... I looked at the page that Calibre links to but it wasn't terribly comprehensible. Do you know of an easier, more straightforward guide to regular expressions?
That one is a bit intimidating. It's got a lot of options that you don't need if your books are named the way you said. For example, I made the # optional in front of your series number.

There are too many regex guides for me to point to any single one. Google is your friend, but basically, here's what I did:

This part says to stop matching the title when you find an open square bracket, open parenthesis, hyphen, or underscore.

(?P<title>([^\-_\[\(]+))

Then there's a space and a required open square bracket before this:
(?P<series>[^0-9\-]+)
which is the series. The series matches until it sees a number or hyphen.

A lot of the backslashes are "escaping" characters like parentheses or hyphens or brackets. It takes some practice to read one you didn't write yourself. Sometimes it's easier to start with your own.

This part is an optional hyphen and space, so you can optionally have space-hyphen-space in front of your series_index:
(- )?

This part is the optional pound sign (looking closely, I see that I escaped it, but that was unnecessary, I was in a hurry).
\#?
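The pieces above can also be tried one at a time. This is only a sketch using Python's re module; the sample strings come from the filename earlier in the thread, and the variable names are made up for the example.

```python
import re

# Title: one or more characters that are not hyphen, underscore, "[" or "(".
title_part = re.match(r'[^\-_\[\(]+', 'The Lost Symbol [Robert Langdon #3]')
print(repr(title_part.group()))   # stops just before the "["

# Series: one or more characters that are not a digit or hyphen.
series_part = re.match(r'[^0-9\-]+', 'Robert Langdon #3')
print(repr(series_part.group()))  # stops just before the "3"

# Optional "- " and optional "#" in front of the series number.
index_part = re.compile(r'(- )?#?(?P<series_index>[0-9.]+)')
indexes = [index_part.fullmatch(text).group('series_index')
           for text in ['3', '#3', '- 3', '- #3']]
print(indexes)
```

On its own, the series class also grabs the trailing space and the "#". In the full expression, the literal space and the optional "#" that follow it make the regex engine give those characters back.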
Old 04-16-2010, 05:16 PM   #19
MSWallack
Just to make this more complicated than it already is... Some of my books aren't part of a series. Thus, for those books, there isn't a bracket with a series and number at the end of the name. When Calibre adds those books, the whole file name is added as the book name, the author is set as Unknown and nothing goes into the series name. What do I need to add (and where) to the RE to say that if the file extension is encountered before an open bracket, then process as if there isn't a series?
Old 04-16-2010, 05:45 PM   #20
Starson17
Quote:
Originally Posted by MSWallack
Just to make this more complicated than it already is... Some of my books aren't part of a series. Thus, for those books, there isn't a bracket with a series and number at the end of the name. When Calibre adds those books, the whole file name is added as the book name, the author is set as Unknown and nothing goes into the series name. What do I need to add (and where) to the RE to say that if the file extension is encountered before an open bracket, then process as if there isn't a series?
Post an example filename. The simplest fix is to put parentheses around the optional part and stick a "?" after the closing parenthesis (no quotes).
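As a rough illustration of that idea, here is a minimal sketch in Python's re module. The pattern is deliberately simplified and hypothetical, not the full expression from this thread: it only shows a bracketed part wrapped in parentheses and followed by "?".

```python
import re

# Hypothetical, simplified pattern: a title, then an optional "[series]" part.
# The outer parentheses group the whole bracketed part; the "?" makes it optional.
pattern = re.compile(r'^(?P<title>[^\[]+)(\[(?P<series>[^\]]+)\])?$')

with_series = pattern.match('The Lost Symbol [Robert Langdon #3]')
without_series = pattern.match('Under the Dome')

print(with_series.group('title').strip(), '|', with_series.group('series'))
print(without_series.group('title').strip(), '|', without_series.group('series'))
```

When the optional part is absent, the series group simply does not take part in the match and Python reports it as None.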
Old 04-19-2010, 10:06 AM   #21
MSWallack
I've endeavored to follow a consistent format for naming books. So a book without a series would look like this:

AuthorLast, AuthorFirst - Title.extension

while a book with a series would look like this:

AuthorLast, AuthorFirst - Title [Series #Num].extension

One other quirk that I've noticed is that the RE seems to return unpredictable results if there is a dash or hyphen in the author's name or the book title.
Old 04-19-2010, 10:40 AM   #22
Starson17
Quote:
Originally Posted by MSWallack
One other quirk that I've noticed is that the RE seems to return unpredictable results if there is a dash or hyphen in the author's name or the book title.
Spaces and hyphens are used to "find" the breaks between author, title, series and number. The most common separator is "space-hyphen-space." If your author or title has that in it, the match will usually break there. OTOH, if it has only a hyphen, with no spaces on either side, it won't break (for many regexes). It all depends on the regex and your specific title. I've seen many regexes that specify titles never have a hyphen. See this:

Code:
(?P<title>([^\-_\[\(]+))
That means the title never has a hyphen. OTOH, this:
Code:
(?P<title>([^_\[\(]+))
permits the hyphen in the title, but that may break other parts of the regex match.
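To see the difference between the two title groups, here is a small sketch in Python's re module. The sample text is hypothetical and only there to compare the two character classes.

```python
import re

# Hypothetical title text, used only to compare the two character classes.
text = 'Catch-22 [Some Series #1]'

no_hyphen = re.match(r'(?P<title>([^\-_\[\(]+))', text)
with_hyphen = re.match(r'(?P<title>([^_\[\(]+))', text)

print(repr(no_hyphen.group('title')))    # cut off at the hyphen
print(repr(with_hyphen.group('title')))  # runs up to the "["
```

The second form keeps the whole title, but a title group that accepts hyphens can also run straight through a " - " separator, which is the kind of breakage mentioned above.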
Old 04-19-2010, 10:59 AM   #23
MSWallack
Interesting. OK. So what do I add (and where do I add it) to note that the filename may or may not include a series? There won't be any brackets unless they are used to designate a series.

Thanks again, for all the help.
Old 04-19-2010, 11:25 AM   #24
Starson17
Quote:
Originally Posted by MSWallack
Interesting. OK. So what do I add (and where do I add it) to note that the filename may or may not include a series? There won't be any brackets unless they are used to designate a series.
I think I already answered that:
Quote:
Post an example filename. The simplest fix is to put parentheses around the optional part and stick a "?" after the closing parenthesis (no quotes).
That's the short answer. The longer answer is that this will often break something. A quick example: suppose you sometimes don't separate author and title with space-hyphen-space. You can't just make the separator optional, or the regex will be unable to find the split between author and title. In your case, use the brackets to find the series, then make that part optional. Beware that you said you sometimes have brackets around the filetype after the title, which might screw things up.

There is no "magic" regex that fits everyone/everything.

BTW, brackets are special, so you need to "escape" them with a backslash. This is a literal open bracket: "\["

This means match the letter "a" or "b": "[ab]"

This means match an open bracket or the letter "a": "[\[a]"
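Those three forms can be checked quickly with Python's re module. This is only a sketch, with made-up sample strings and variable names.

```python
import re

# "\[" outside a character class matches a literal open bracket.
has_bracket = bool(re.search(r'\[', 'Title [Series 1]'))
no_bracket = bool(re.search(r'\[', 'Title only'))

# "[ab]" is a character class: it matches one "a" or one "b".
a_or_b = re.findall(r'[ab]', 'cab')

# "[\[a]" matches one open bracket or one "a".
bracket_or_a = re.findall(r'[\[a]', 'a[b]')

print(has_bracket, no_bracket)  # True False
print(a_or_b)                   # ['a', 'b']
print(bracket_or_a)             # ['a', '[']
```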

Last edited by Starson17; 04-19-2010 at 11:30 AM.
Old 04-19-2010, 12:15 PM   #25
MSWallack
Thanks again. I'll play around with that. Note that I don't have brackets around the filetype. I only use brackets for series information; I use parentheses for other title-related information. I'm pretty careful about using space - space between author and title, but lots of my titles (and some authors) have hyphens in them. I've thought about replacing the space - space with space ~ space. I wonder if I might have more success with that...
Old 04-19-2010, 01:19 PM   #26
Starson17
Quote:
Originally Posted by MSWallack
Thanks again. I'll play around with that. Note that I don't have brackets around the filetype. I only use brackets for series information; I use parentheses for other title-related information. I'm pretty careful about using space - space between author and title, but lots of my titles (and some authors) have hyphens in them. I've thought about replacing the space - space with space ~ space. I wonder if I might have more success with that...
I usually just make sure I don't have space - space between anything other than author, title or series. It's OK to have a hyphen in a title, just not spaces on either side. Then you can find the break. Sometimes I substitute a colon in the title.
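As a rough sketch of why that naming habit helps, here is a hypothetical separator pattern in Python's re module that only breaks on a hyphen with spaces on both sides. It is not one of the expressions from this thread (those use \s*-\s* together with character classes), and the filenames are made up.

```python
import re

# Hypothetical filenames (extension already removed).
SEPARATOR = re.compile(r'\s+-\s+')  # at least one space on each side of the hyphen

plain = SEPARATOR.split('Heller, Joseph - Catch-22')
spaced = SEPARATOR.split('Heller, Joseph - Catch - 22')

print(plain)   # the hyphen inside the title is left alone
print(spaced)  # a spaced hyphen inside the title causes an extra split
```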

I see it was Spiffy who occasionally had brackets around the filetype when he wrote:
Quote:
If somehow THIS makes its way into the parsing, disaster results:
L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz[lit].lit
Old 04-21-2010, 02:03 PM   #27
MSWallack
I thought that I was making good progress, but now I'm stumped. Some of my files don't have a series. For example:

King, Stephen - Under the Dome.epub

Here is the RE:

^(?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b)(\s*-\s*)?(?P<title>([^\-_\[\(]+)) ((\[(?P<series>[^0-9\-]+) (- )?\#?(?P<series_index>[0-9.]+)\]))?

When I test that RE on that filename in Calibre, I get the following results:

Title: Under the
Author: _Stephen King [I've used an underscore to indicate a space that Calibre is putting before the author's name]

I can't quite figure out where the RE is breaking.

Thanks again for the help.
Old 04-21-2010, 02:48 PM   #28
Starson17
Quote:
Originally Posted by MSWallack
I thought that I was making good progress, but now I'm stumped. Some of my files don't have a series. For example:

King, Stephen - Under the Dome.epub

Here is the RE:

^(?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b)(\s*-\s*)?(?P<title>([^\-_\[\(]+)) ((\[(?P<series>[^0-9\-]+) (- )?\#?(?P<series_index>[0-9.]+)\]))?

When I test that RE on that filename in Calibre, I get the following results:

Title: Under the
Author: _Stephen King [I've used an underscore to indicate a space that Calibre is putting before the author's name]

I can't quite figure out where the RE is breaking.

Thanks again for the help.
Your regex has a literal space between the title and the series group. That space is required even though the series and series_index after it are optional. When there is no series, the last word of the title is not followed by a space, so the regex backs the title up to the previous word, where a space does follow. The last word can't be picked up as the series either, since the series is required to have a series_index too.

Try this:
Code:
^(?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b)(\s*-\s*)?(?P<title>([^\-_\[\(]+))((\[(?P<series>[^0-9\-]+) (- )?\#?(?P<series_index>[0-9.]+)\]))?
Edit: IIRC, the space preceding the author is not a problem. Calibre aggressively strips leading and trailing spaces where they might cause trouble.
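To compare the two versions side by side, here is a minimal sketch in Python's re module. It assumes the extension is removed before the regex is applied (the sketch does that with os.path.splitext), and the helper name title_and_series is made up for the example.

```python
import os
import re

AUTHOR_AND_TITLE = (
    r'^(?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b)'
    r'(\s*-\s*)?(?P<title>([^\-_\[\(]+))'
)
SERIES = r'((\[(?P<series>[^0-9\-]+) (- )?\#?(?P<series_index>[0-9.]+)\]))?'

# The only difference is the literal space between title and series.
with_space = re.compile(AUTHOR_AND_TITLE + ' ' + SERIES)
without_space = re.compile(AUTHOR_AND_TITLE + SERIES)

def title_and_series(pattern, filename):
    # Assumption: the extension is removed before the regex is applied.
    stem = os.path.splitext(filename)[0]
    match = pattern.search(stem)
    return match.group('title').strip(), match.group('series')

names = ['King, Stephen - Under the Dome.epub',
         'Brown, Dan - The Lost Symbol [Robert Langdon #3].epub']
for name in names:
    print(name)
    print('  with space:   ', title_and_series(with_space, name))
    print('  without space:', title_and_series(without_space, name))
```

Without the literal space, the title group keeps a trailing space when a series follows, which the sketch strips.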
Old 04-21-2010, 05:57 PM   #29
MSWallack
Thanks again. That RE works.
Old 09-12-2011, 10:59 AM   #30
joelgilb
Junior Member
Posts: 1
Karma: 10
Join Date: Sep 2011
Device: Sony PRS300
Quote:
Originally Posted by Spiffy
In other words?

Yes. I THINK that works. What I've tested so far (including "L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz[lit].lit ") has no problem.
I was hoping to revive this thread to ask how to add one item to this expression (from earlier in this thread):

Code:
^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?(\[?(?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\]?\s*-\s*)?(?P<title>[a-zA-Z1-9 ]+)
Currently my books follow a few different naming conventions, and the expression above handles most of them.

My books tend to be named as follows:

AuthorLN, AuthorFN - Series # - Title - ISBN (lit).lit

The (lit) is not always in the filename, and not all books have the ISBN in the name.

When I try to add code to pick up the ISBN, it always messes up imports when the ISBN doesn't exist. Until I saw this thread I had been using this expression, manually removing the ISBN portion when my filename did not include the ISBN:

Code:
(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?-(?P<isbn>.*)
But when I am importing books without ISBNs I get:

Title = Series #
Authors = authors
Series = No Match
Series Index = No Match
ISBN = Title
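The result above can be reproduced outside Calibre. This is a minimal sketch using Python's re module, with a hypothetical filename that has a series but no ISBN and with the extension already removed.

```python
import re

# The older expression from the post above, unchanged (split over two lines).
OLD_PATTERN = re.compile(
    r'(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)'
    r'\s*-\s*(?P<title>[^_].+) ?-(?P<isbn>.*)'
)

# Hypothetical filename with a series but no ISBN (extension removed).
match = OLD_PATTERN.search('King, Stephen - Dark Tower 1 - The Gunslinger')
result = {name: value.strip() for name, value in match.groupdict().items()}
print(result)
```

The expression needs two hyphens after the author: one before the title and one before the ISBN. With only two hyphens in the name, the regex can only succeed by leaving series and series_index empty, so the series text lands in the title and the title lands in the ISBN.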


Any help would be appreciated.

Last edited by joelgilb; 09-12-2011 at 11:10 AM.