04-14-2010, 10:18 AM | #16 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Code:
^(?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b)(\s*-\s*)?(?P<title>([^\-_\[\(]+)) (\[(?P<series>[^0-9\-]+) (- )?\#?(?P<series_index>[0-9.]+)\]) |
|
04-14-2010, 11:01 AM | #17 |
Right, Except When Wrong
Posts: 353
Karma: 3968525
Join Date: Aug 2007
Location: Indianapolis
Device: Kindle Oasis 3 (sometimes iPad Mini).
|
Brilliant! Perfect! Bravo!
Seriously, thank you very much. This worked perfectly. Now if I can just try to figure out what you've done so that I can try to edit my own expressions... I looked at the page that Calibre links to but it wasn't terribly comprehensible. Do you know of an easier, more straightforward guide to regular expressions? |
|
04-14-2010, 11:46 AM | #18 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
There are too many regex guides for me to point to any single one. Google is your friend, but basically, here's what I did:
This part says stop matching the title when you find an open square bracket, an open parenthesis, a hyphen, or an underscore:
(?P<title>([^\-_\[\(]+))
Then there's a space and a required open square bracket before this:
(?P<series>[^0-9\-]+)
which is the series. The series matches until it sees a number or a hyphen.
A lot of the backslashes are "escaping" characters like parentheses, hyphens, or brackets. It takes some practice to read a regex you didn't write yourself. Sometimes it's easier to start with your own.
This part is an optional hyphen and space, so you can optionally have space hyphen space in front of your series_index:
(- )?
This part is the optional pound sign (looking closely, I see that I escaped it, but that was unnecessary; I was in a hurry):
\#? |
|
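To see the pieces described above working together, here is a minimal Python sketch (Calibre's filename patterns use Python's `re` syntax). It runs the expression from post #16 against a made-up filename. The sample name and the variable names are assumptions for illustration, not taken from the thread, and the pattern is split across lines only for readability.

```python
import re

# Expression from post #16; the bracketed series part is required here.
PATTERN = re.compile(
    r"^(?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b)"
    r"(\s*-\s*)?"
    r"(?P<title>([^\-_\[\(]+))"
    r" (\[(?P<series>[^0-9\-]+) (- )?\#?(?P<series_index>[0-9.]+)\])"
)

# Hypothetical filename, with the extension already removed.
sample = "King, Stephen - The Gunslinger [Dark Tower #1]"

match = PATTERN.search(sample)
fields = {
    name: match.group(name)
    for name in ("author", "title", "series", "series_index")
}

# repr() makes stray leading or trailing spaces visible.
for name, value in fields.items():
    print(f"{name:>12}: {value!r}")
```

Printing with `repr()` is deliberate: the author group stops at the hyphen, so it keeps the space that sits in front of it.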
04-16-2010, 05:16 PM | #19 |
Right, Except When Wrong
Posts: 353
Karma: 3968525
Join Date: Aug 2007
Location: Indianapolis
Device: Kindle Oasis 3 (sometimes iPad Mini).
|
Just to make this more complicated than it already is... Some of my books aren't part of a series. Thus, for those books, there isn't a bracket with a series and number at the end of the name. When Calibre adds those books, the whole file name is added as the book name, the author is set as Unknown, and nothing goes into the series name. What do I need to add (and where) to the RE to say that if the file extension is encountered before an open bracket, then process as if there isn't a series?
|
04-16-2010, 05:45 PM | #20 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
|
04-19-2010, 10:06 AM | #21 |
Right, Except When Wrong
Posts: 353
Karma: 3968525
Join Date: Aug 2007
Location: Indianapolis
Device: Kindle Oasis 3 (sometimes iPad Mini).
|
I've endeavored to follow a consistent format for naming books. So a book without a series would look like this:
AuthorLast, AuthorFirst - Title.extension
while a book with a series would look like this:
AuthorLast, AuthorFirst - Title [Series #Num].extension
One other quirk that I've noticed is that the RE seems to return unpredictable results if there is a dash or hyphen in the author's name or the book title. |
04-19-2010, 10:40 AM | #22 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Code:
(?P<title>([^\-_\[\(]+))
Code:
(?P<title>([^_\[\(]+)) |
|
04-19-2010, 10:59 AM | #23 |
Right, Except When Wrong
Posts: 353
Karma: 3968525
Join Date: Aug 2007
Location: Indianapolis
Device: Kindle Oasis 3 (sometimes iPad Mini).
|
Interesting. OK. So what do I add (and where do I add it) to note that the filename may or may not include a series? There won't be any brackets unless they are used to designate a series.
Thanks again, for all the help. |
04-19-2010, 11:25 AM | #24 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
There is no "magic" regex that fits everyone/everything. BTW, brackets are special, so you need to "escape" them with a backslash.
This is a bracket: "\["
This means match the letter "a" or "b": "[ab]"
This means match an open bracket or the letter "a": "[\[a]"
Last edited by Starson17; 04-19-2010 at 11:30 AM. |
||
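The three bracket cases above can be checked directly in Python, whose `re` module uses the same syntax. This is only an illustrative sketch; the test strings and variable names are made up.

```python
import re

# "\[" matches a literal open bracket.
literal_bracket = re.findall(r"\[", "Title [Series 1]")

# "[ab]" is a character class: one character, either "a" or "b".
a_or_b = re.findall(r"[ab]", "cab")

# "[\[a]" is a class holding an open bracket and the letter "a".
bracket_or_a = re.findall(r"[\[a]", "a[b]")

# An unescaped "[" on its own starts a class that never ends,
# so the pattern fails to compile.
try:
    re.compile("[")
    unescaped_compiles = True
except re.error:
    unescaped_compiles = False

print(literal_bracket, a_or_b, bracket_or_a, unescaped_compiles)
```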
04-19-2010, 12:15 PM | #25 |
Right, Except When Wrong
Posts: 353
Karma: 3968525
Join Date: Aug 2007
Location: Indianapolis
Device: Kindle Oasis 3 (sometimes iPad Mini).
|
Thanks again. I'll play around with that. Note that I don't have brackets around the filetype. I only use brackets for series information; I use parentheses for other title-related information. I'm pretty careful about using space - space between author and title, but lots of my titles (and some authors) have hyphens in them. I've thought about replacing the space - space with space ~ space. I wonder if I might have more success with that...
|
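If the space - space separator really is consistent, one possible approach is to split on it instead of banning hyphens from the author and title. The pattern below is a hypothetical sketch, not one posted in this thread. It assumes the first " - " always separates author from title, that brackets appear only around the series, and that the extension has already been removed. The filenames are invented.

```python
import re

# Hypothetical pattern: split on the first whitespace-hyphen-whitespace,
# so hyphens without surrounding spaces may appear in author or title.
SEPARATOR_PATTERN = re.compile(
    r"^(?P<author>.+?)\s+-\s+"
    r"(?P<title>[^\[]+?)"
    r"(?:\s*\[(?P<series>[^\]#]+?)\s*#?(?P<series_index>[0-9.]+)\])?$"
)

def parse(name):
    """Return the named groups for a filename, or None if it doesn't match."""
    match = SEPARATOR_PATTERN.search(name)
    return match.groupdict() if match else None

with_series = parse("Smith-Jones, Ann - Twenty-One Days [Count Down #2]")
no_series = parse("King, Stephen - Under the Dome")

print(with_series)
print(no_series)
```

Switching the separator to space ~ space would only require changing the `-` in the first line of the pattern to `~`.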
04-19-2010, 01:19 PM | #26 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
I see it was Spiffy who occasionally had brackets around the filetype when he wrote: Quote:
|
||
04-21-2010, 02:03 PM | #27 |
Right, Except When Wrong
Posts: 353
Karma: 3968525
Join Date: Aug 2007
Location: Indianapolis
Device: Kindle Oasis 3 (sometimes iPad Mini).
|
I thought that I was making good progress, but now I'm stumped. Some of my files don't have a series. For example:
King, Stephen - Under the Dome.epub
Here is the RE:
^(?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b)(\s*-\s*)?(?P<title>([^\-_\[\(]+)) ((\[(?P<series>[^0-9\-]+) (- )?\#?(?P<series_index>[0-9.]+)\]))?
When I test that RE on that filename in Calibre, I get the following results:
Title: Under the
Author: _Stephen King
[I've used an underscore to indicate a space that Calibre is putting before the author's name]
I can't quite figure out where the RE is breaking. Thanks again for the help. |
04-21-2010, 02:48 PM | #28 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Try this: Code:
^(?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b)(\s*-\s*)?(?P<title>([^\-_\[\(]+))((\[(?P<series>[^0-9\-]+) (- )?\#?(?P<series_index>[0-9.]+)\]))? |
|
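The only difference between the expression in post #27 and the one in post #28 is the literal space between the title group and the optional series group. The sketch below compares the two in Python. The extension is left off the sample names on the assumption that it is removed before matching, and the variable names and the second filename are made up for illustration.

```python
import re

# Shared front part: author, separator, title.
HEAD = (
    r"^(?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b)"
    r"(\s*-\s*)?"
    r"(?P<title>([^\-_\[\(]+))"
)
# Optional bracketed series, identical in both posts.
SERIES = r"((\[(?P<series>[^0-9\-]+) (- )?\#?(?P<series_index>[0-9.]+)\]))?"

with_space = re.compile(HEAD + " " + SERIES)   # post #27: space is required
without_space = re.compile(HEAD + SERIES)      # post #28: space removed

name = "King, Stephen - Under the Dome"
title_before = with_space.search(name).group("title")
title_after = without_space.search(name).group("title")
print(repr(title_before))
print(repr(title_after))

# The series form still matches once the space is gone.
series_match = without_space.search(
    "King, Stephen - The Gunslinger [Dark Tower #1]"
)
print(series_match.group("series"), series_match.group("series_index"))
```

With the required space, the title has to give characters back until it is followed by a space, which is why it stops at "Under the". Without it, the title can run to the end of the name.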
04-21-2010, 05:57 PM | #29 |
Right, Except When Wrong
Posts: 353
Karma: 3968525
Join Date: Aug 2007
Location: Indianapolis
Device: Kindle Oasis 3 (sometimes iPad Mini).
|
Thanks again. That RE works.
|
09-12-2011, 10:59 AM | #30 | |
Junior Member
Posts: 1
Karma: 10
Join Date: Sep 2011
Device: Sony PRS300
|
Quote:
Code:
^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?(\[?(?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\]?\s*-\s*)?(?P<title>[a-zA-Z1-9 ]+)
My books tend to be named as follows:
AuthorLN, AuthorFN - Series # - Title - ISBN (lit).lit
The (lit) is not always in the filename, and not all books have the ISBN in the name. When I try to add coding to pick up the ISBN, it always messes up imports when the ISBN doesn't exist. Until I saw this thread, I have been using this expression and would manually remove the ISBN portion when my filename did not include the ISBN:
Code:
(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?-(?P<isbn>.*)
Title = Series #
Authors = authors
Series = No Match
Series Index = No Match
ISBN = Title
Any help would be appreciated.
Last edited by joelgilb; 09-12-2011 at 11:10 AM. |
|
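One possible way to make the ISBN optional is to wrap it, together with its leading separator, in an optional non-capturing group, as the earlier posts did for the series. The pattern below is a hypothetical sketch, not an answer from the thread. It assumes the naming scheme described above with the extension already removed, a series that is always present, and an ISBN of 10 to 17 digits, hyphens, or a trailing X. The filenames and the ISBN are invented.

```python
import re

# Hypothetical pattern: the ISBN and the "(lit)" tag are each optional.
ISBN_PATTERN = re.compile(
    r"^(?P<author>[^_-]+?)\s+-\s+"
    r"(?P<series>[^_0-9-]+?)\s*(?P<series_index>[0-9]+)\s+-\s+"
    r"(?P<title>.+?)"
    r"(?:\s+-\s+(?P<isbn>[0-9Xx-]{10,17}))?"
    r"(?:\s*\(lit\))?$"
)

def parse(name):
    """Return the named groups for a filename, or None if it doesn't match."""
    match = ISBN_PATTERN.search(name)
    return match.groupdict() if match else None

with_isbn = parse("Doe, Jane - Saga 2 - Second Book - 1234567890 (lit)")
without_isbn = parse("Doe, Jane - Saga 2 - Second Book")

print(with_isbn)
print(without_isbn)
```

When the ISBN is absent, its group is simply `None` instead of swallowing the title.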
|