#1 |
Groupie
Posts: 160
Karma: 416
Join Date: Apr 2010
Device: Astak EZ Reader Pro AND Sony PRS-505
|
Probably the five millionth since this forum was created, I suppose. :-)

Many of my files are kept in the following naming format:

L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz.lit

Please note that those square brackets are virtually always used in what I've got stored, and the <space><single dash><space> between the author and the series/series number, and between that and the book title, is also pretty consistent.

Assuming the vast majority of my books follow this format, does anyone have a good expression to add them with? Ideally the expression would recognize the square brackets as a tip-off that a book series and book number are being disclosed. Is such a thing even possible? I ask because if a book ISN'T part of a series, the existing file name is probably something more like this:

H. G. Wells - The Time Machine.epub

OPTIONALLY, it's at least possible (although I bet this is even harder to resolve) that some files may look like this:

Jules Verne - Journey to the Center of the Earth (html).zip

Pie in the sky, if those ROUND brackets could be a tip-off to ignore something as NOT being part of a book title, that would be ideal. Yeah, even ignorant of how to build these expressions properly, I'm skeptical.

Does anyone out there have stuff following approximately these "rules", and what have you done to best ensure proper Calibre "importing"? ANY subset of the requirements I list above (dealing with series names in square brackets, ignoring stuff in round brackets, etc.) would be better than nothing, but I don't expect much.

Please note that I have zero ability at scripting, so I'm really just asking what the best canned solution is. If it's "you're out of luck", I guess I'll figure something else out. If there are proper expressions to handle this already, great. If there are third-party tools outside of Calibre to accurately mass rename files FIRST in an acceptable way, I suppose that's something I'd be willing to try as well (although using a second tool first seems redundant if Calibre can be made to do it).

Thanks infinitely in advance for any possible suggestions!

Last edited by Spiffy; 04-05-2010 at 03:45 PM.
#2 |
Wizard
Posts: 1,782
Karma: 30548723
Join Date: Dec 2006
Location: Singapore
Device: Boyue
|
I had the same problem, so I started using BookSorter to rename my files to Author - Series # - Title.lit:
http://iterati.org/ebookTools/BookSorter/Default.aspx
Then I used this expression when adding the books:
(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?
I did see the regex you are looking for somewhere on the forum, but I couldn't find it again.
#3 | |
Groupie
Posts: 160
Karma: 416
Join Date: Apr 2010
Device: Astak EZ Reader Pro AND Sony PRS-505
|
Quote:
Another way to do this DID occur to me this morning: doing it in discrete steps. First, import the books without a series on their own, with a fairly standard regex. Then go back, CHANGE the regex to expect a series, and import THOSE books.

But I guess I would still have to deal with the square brackets. I either need a way to mass remove them, or to mass ignore them during an import.

The second problem, the format occasionally appearing at the end in round brackets, I suppose I just have to live with (and manually erase after the fact).
|
#4 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I started to respond to this twice, but I'm not at home and can't test anything I post. It's pretty easy to make the brackets optional if everything else is right.
This is an optional open bracket: \[?
and this is an optional closed bracket: \]?
Try this (totally untested):
Code:
wrong code posted (untested)
Code:
^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?(\[?(?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\]?\s*-\s*)?(?P<title>.+)
Last edited by Starson17; 04-06-2010 at 07:24 PM.
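For anyone who wants to check this expression outside Calibre's test tool, here is a minimal sketch using Python's `re` module, which accepts the same `(?P<name>...)` named groups. It assumes the file name reaches the expression without its extension (the test output quoted in this thread shows the title without `.lit`). The `parse` helper and the whitespace stripping are only illustration, not a claim about what Calibre does internally.

```python
import re

# The expression from the post above, split across lines only for readability.
PATTERN = re.compile(
    r"^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))"
    r"(\s*-\s*)?"
    # \[? and \]? make the square brackets around the series optional.
    r"(\[?(?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\]?\s*-\s*)?"
    r"(?P<title>.+)"
)


def parse(name):
    """Hypothetical helper: return the named groups (stripped), or None."""
    m = PATTERN.search(name)
    if m is None:
        return None
    return {k: (v.strip() if v else v) for k, v in m.groupdict().items()}


# Extensions are left off on purpose (assumption stated above).
with_series = parse("L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz")
no_series = parse("H. G. Wells - The Time Machine")

print(with_series)
print(no_series)
```

In this sketch the first name yields an author, a series, a series index of `02`, and the title; the second yields only author and title, because the whole series group is optional and is skipped when no number is found.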
#5 |
Groupie
Posts: 160
Karma: 416
Join Date: Apr 2010
Device: Astak EZ Reader Pro AND Sony PRS-505
|
Hmm. Good to know that.
The expression doesn't seem to work, unfortunately. But I appreciate the try. When you use that string and run the test tool inside Calibre against this book:
L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz.lit
the following shows in the test results:
Title: L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz
Authors: nothing
Series: nothing
Series index: nothing
#6 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
#7 |
Groupie
Posts: 160
Karma: 416
Join Date: Apr 2010
Device: Astak EZ Reader Pro AND Sony PRS-505
|
Genius work. Thank you--that's quite nifty. It even recognizes that if there are no brackets (or is it counting dashes?), there's no series, and realizes that the position of the title will be different.

I hate to push, but do you know a way to address the other main issue I had? The occasional optional file type sandwiched between ROUND brackets? Like so:

Jules Verne - Journey to the Center of the Earth (html).zip

Ideally, the best result would be to drop those file types, round brackets and everything between them, from the title. Inevitably, legit titles with round brackets could be affected, I guess, but that's a small price to pay.

I won't be greedy, though. You've already saved me a ton of potential headache.
#8 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
(staggering a bit) ..... try this:
Code:
^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?(\[?(?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\]?\s*-\s*)?(?P<title>[a-zA-Z1-9 ]+)(\(.*\))?$
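A quick way to see what the round-bracket version does is to run it through Python's `re` module. This is only a sketch: it assumes the extension has already been removed from the name, and the `parse` helper is hypothetical. Note that the title class `[a-zA-Z1-9 ]` only allows letters, spaces, and the digits 1 to 9, so a title containing an apostrophe (or a 0) does not match at all in this sketch; the Lewis Carroll name below is my own example, not one from the thread.

```python
import re

# The expression from the post above, split across lines only for readability.
PATTERN = re.compile(
    r"^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))"
    r"(\s*-\s*)?"
    r"(\[?(?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\]?\s*-\s*)?"
    # The title stops at the first character outside the class; a trailing
    # "(...)" chunk is then swallowed by the unnamed group and discarded.
    r"(?P<title>[a-zA-Z1-9 ]+)(\(.*\))?$"
)


def parse(name):
    """Hypothetical helper: return the named groups (stripped), or None."""
    m = PATTERN.search(name)
    if m is None:
        return None
    return {k: (v.strip() if v else v) for k, v in m.groupdict().items()}


html_book = parse("Jules Verne - Journey to the Center of the Earth (html)")
bracket_book = parse("L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz(lit)")
apostrophe = parse("Lewis Carroll - Alice's Adventures in Wonderland")

print(html_book["title"])     # title without the "(html)" suffix
print(bracket_book["title"])  # title without the "(lit)" suffix
print(apostrophe)             # None: the apostrophe is outside the title class
```

If punctuation in titles matters for your library, the title class would need widening, at the cost of making the round-bracket detection less strict.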
#9 | |
Groupie
Posts: 160
Karma: 416
Join Date: Apr 2010
Device: Astak EZ Reader Pro AND Sony PRS-505
|
Quote:
No dice, though. It tosses everything into the title field again.
|
#10 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
It works for me. Try again, or show me what you're testing it on. It correctly parsed all of these:
L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz.lit
L. Frank Baum - [Wizard of Oz 02] - The Marvelous Land of Oz(lit).lit
L. Frank Baum - Wizard of Oz 02 - The Marvelous Land of Oz(lit).lit
L. Frank Baum - Wizard of Oz 02 - The Marvelous Land of Oz.lit
Last edited by Starson17; 04-06-2010 at 10:17 PM.
#11 | |
Groupie
Posts: 160
Karma: 416
Join Date: Apr 2010
Device: Astak EZ Reader Pro AND Sony PRS-505
|
Quote:
|
|
#12 |
Junior Member
Posts: 3
Karma: 10
Join Date: Apr 2010
Device: none
|
Actually, there are a few cases where it doesn't strip version numbers and formats after the title (although it is beautifully written).
Replacing (\(.*\))?$ with .+ seems to drop everything after the title.
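One thing worth checking with that substitution: `.+` needs at least one character after the title, so when nothing follows the title the regex engine has to give up the title's last letter to satisfy it. The sketch below compares `.+` with `.*` (zero or more) using Python's `re` module. The `.*` variant is my own suggestion rather than something proposed in this thread, the `title_of` helper is hypothetical, and the names are assumed to arrive without their extension.

```python
import re

# Everything up to and including the title group, as posted earlier in the thread.
PREFIX = (
    r"^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))"
    r"(\s*-\s*)?"
    r"(\[?(?P<series>[^0-9\-]+) (- )?(?P<series_index>[0-9.]+)\]?\s*-\s*)?"
    r"(?P<title>[a-zA-Z1-9 ]+)"
)

with_plus = re.compile(PREFIX + r".+")  # the substitution suggested above
with_star = re.compile(PREFIX + r".*")  # assumption: a more forgiving tail


def title_of(pattern, name):
    """Hypothetical helper: return the stripped title, or None if no match."""
    m = pattern.search(name)
    return m.group("title").strip() if m else None


suffixed = "Jules Verne - Journey to the Center of the Earth (html)"
plain = "H. G. Wells - The Time Machine"

plus_suffixed = title_of(with_plus, suffixed)
star_suffixed = title_of(with_star, suffixed)
plus_plain = title_of(with_plus, plain)
star_plain = title_of(with_star, plain)

print(plus_suffixed, "|", star_suffixed)  # both drop "(html)"
print(plus_plain, "|", star_plain)        # ".+" loses the final "e"
```

So if some of your files have nothing after the title, `.*` looks like the safer tail in this sketch.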
#13 | |||
Groupie
Posts: 160
Karma: 416
Join Date: Apr 2010
Device: Astak EZ Reader Pro AND Sony PRS-505
|
Quote:
The regex works perfectly with any of this: Quote:
Quote:
|
|||
#14 | ||
Groupie
Posts: 160
Karma: 416
Join Date: Apr 2010
Device: Astak EZ Reader Pro AND Sony PRS-505
|
Quote:
Quote:
|
||
#15 |
Right, Except When Wrong
Posts: 360
Karma: 4323767
Join Date: Aug 2007
Location: Indianapolis
Device: Kindle Oasis 3 (sometimes iPad Mini).
|
This is so close to what I'm trying to accomplish that I thought it was worth posting my query, too. In my case, book titles are formatted like this:
Brown, Dan - The Lost Symbol [Robert Langdon #3].epub
AuthorLast, AuthorFirst - Title [Series #SeriesNum].format
It looks like the code that was provided is very close, but I'm not quite sure where the "delimiters" (not sure of the right term) are between the Author, Series, and Title sections of the RE. Thanks for any help you can provide.
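On the delimiter question: in the expression posted earlier in the thread, the `\s*-\s*` pieces are the dash separators between author, series, and title, and `\[?` / `\]?` are the optional square brackets. For the "Author - Title [Series #N]" layout, the series block moves to the end. The sketch below is an untested-in-Calibre guess written against Python's `re` module, not an answer given in this thread. It assumes the extension is already stripped, keeps the author in "Last, First" order, and reuses the group names seen in this thread (`author`, `title`, `series`, `series_index`); the `parse` helper is hypothetical.

```python
import re

PATTERN = re.compile(
    r"^(?P<author>.+?)\s+-\s+"   # author, up to the first " - "
    r"(?P<title>[^\[]+?)\s*"     # title, up to an optional "[" block
    # Optional trailing "[Series #N]" block.
    r"(?:\[(?P<series>[^#\]]+?)\s*#(?P<series_index>[0-9.]+)\])?$"
)


def parse(name):
    """Hypothetical helper: return the named groups, or None if no match."""
    m = PATTERN.search(name)
    return m.groupdict() if m else None


in_series = parse("Brown, Dan - The Lost Symbol [Robert Langdon #3]")
standalone = parse("Wells, H. G. - The Time Machine")

print(in_series)
print(standalone)
```

Here the author and title use lazy matches so that the first " - " and the first "[" act as the delimiters; a title that itself contains " - " or "[" would need a different approach.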