MobileRead Forums - View Single Post

Sabardeyn · 05-25-2009, 01:12 AM

Kad,

The regular expression that was suggested is working correctly. Unfortunately, the results it produces are not what you wanted. The inconsistent naming scheme (author - title vs. author - series - title) makes it very hard to get calibre to import your books accurately.

Off the top of my head, the easiest way to handle this is simply to import your books in two separate sets. If you split your books into Series and No Series groups and changed the expression accordingly for each set, you would be fine. While this sounds wonderful in theory, I'm sure several of the authors have books that follow both file naming formats, so separating them would be a major hassle. The alternative, adding one batch of books at a time that matches whichever regular expression (regex) you're currently using, is no better. Not to mention we might be talking about doing all this on hundreds of books.
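As a minimal sketch of the two-set approach, here are two Python regexes, one per group. The named groups `author`, `series` and `title` mirror the fields calibre fills in, but the exact group names calibre expects are an assumption here, as are the sample filenames (shown with the file extension already removed). Note how the No Series expression, applied to a series filename, lumps the series into the title, which is exactly the mismatch you ran into.

```python
import re

# Expression for the "No Series" set: Author - Title
NO_SERIES = re.compile(r"^(?P<author>.+?) - (?P<title>.+)$")
# Expression for the "Series" set: Author - Series - Title
WITH_SERIES = re.compile(r"^(?P<author>.+?) - (?P<series>.+?) - (?P<title>.+)$")

def parse(filename, pattern):
    """Return the named groups captured by pattern, or None if it does not match."""
    m = pattern.match(filename)
    return m.groupdict() if m else None

# Hypothetical filenames, extension already stripped.
print(parse("Jane Doe - Some Title", NO_SERIES))
# {'author': 'Jane Doe', 'title': 'Some Title'}
print(parse("Jane Doe - Some Series 02 - Another Title", WITH_SERIES))
# {'author': 'Jane Doe', 'series': 'Some Series 02', 'title': 'Another Title'}
print(parse("Jane Doe - Some Series 02 - Another Title", NO_SERIES))
# {'author': 'Jane Doe', 'title': 'Some Series 02 - Another Title'}  (series ends up in the title)
```

The series expression simply fails to match a two-field name, so each expression only works on its own set, which is why the books would have to be separated first.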

What you need is a regex that determines whether the filename currently being tested contains two or three fields. The first field is always the author(s)/editor(s) name, so grabbing that straight off should be fine. But the next field is either the series or the title. If there is a way to determine whether " - " occurs again in the filename, then you can assume that whatever lies between the first and second " - " is the series (more likely, the series and series index). Otherwise, everything remaining becomes the title.
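One way to express "is there a second ` - `?" in Python's regex flavor is an optional non-capturing group: `(?: ... - )?` only matches when another ` - ` follows, and is skipped otherwise. The sketch below also tries to peel a trailing number off the series as its index. The group names (`author`, `series`, `series_index`, `title`) follow calibre's field names, but whether calibre accepts exactly these names is an assumption, and the filenames are made up.

```python
import re

# author, then an OPTIONAL "series [index] - " block, then the title.
# Lazy quantifiers (.+?) stop each field at the first " - " they can.
FILENAME_RE = re.compile(
    r"^(?P<author>.+?) - "
    r"(?:(?P<series>.+?)(?: (?P<series_index>\d+(?:\.\d+)?))? - )?"
    r"(?P<title>.+)$"
)

for name in ["Jane Doe - Some Title",
             "Jane Doe - Some Series 02 - Another Title"]:
    print(FILENAME_RE.match(name).groupdict())
# {'author': 'Jane Doe', 'series': None, 'series_index': None, 'title': 'Some Title'}
# {'author': 'Jane Doe', 'series': 'Some Series', 'series_index': '02', 'title': 'Another Title'}
```

When the optional block does not match, its groups come back as `None`, so a two-field filename simply has no series.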

While this seems straightforward enough, potential issues remain. Author names or titles that contain " - " (a hyphen with a space on each side, not an ordinary hyphen) will cause that individual book to be imported incorrectly. Of course, with 99% of them entered automatically, you might find manually entering or editing the remainder acceptable.
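To make the failure case concrete, here is a small sketch using a combined author / optional series / title expression (an illustrative pattern, not one taken from calibre). An ordinary hyphen in a name is harmless, but a title that itself contains " - " is indistinguishable from a series block.

```python
import re

# Combined pattern: author, optional "series - " block, title.
FILENAME_RE = re.compile(r"^(?P<author>.+?) - (?:(?P<series>.+?) - )?(?P<title>.+)$")

print(FILENAME_RE.match("Jean-Paul Sartre - Nausea").groupdict())
# {'author': 'Jean-Paul Sartre', 'series': None, 'title': 'Nausea'}  (parsed correctly)
print(FILENAME_RE.match("Jane Doe - Some Title - A Subtitle").groupdict())
# {'author': 'Jane Doe', 'series': 'Some Title', 'title': 'A Subtitle'}  (title misread as series)
```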

While I know what needs to be done, I'm in the same position you are: simply too much of a regex & calibre neophyte to write something this complex.

Another website I came across has a Regular Expression Tutorial; I've listed it here just in case you find it helpful. Keep in mind that calibre uses the Python "flavor" of regex. (The calibre-referenced site should always take precedence.)
