|  09-26-2008, 11:47 AM | #1 | 
| Member  Posts: 23 Karma: 10 Join Date: Dec 2007 Location: Rome, Italy Device: PRS-500, PRS-505, Milestone, Galaxy Tab | 
				
				Regular Expression Help
			 
			
			Hi there. Here's my problem: I have a bunch of PDF files named like these examples:

			Name Surname - Name of the Series 01 - Title of the Book.pdf
			or
			Name Surname - Title of the Book.pdf

			For the first one I use this:
			(?P<author>[^_]+) - (?P<series>[^_]+) (?P<series_index>[0-9]+) - (?P<title>.+)
			And for the second example I use:
			(?P<author>[^_]+) - (?P<title>.+)

			The problem is that the parsing cuts off the last word, so the title comes out as "Title of the". Also, is it possible to join those two expressions so that the parser can tell whether the filename has a series part or not (xxx - xxx instead of xxx - xxx 3 - xxx)?

			The other problem is that calibre looks inside the PDF for the title and author fields, and sometimes this results in garbled text. Is there a way to override this and use only the data parsed from the filename?

			Thanks in advance for any advice. P.S. Sorry for my subpar English. | 
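For readers who want to try the two expressions from the first post outside calibre, here is a minimal sketch using Python's `re` module. It assumes the `.pdf` extension has already been dropped from the filename, and the `parse` helper is a made-up name for this example. How calibre itself applies the expression is not shown in this thread.

```python
import re

# Patterns quoted from the post above. The sample names are the poster's
# examples with the ".pdf" extension dropped (an assumption of this sketch).
WITH_SERIES = re.compile(
    r"(?P<author>[^_]+) - (?P<series>[^_]+) (?P<series_index>[0-9]+) - (?P<title>.+)"
)
WITHOUT_SERIES = re.compile(r"(?P<author>[^_]+) - (?P<title>.+)")


def parse(pattern, name):
    """Hypothetical helper: return the named groups, or None if no match."""
    match = pattern.match(name)
    return match.groupdict() if match else None


series_name = "Name Surname - Name of the Series 01 - Title of the Book"
plain_name = "Name Surname - Title of the Book"

print(parse(WITH_SERIES, series_name))
print(parse(WITHOUT_SERIES, plain_name))
# The series pattern needs digits, so it does not match the plain name at all.
print(parse(WITH_SERIES, plain_name))
```

In plain Python both patterns return the full title for these names, so the truncated "Title of the" result is not reproduced by this sketch.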
|   | 
|  09-26-2008, 12:09 PM | #2 | 
| creator of calibre            Posts: 45,598 Karma: 28548962 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			just put question marks after each + sign
		 | 
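As a rough illustration of what a question mark after a `+` does, the sketch below compares a greedy and a lazy author group in plain Python. The sample name and the `rest` group are assumptions made for the example.

```python
import re

# Hypothetical sample name; the extension is left off for brevity.
name = "Name Surname - Name of the Series 01 - Title of the Book"

# Greedy: [^_]+ takes as much as it can, so the author runs to the LAST " - ".
greedy = re.match(r"(?P<author>[^_]+) - (?P<rest>.+)", name)

# Lazy: [^_]+? takes as little as it can, so the author stops at the FIRST " - ".
lazy = re.match(r"(?P<author>[^_]+?) - (?P<rest>.+)", name)

print(greedy.group("author"))
print(lazy.group("author"))
```

The question mark has to come directly after the `+`. Written with a space in between, as in `+) ?`, it only makes a trailing space optional.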
|   | 
|  09-26-2008, 12:22 PM | #3 | 
| Member  Posts: 23 Karma: 10 Join Date: Dec 2007 Location: Rome, Italy Device: PRS-500, PRS-505, Milestone, Galaxy Tab | 
			
			Thanks, that at least fixes the "Title of the" problem ;D So now the expressions are:
			(?P<author>[^_]+) - (?P<series>[^_]+) (?P<series_index>[0-9]+) - (?P<title>.+) ?
			and
			(?P<author>[^_]+) - (?P<title>.+) ?
			Is there no way to make a single expression smart enough to skip the series and series index if the filename is xxx - xxx.pdf? I was looking into something like (?<!...) but I can't figure it out. Thanks anyway. | 
|   | 
|  09-26-2008, 12:36 PM | #4 | 
| creator of calibre            Posts: 45,598 Karma: 28548962 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			try enclosing the series part inside another group and make that group optional with {0,1}
		 | 
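One way to read the suggestion above is sketched below, under some assumptions. The series part is wrapped in a non-capturing group followed by `{0,1}`. The lazy author group (`[^_]+?`) is an addition for this example, so that the author stops at the first " - ". Sample names have the extension dropped.

```python
import re

# Sketch only: the series part may occur zero or one times.
PATTERN = re.compile(
    r"(?P<author>[^_]+?) - "
    r"(?:(?P<series>[^_]+) (?P<series_index>[0-9]+) - ){0,1}"
    r"(?P<title>.+)"
)

with_series = PATTERN.match(
    "Name Surname - Name of the Series 01 - Title of the Book"
).groupdict()
without_series = PATTERN.match("Name Surname - Title of the Book").groupdict()

print(with_series)
# When the optional group is skipped, its named groups come back as None.
print(without_series)
```

With a greedy author group the first name would still match, but the author would swallow the series, which is why the lazy form is used here.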
|   | 
|  10-01-2008, 06:25 AM | #5 | 
| Member  Posts: 23 Karma: 10 Join Date: Dec 2007 Location: Rome, Italy Device: PRS-500, PRS-505, Milestone, Galaxy Tab | 
			
			OK, I'm going crazy... this is the expression I have now:
			(?P<author>[^_]+) - *(?P<series>[^_]*) (?P<series_index>[0-9]*) -? (?P<title>[^_].+) ?
			It recognizes:
			Name Surname - Name of the Series 01 - Title of the Book.pdf
			and
			Name Surname - Title of the Book.pdf (notice the 3 spaces after the - )
			I can't, for the love of God, get rid of those leading spaces from the expression... Can anybody help? I don't ssssspeak sssspython well... | 
|   | 
|  10-01-2008, 12:51 PM | #6 | 
| creator of calibre            Posts: 45,598 Karma: 28548962 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			replace the spaces in your expression with \s*
		 | 
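A small sketch of why `\s*` helps, assuming a made-up filename with extra spaces around the dash. A literal " - " matches exactly one space on each side, so any extra spaces end up inside the captured groups, while `\s*` absorbs them.

```python
import re

# Hypothetical name with two spaces before the dash and three after it.
name = "Name Surname  -   Title of the Book"

strict = re.compile(r"(?P<author>[^_]+?) - (?P<title>.+)")
flexible = re.compile(r"(?P<author>[^_]+?)\s*-\s*(?P<title>.+)")

strict_fields = strict.match(name).groupdict()
flexible_fields = flexible.match(name).groupdict()

# repr() makes the stray spaces visible in the strict result.
print(repr(strict_fields["author"]), repr(strict_fields["title"]))
print(repr(flexible_fields["author"]), repr(flexible_fields["title"]))
```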
|   | 
|  10-02-2008, 05:04 AM | #7 | 
| Member  Posts: 23 Karma: 10 Join Date: Dec 2007 Location: Rome, Italy Device: PRS-500, PRS-505, Milestone, Galaxy Tab | 
			
			At last! If anybody needs it, here it is:
			(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ? | 
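The final expression can be checked in plain Python with the sketch below. The `.strip()` call and the `parse` helper are additions for this example, and sample names have the extension dropped. Whether calibre trims the values itself is not stated in the thread.

```python
import re

# The expression exactly as posted above.
FINAL = re.compile(
    r"(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)"
    r"\s*-\s*(?P<title>[^_].+) ?"
)


def parse(name):
    """Hypothetical helper: match and trim surrounding whitespace."""
    match = FINAL.match(name)
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}


with_series = parse("Name Surname - Name of the Series 01 - Title of the Book")
without_series = parse("Name Surname - Title of the Book")

print(with_series)
# Without a series, both series groups match the empty string.
print(without_series)
```

Before trimming, the series group keeps the space in front of the index ("Name of the Series "), because `[^_0-9-]*` also accepts spaces.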
|   | 
|  06-15-2009, 10:54 AM | #8 | 
| Grand Sorcerer            Posts: 6,685 Karma: 12595249 Join Date: Jun 2009 Location: Madrid, Spain Device: Kobo Clara/Aura One/Forma,XiaoMI 5, iPad, Huawei MediaPad, YotaPhone 2 | 
			
			Hi, I'm new here and I'm trying to organize my library. I have a problem with the regexp. I'm not able to load the <series_index>; it ends up in the title instead. For example, if I have "[Women of the Otherworld-8]- Personal demon", it will put: 
 I'm not able to change it. | 
|   | 
|  06-17-2009, 05:12 PM | #9 | 
| Guru            Posts: 644 Karma: 1242364 Join Date: May 2009 Location: The Right Coast Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda) | 
			
			Terise, The problem is your filenames are not in the exact format that the regex is expecting. It wants to find " - " (space dash space) between the different fields. Your filename example does not make use of that exact field delimiter. It would work correctly if you had "Women of the Otherworld 8 - Personal Demon". I believe the brackets will also be a problem as they might be considered an end of word / whitespace (the "\s" portion of the regex). | 
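For the bracketed format itself, a dedicated expression is one option. The sketch below is hypothetical and is not a pattern anyone in the thread posted. It assumes names of the form "[Series-N]- Title" with no author part. As a side note, `\s` matches only whitespace characters in Python's `re`, so brackets have to be matched explicitly.

```python
import re

# Hypothetical pattern for names like "[Series-N]- Title".
BRACKETED = re.compile(
    r"\[(?P<series>[^\]]+?)-(?P<series_index>[0-9]+)\]\s*-\s*(?P<title>.+)"
)

fields = BRACKETED.match("[Women of the Otherworld-8]- Personal demon").groupdict()
print(fields)

# \s does not match a bracket.
bracket_is_whitespace = re.match(r"\s", "[") is not None
print(bracket_is_whitespace)
```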
|   | 
|  08-26-2009, 02:37 AM | #10 | 
| Junior Member  Posts: 3 Karma: 10 Join Date: Aug 2009 Device: ipod touch | 
			
			Hi, I've had a search through a few of these threads to see if anyone has asked this question before, but I couldn't find it, so apologies if I missed it somewhere.

			A lot of my filenames are in the following format:
			Surname, Firstname - Title
			or
			Surname, Firstname - Series # - Title

			My problem is that when I start the expression with the default (?P<author>[^_]+) it puts the author details in back to front and messes up the author sort as well. How do I go about reversing the surname and the first name in the expression so that the Author field is populated correctly?

			I've looked at the guide for regular expressions, but it's a bit above my head at the moment, although I'm persevering to try and wrap my head around it. | 
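A regular expression can capture the two halves of the name, but it cannot reorder text inside a single named group. The sketch below does the swap in plain Python, outside calibre. The group names `surname`, `firstname` and `rest` are made up for the example, and nothing in this thread says whether calibre's filename expression accepts anything similar.

```python
import re

# Hypothetical groups: capture the two name parts separately, then swap them.
PATTERN = re.compile(
    r"(?P<surname>[^,_]+),\s*(?P<firstname>[^_-]+?)\s*-\s*(?P<rest>.+)"
)


def author_and_rest(name):
    """Return ("Firstname Surname", remainder), or None if the name does not fit."""
    match = PATTERN.match(name)
    if match is None:
        return None
    author = f"{match.group('firstname')} {match.group('surname')}"
    return author, match.group("rest")


simple = author_and_rest("Surname, Firstname - Title")
with_series = author_and_rest("Surname, Firstname - Series 3 - Title")
print(simple)
print(with_series)
```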
|   | 
|  08-26-2009, 02:07 PM | #11 | |
| Junior Member  Posts: 2 Karma: 10 Join Date: Aug 2009 Device: Windows Mobile | Quote: 
  Here's my latest RegExp:
 Code:
 ^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?(?P<title>[^\-_0-9]+)
 
 I tried using something like this to define multiple orderings, but I can't reuse a group name. Then again, with all the different formats the above RegExp can handle now, it would probably match anything with a reversed order anyway.
 Code:
 ((?<author>...) - (?<title>...))|((?<title>...) - (?<author>...)) | |
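The group-name restriction can be reproduced in plain Python, where named groups are written `(?P<name>...)` and a name may be defined only once per pattern. A possible workaround, sketched below, gives each ordering its own suffixed names and merges them afterwards. The " by " delimiter, the `_a`/`_b` suffixes and the `merge` helper are all assumptions made for this example.

```python
import re

# Reusing a name in two alternatives is rejected when the pattern is compiled.
try:
    re.compile(r"(?P<author>.+) - (?P<title>.+)|(?P<title>.+) - (?P<author>.+)")
    reuse_allowed = True
except re.error:
    reuse_allowed = False

# Hypothetical workaround: one set of suffixed names per ordering.
COMBINED = re.compile(
    r"(?:(?P<title_b>.+?) by (?P<author_b>.+)"
    r"|(?P<author_a>.+?) - (?P<title_a>.+))$"
)


def merge(name):
    """Collapse author_a/author_b and title_a/title_b into plain field names."""
    match = COMBINED.match(name)
    if match is None:
        return None
    fields = {}
    for key, value in match.groupdict().items():
        if value is not None:
            fields[key.rsplit("_", 1)[0]] = value
    return fields


print(reuse_allowed)
print(merge("Some Author - Some Title"))
print(merge("Some Title by Some Author"))
```

This only helps when the orderings can be told apart by something in the filename. As the post notes, two alternatives that both use " - " would match the same names, and the first one would always win.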
|   | 
|  08-27-2009, 12:26 PM | #12 | 
| Junior Member  Posts: 2 Karma: 10 Join Date: Aug 2009 Device: iPhone | 
			
			I've been looking through the forums trying to find an answer to this one: I'd like to have my ePub files ONLY be the title. I changed the expression on the advanced tab to (?P<title>.+) but it's still adding hyphens and the author name. I know I'm missing something, but what? Lori | 
|   | 
|  08-27-2009, 02:18 PM | #13 | |
| Reader            Posts: 85 Karma: 6124 Join Date: Jul 2009 Device: PRS-505 | Quote: 
 Your expression: (?P<title>.+) is not quite specific enough. The dot "." acts as a wildcard search character (it can match anything) and the plus "+" acts as a multiplier. So your expression says "Match any character one or more times, and put that into the 'title' container." It's just running a little rampant. Try something like this:
 Code:
 (?P<title>.+?) - 
 
 (?P<title>
 This part says that anything in the parentheses is going to be put into a container called "<title>" that you can use later. Calibre uses this internally to populate the various fields in its database.
 
 .+?
 This part says "Match any character, repeat that, but do it lazily". The question mark at the end makes a multiplier go lazy, meaning that it will only match as much as it has to. Without the ?, the multiplier goes crazy, and you usually end up matching everything, forever.
 
 ) - 
 This closes the group, and then matches the following space and the dash after that. We need that dash as a way of saying "This isn't part of what I'm looking for", which is why we place it outside of the parentheses.
 
 This expression works on my completely boring "Book Title - nothing important.txt" filename, but you'll need to see if it fits your needs. This expression will *only* work on file names where the Book Title is the first thing in the file name. I don't have enough experience with knowing how file names are constructed for books yet. Last edited by sircastor; 08-27-2009 at 02:25 PM. Reason: fixed for copying | |
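The explanation above can be checked with a short Python sketch, using the poster's sample name with the extension dropped. It also shows why the trailing " - " matters: a lazy group with nothing after it stops after a single character.

```python
import re

name = "Book Title - nothing important"

# Greedy title: swallows the whole name.
greedy = re.match(r"(?P<title>.+)", name)

# Lazy title followed by " - ": stops right before the first " - ".
lazy = re.match(r"(?P<title>.+?) - ", name)

# Lazy title with nothing after it: one character is already enough to match.
bare_lazy = re.match(r"(?P<title>.+?)", name)

print(greedy.group("title"))
print(lazy.group("title"))
print(bare_lazy.group("title"))
```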
|   | 
|  08-28-2009, 02:23 AM | #14 | |
| Reader            Posts: 85 Karma: 6124 Join Date: Jul 2009 Device: PRS-505 | Quote: 
 Unless I'm missing something, I would skip trying to get your expression to handle different orders. | |
|   | 
|  08-28-2009, 02:44 AM | #15 | 
| Liseuse Lover            Posts: 869 Karma: 1035404 Join Date: Jul 2008 Location: Netherlands Device: PRS-505 | 
			
			Perhaps we should make a sticky of a regex thread (or make a "ask your regex question" thread) - I know there are always a lot of questions about it; it is such a superbly powerful filtering mechanism yet very daunting and confusing for beginners.
		 | 
|   | 
|  | 
| Tags | 
| regex, regular expressions | 