| 
			
			 | 
		#1 | 
| 
			
			
			
			 Member 
			
			Posts: 23 
				Karma: 10 
				Join Date: Dec 2007 
				Location: Rome, Italy 
				
				
				Device: PRS-500, PRS-505, Milestone, Galaxy Tab 
				
				
				 | 
	
	
	
		
		
			
			 
				
				Regular Expression Help
			 
			
			
			Hi there  
		
	
		
		
		
		
		
		
		
		
		
		
	
	Here's my problem: I have a bunch of PDF files named like these examples:

	Name Surname - Name of the Series 01 - Title of the Book.pdf
	or
	Name Surname - Title of the Book.pdf

	For the first one I use this:
	(?P<author>[^_]+) - (?P<series>[^_]+) (?P<series_index>[0-9]+) - (?P<title>.+)

	And for the second example I use:
	(?P<author>[^_]+) - (?P<title>.+)

	The problem is that the parsing cuts off the last word, so the title comes out as "Title of the".

	Also, is it possible to join those two expressions so that the parser understands whether or not there is a series part in the filename (xxx - xxx instead of xxx - xxx 3 - xxx)?

	The other problem I have is that calibre looks inside the PDF for the title and author fields, and sometimes this results in garbled text. Is there a way to override this and use only the data parsed from the filename?

	Thanks in advance for any advice.
	P.S. Sorry for my subpar English.
		 | 
| 
		 | 
	
	
| 
			
			 | 
		#2 | 
| 
			
			
			
			 creator of calibre 
			
			Posts: 45,609 
				Karma: 28549044 
				Join Date: Oct 2006 
				Location: Mumbai, India 
				
				
				Device: Various 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Just put a question mark after each + sign; that makes each quantifier non-greedy.
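
To see what the question mark changes, here is a minimal Python sketch (calibre's filename patterns use Python's regular expression syntax). The sample filename is the one from the first post, written without its extension, and the variable names are only for illustration.

```python
import re

name = "Name Surname - Name of the Series 01 - Title of the Book"

# Greedy: [^_]+ grabs as much as it can, so the author runs up to the LAST " - ".
greedy = re.match(r"(?P<author>[^_]+) - (?P<title>.+)", name)

# Lazy: [^_]+? grabs as little as it can, so the author stops at the FIRST " - ".
lazy = re.match(r"(?P<author>[^_]+?) - (?P<title>.+)", name)

print(greedy.group("author"))  # Name Surname - Name of the Series 01
print(lazy.group("author"))    # Name Surname
```

A lazy quantifier at the very end of a pattern matches as little as possible, so the last group is usually left greedy or anchored with `$`.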
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		
 | 
	
	
| 
			
			 | 
		#3 | 
| 
			
			
			
			 Member 
			
			Posts: 23 
				Karma: 10 
				Join Date: Dec 2007 
				Location: Rome, Italy 
				
				
				Device: PRS-500, PRS-505, Milestone, Galaxy Tab 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Thanks, that at least fixes the "Title of the" problem ;D 
		
	
		
		
		
		
		
		
		
		
		
		
	
	So now the expressions are:
	(?P<author>[^_]+) - (?P<series>[^_]+) (?P<series_index>[0-9]+) - (?P<title>.+) ?
	and
	(?P<author>[^_]+) - (?P<title>.+) ?
	Is there no way to make a single expression smart enough to skip the series and series index if the filename is xxx - xxx.pdf? I was looking into something like (?<!...) but I can't figure it out. Thanks anyway.
		 | 
| 
		 | 
	
	
| 
			
			 | 
		#4 | 
| 
			
			
			
			 creator of calibre 
			
			Posts: 45,609 
				Karma: 28549044 
				Join Date: Oct 2006 
				Location: Mumbai, India 
				
				
				Device: Various 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Try enclosing the series part inside another group and make that group optional with {0,1}.
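
A minimal Python sketch of that suggestion, assuming the " - " delimiters from the first post and filenames written without their extension. The anchors `^` and `$` and the lazy `+?` quantifiers are choices made for this sketch, not something the thread prescribes.

```python
import re

# The series part sits in a non-capturing group (?: ... ) that may occur
# zero or one time; {0,1} is equivalent to a trailing ?.
pattern = re.compile(
    r"^(?P<author>[^_]+?) - "
    r"(?:(?P<series>[^_]+?) (?P<series_index>[0-9]+) - ){0,1}"
    r"(?P<title>.+)$"
)

with_series = pattern.match("Name Surname - Name of the Series 01 - Title of the Book")
without_series = pattern.match("Name Surname - Title of the Book")

print(with_series.groupdict())
print(without_series.groupdict())  # series and series_index are None here
```

When the optional group does not take part in the match, Python reports its named groups as `None` rather than as empty strings.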
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		
 | 
	
	
| 
			
			 | 
		#5 | 
| 
			
			
			
			 Member 
			
			Posts: 23 
				Karma: 10 
				Join Date: Dec 2007 
				Location: Rome, Italy 
				
				
				Device: PRS-500, PRS-505, Milestone, Galaxy Tab 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			OK, I'm going crazy... 
		
	
		
		
		
		
		
		
		
		
		
		
	
	This is the expression I've got now:
	(?P<author>[^_]+) - *(?P<series>[^_]*) (?P<series_index>[0-9]*) -? (?P<title>[^_].+) ?
	It recognizes:
	Name Surname - Name of the Series 01 - Title of the Book.pdf
	and
	Name Surname -   Title of the Book.pdf
	(notice the 3 spaces after the - ). I can't, for the love of God, erase those leading spaces from the expression... Can anybody help? I don't ssssspeak sssspython well...
| 
		 | 
	
	
| 
			
			 | 
		#6 | 
| 
			
			
			
			 creator of calibre 
			
			Posts: 45,609 
				Karma: 28549044 
				Join Date: Oct 2006 
				Location: Mumbai, India 
				
				
				Device: Various 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Replace the spaces in your expression with \s*
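
A small Python sketch of the difference, using the three-space filename from the previous post without its extension. The two patterns are simplified author/title expressions written for this illustration.

```python
import re

name = "Name Surname -   Title of the Book"  # three spaces after the dash

# Literal spaces: exactly one space is consumed on each side of the dash.
strict = re.match(r"^(?P<author>[^_-]+?) - (?P<title>.+)$", name)

# \s* consumes zero or more whitespace characters on each side of the dash.
tolerant = re.match(r"^(?P<author>[^_-]+?)\s*-\s*(?P<title>.+)$", name)

print(repr(strict.group("title")))    # the extra spaces stay in the title
print(repr(tolerant.group("title")))  # the extra spaces are absorbed by \s*
```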
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		
 | 
	
	
| 
			
			 | 
		#7 | 
| 
			
			
			
			 Member 
			
			Posts: 23 
				Karma: 10 
				Join Date: Dec 2007 
				Location: Rome, Italy 
				
				
				Device: PRS-500, PRS-505, Milestone, Galaxy Tab 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			At last  
		
	
		
		
		
		
		
		
		
		
		
		
	
	If anybody needs it, here it is:
	(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?
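
As a quick check, the expression can be run in plain Python against the two filename shapes from the first post. This sketch assumes the names are given without their extension; how calibre itself trims the captured fields is not covered here.

```python
import re

# Expression exactly as posted above (including the trailing " ?").
posted = re.compile(
    r"(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)"
    r"\s*-\s*(?P<title>[^_].+) ?"
)

for name in (
    "Name Surname - Name of the Series 01 - Title of the Book",
    "Name Surname - Title of the Book",
):
    m = posted.match(name)
    # strip() because the series group also captures the space before the index
    print({k: v.strip() for k, v in m.groupdict().items()})
```

In the second case the series and series index come back as empty strings, because both groups use `*` and may match nothing.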
| 
		 | 
	
	
| 
			
			 | 
		#8 | 
| 
			
			
			
			 Grand Sorcerer 
			
			Posts: 6,686 
				Karma: 12595249 
				Join Date: Jun 2009 
				Location: Madrid, Spain 
				
				
				Device: Kobo Clara/Aura One/Forma,XiaoMI 5, iPad, Huawei MediaPad, YotaPhone 2 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Hi, 
		
	
		
		
		
		
		
		
		
		
		
		
	
	I'm new here and I'm trying to organize my library. I have a problem with the regexp: I'm not able to load the <series_index>; it gets loaded into the title. For example, if I have "[Women of the Otherworld-8]- Personal demon", it will put: 
 I'm not able to change it.  
		 | 
| 
		 | 
	
	
| 
			
			 | 
		#9 | 
| 
			
			
			
			 Guru 
			
			Posts: 644 
				Karma: 1242364 
				Join Date: May 2009 
				Location: The Right Coast 
				
				
				Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda) 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Terise, 
		
	
		
		
		
		
		
		
		
		
		
		
	
	The problem is that your filenames are not in the exact format the regex expects. It wants to find " - " (space dash space) between the different fields, and your example filename does not use that exact delimiter. It would work correctly if you had "Women of the Otherworld 8 - Personal Demon". I believe the brackets will also be a problem: they are ordinary characters, not whitespace, so the "\s" portion of the regex will not skip over them.
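
If renaming the files is not an option, a pattern written for the bracketed layout is another route. The expression below is a hypothetical sketch for names shaped like "[Series-8]- Title"; it does not come from the thread and only handles that one layout.

```python
import re

# Hypothetical pattern: literal brackets, a dash before the index,
# and optional whitespace around the dash that precedes the title.
pattern = re.compile(
    r"^\[(?P<series>.+?)-(?P<series_index>[0-9]+)\]\s*-\s*(?P<title>.+)$"
)

m = pattern.match("[Women of the Otherworld-8]- Personal demon")
print(m.groupdict())

# \s matches whitespace only, so a bracket is never consumed by \s*.
print(re.match(r"\s", "[") is None)
```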
| 
		 | 
	
	
| 
			
			 | 
		#10 | 
| 
			
			
			
			 Junior Member 
			
			Posts: 3 
				Karma: 10 
				Join Date: Aug 2009 
				
				
				
				Device: ipod touch 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Hi, I've had a search through a few of these threads to see if anyone has asked this question before, but I couldn't find it, so apologies if I missed it somewhere. 
		
	
		
		
		
		
		
		
		
		
		
		
	
	A lot of my filenames are in the following format:
	Surname, Firstname - Title
	or
	Surname, Firstname - Series # - Title

	My problem is that when I start the expression with the default (?P<author>[^_]+) it puts the author details in back to front and messes up the author sort as well. How do I go about reversing the surname and the first name in the expression so that the Author field is populated correctly? I've looked at the guide for regular expressions, but it's a bit above my head at the moment, although I'm persevering to try and wrap my head around it.
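
A named group captures text in the order it appears, so the swap has to happen after matching. The following is a rough Python sketch of that idea run outside calibre; the group names `surname` and `firstname` and the `parse` helper are made up for this illustration and are not calibre fields or calibre functions.

```python
import re

# Hypothetical group names; the pattern covers "Surname, Firstname - Title".
pattern = re.compile(
    r"^(?P<surname>[^,]+),\s*(?P<firstname>[^-]+?)\s*-\s*(?P<title>.+)$"
)

def parse(name):
    """Return author as 'Firstname Surname' plus the title, or None if no match."""
    m = pattern.match(name)
    if m is None:
        return None
    return {"author": f"{m['firstname']} {m['surname']}", "title": m["title"]}

print(parse("Surname, Firstname - Title"))
```

For the "Series # - Title" variant, everything after the first dash lands in the title in this sketch, so the series part would need its own optional group.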
| 
		 | 
	
	
| 
			
			 | 
		#11 | |
| 
			
			
			
			 Junior Member 
			
			Posts: 2 
				Karma: 10 
				Join Date: Aug 2009 
				
				
				
				Device: Windows Mobile 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 Here's my latest RegExp: Code: 
	^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?(?P<title>[^\-_0-9]+) 
 
 I tried using something like this to define multiple orderings, but I can't reuse a group name. Then again, with all the different formats the above RegExp can handle now, it would probably match anything with a reversed order anyway. Code: 
	((?P<author>...) - (?P<title>...))|((?P<title>...) - (?P<author>...))
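
The group-name restriction is easy to confirm in plain Python. In this sketch the `...` placeholders are replaced with `.+` so that the only problem left in the pattern is the repeated names.

```python
import re

# Python's re module refuses a pattern that defines the same group name twice,
# even when the two definitions sit in different branches of an alternation.
try:
    re.compile(r"(?P<author>.+) - (?P<title>.+)|(?P<title>.+) - (?P<author>.+)")
    reuse_allowed = True
except re.error:
    reuse_allowed = False

print(reuse_allowed)  # False
```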
|
| 
		 | 
	
	
| 
			
			 | 
		#12 | 
| 
			
			
			
			 Junior Member 
			
			Posts: 2 
				Karma: 10 
				Join Date: Aug 2009 
				
				
				
				Device: iPhone 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			I've been looking through the forums trying to find an answer to this one: 
		
	
		
		
		
		
		
		
		
		
		
		
	
	I'd like to have my ePub files ONLY be the title. I changed the expression on the advanced tab to (?P<title>.+) but it's still adding hyphens and the author name. I know I'm missing something, but what?
	Lori
| 
		 | 
	
	
| 
			
			 | 
		#13 | |
| 
			
			
			
			 Reader 
			
			Posts: 85 
				Karma: 6124 
				Join Date: Jul 2009 
				
				
				
				Device: PRS-505 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 Your expression: (?P<title>.+) is not quite specific enough. The dot "." acts as a wildcard character (it can match anything) and the plus "+" acts as a multiplier. So your expression says "Match any character one or more times, and put that into the 'title' container." It's just running a little rampant. Try something like this: Code: 
	(?P<title>.+?) - 

	(?P<title>
	This part says that anything in the parentheses is going to be put into a container called "title" that you can use later. Calibre uses this internally to populate the various fields in its database.

	.+?
	This part says "Match any character, repeat that, but do it lazily". The question mark at the end makes a multiplier go lazy, meaning that it will only match as much as it has to. Without the ?, the multiplier goes crazy, and you usually end up matching everything, forever.

	) - 
	This closes the group, and then matches the following space and the dash after that. We need that dash as a way of saying "This isn't part of what I'm looking for", which is why we place it outside of the parentheses.

	This expression works on my completely boring "Book Title - nothing important.txt" filename, but you'll need to see if it fits your needs. It will *only* work on file names where the book title is the first thing in the file name. I don't have enough experience with how file names are constructed for books yet.

	Last edited by sircastor; 08-27-2009 at 03:25 PM. Reason: fixed for copying
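
A short Python sketch of how that title-only expression behaves. The first filename is the one from the post; the other two are made-up names added to show the stated limitation.

```python
import re

pattern = re.compile(r"(?P<title>.+?) - ")

# The lazy .+? stops at the first " - ", so only the leading part is captured.
print(pattern.match("Book Title - nothing important.txt").group("title"))

# A name that starts with the author puts the author into "title" instead.
print(pattern.match("Author Name - Book Title.epub").group("title"))

# With no " - " in the name there is no match at all.
print(pattern.match("Book Title.epub"))
```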
|
| 
		 | 
	
	
| 
			
			 | 
		#14 | |
| 
			
			
			
			 Reader 
			
			Posts: 85 
				Karma: 6124 
				Join Date: Jul 2009 
				
				
				
				Device: PRS-505 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 Unless I'm missing something, I would skip trying to get your expression to handle different orders.  | 
|
| 
		 | 
	
	
| 
			
			 | 
		#15 | 
| 
			
			
			
			 Liseuse Lover 
			
			Posts: 869 
				Karma: 1035404 
				Join Date: Jul 2008 
				Location: Netherlands 
				
				
				Device: PRS-505 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Perhaps we should make a sticky of a regex thread (or make a "ask your regex question" thread) - I know there are always a lot of questions about it; it is such a superbly powerful filtering mechanism yet very daunting and confusing for beginners.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
            
        
            
| Tags | 
| regex, regular expressions | 