| 
			
			 | 
		#16 | |
| 
			
			
			
			 Enthusiast 
			
			Posts: 30 
				Karma: 10 
				Join Date: Dec 2010 
				
				
				
				Device: PRS-650 ... ipad 
				
				
				 | 
	
	
	
		
		
			
			 Quote: 
	
 If you live in a non-English-speaking country and are cleaning up text in your own language, don't forget to add to the character class any extra characters that words may end with. For example, for German (ß): ([a-zß])</p>\s+<p class="calibre2"> But if you understood what happens here, you know that already, don't you?  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#17 | ||
| 
			
			
			
			 Wizard 
			
			Posts: 1,337 
				Karma: 123457 
				Join Date: Apr 2009 
				Location: Malaysia 
				
				
				Device: PRS-650, iPhone 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 Quote: 
	
 Code: 
	([a-zäëïöüàèìòùáćéíóńśúâêîôûçąężı,:)\IA\u00DF]|(?<!\&\w{4});)
Code: 
	(?<=.{85}([a-zäëïöüàèìòùáćéíóńśúâêîôûçąężı,:)\IA\u00DF]|(?<!\&\w{4});))\s*</(span|p|div)>\s*(</(p|span|div)>)?\s*(?P<up2threeblanks><(p|span|div)[^>]*>\s*(<(p|span|div)[^>]*>\s*</(span|p|div)>\s*)</(span|p|div)>\s*){0,3}\s*<(span|div|p)[^>]*>\s*(<(span|div|p)[^>]*>)?\s*
Last edited by ldolse; 12-22-2010 at 09:17 PM.  | 
||
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#18 | 
| 
			
			
			
			 frumious Bandersnatch 
			
			Posts: 7,570 
				Karma: 20150435 
				Join Date: Jan 2008 
				Location: Spaniard in Sweden 
				
				
				Device: Cybook Orizon, Kobo Aura 
				
				
				 | 
	
	|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#19 | 
| 
			
			
			
			 Created Sigil, FlightCrew 
			
			Posts: 1,982 
				Karma: 350515 
				Join Date: Feb 2008 
				
				
				
				Device: Kobo Clara HD 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Instead of building a character class yourself, how about using "\w"? That will match any Unicode letter, digit, or underscore.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#20 | 
| 
			
			
			
			 Calibre Plugins Developer 
			
			Posts: 4,735 
				Karma: 2208556 
				Join Date: Oct 2010 
				Location: Australia 
				
				
				Device: Kindle Oasis 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			I'm not a regex guru by any means, but a number of the expressions we have been looking at intentionally exclude the uppercase versions of characters, which \w would include.
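
			To make the difference concrete, here is a small Python sketch (calibre's regexes are Python regexes). The sample strings are made up for illustration, and the explicit class is just the short German variant quoted earlier in the thread, not the full list.

```python
import re

# Explicit class from earlier in the thread: lowercase ASCII plus the German sharp s.
explicit = re.compile(r'[a-zß]$')
# \w: any Unicode word character (letters of either case, digits, underscore).
word = re.compile(r'\w$')

# Hypothetical line endings one might meet in a book.
samples = ["straße", "Fuß", "café", "CHAPTER", "page 12", "end."]

for s in samples:
    print(f"{s!r:12} explicit={bool(explicit.search(s))} word={bool(word.search(s))}")
```

			The explicit class skips "CHAPTER" (uppercase), "page 12" (digit) and "café" (é is not listed), while \w accepts all three. So \w saves you from listing accented letters but also lets headings and numbers through.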
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#21 | 
| 
			
			
			
			 Created Sigil, FlightCrew 
			
			Posts: 1,982 
				Karma: 350515 
				Join Date: Feb 2008 
				
				
				
				Device: Kobo Clara HD 
				
				
				 | 
	
	|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#22 | 
| 
			
			
			
			 Wizard 
			
			Posts: 1,337 
				Karma: 123457 
				Join Date: Apr 2009 
				Location: Malaysia 
				
				
				Device: PRS-650, iPhone 
				
				
				 | 
	
	|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#23 | 
| 
			
			
			
			 Wizard 
			
			Posts: 1,337 
				Karma: 123457 
				Join Date: Apr 2009 
				Location: Malaysia 
				
				
				Device: PRS-650, iPhone 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			"Requirement" might be a bit strong; it's just a balance between false positives and false negatives. I generally try to err on the side of false negatives, since they're easier to detect later. Sometimes I think it might be easier to use \w, since the line-length test is in there as an extra check, but I haven't made that jump.
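
			A rough sketch of what that jump could look like, in Python. Note the assumptions: the threshold of 40 is made up to suit the short sample (the expression quoted earlier uses 85), the tags are plain <p> with no class, and the sample text is invented.

```python
import re

MIN_LEN = 40  # hypothetical threshold; the expression quoted earlier uses 85

# Fixed-width lookbehind: at least MIN_LEN characters on the same line, the
# last of them a word character, must come right before the closing tag.
pattern = re.compile(r'(?<=.{%d}\w)</p>\s+<p>' % MIN_LEN)

html = (
    '<p>CHAPTER ONE</p>\n'
    '<p>The rain had been falling for three days when the letter</p>\n'
    '<p>finally arrived.</p>'
)

# Join the broken sentence with a single space; short lines are left alone.
joined = pattern.sub(' ', html)
print(joined)
```

			The heading ends in a word character but is far too short to pass the length test, so it stays a separate paragraph; only the long line is joined to the next one. That is the sense in which line length covers for the looseness of \w.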
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#24 | 
| 
			
			
			
			 Wizard 
			
			Posts: 3,720 
				Karma: 1759970 
				Join Date: Sep 2010 
				
				
				
				Device: none 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			This thread is very useful. I'd been fixing up line breaks manually in Word, which took ages. 
		
	
		
		
		
		
		
		
		
		
		
		
	
	I am amazed how easy it can be via regex. I use variations of the code given earlier in this thread, i.e.:

	What you want to do is something like this:
	Find: ([a-z])</p>\s+<p class="calibre2">
	Replace: \1 

	changing calibre2 as needed on a per-book basis; sometimes it needs a different one- or two-digit number, like calibre13.

	One case that still slips through the test, though, is when a sentence has been split such that the new line starts with the one-letter word "I" (capital I). Can that also be caught via regex?  | 
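
	One possible way to handle both points, sketched in Python. This assumes the variation in use also requires the next line to start with a lowercase letter (the Find expression as quoted does not check the next line at all), and the sample HTML is invented.

```python
import re

# Hypothetical sample of calibre-style HTML with hard breaks inside a sentence.
html = (
    '<p class="calibre2">It was late when</p>\n'
    '<p class="calibre2">I got home and</p>\n'
    '<p class="calibre2">went to bed.</p>\n'
    '<p class="calibre2">Next morning came.</p>'
)

# calibre\d{1,2} accepts calibre2, calibre13, etc., so the class need not be
# edited per book. The lookahead accepts a next line that starts with a
# lowercase letter or with the standalone word "I" (I\b does not match "It").
pattern = re.compile(
    r'([a-z])</p>\s+<p class="calibre\d{1,2}">(?=[a-z]|I\b)'
)

# \1 puts the captured letter back; the trailing space keeps the words apart.
joined = pattern.sub(r'\1 ', html)
print(joined)
```

	The first three paragraphs are merged into one sentence and the last is left alone. The catch is that a real paragraph starting with "I" will also be joined if the previous paragraph happens to end in a lowercase letter with no punctuation, so check the results.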
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#25 | 
| 
			
			
			
			 Wizard 
			
			Posts: 3,720 
				Karma: 1759970 
				Join Date: Sep 2010 
				
				
				
				Device: none 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			 I looked at the regex syntax reference page here: 
		
	
		
		
		
		
		
		
		
		
		
		
	
	http://www.regular-expressions.info/reference.html but there's no definition of the \1 operation. Do I need a better reference page or book? What's a good book to learn from? (I intend to use more regex in both Sigil and calibre.) I'd prefer an ebook that I can put on my Kindle.  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#26 | |
| 
			
			
			
			 Calibre Plugins Developer 
			
			Posts: 4,735 
				Karma: 2208556 
				Join Date: Oct 2010 
				Location: Australia 
				
				
				Device: Kindle Oasis 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 http://www.regular-expressions.info/refadv.html
 I can't name any books; perhaps someone else can. That website is very good, and between the basic and advanced pages you have the "cheat sheet" for most of what you need to know to refresh your memory.  | 
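
 On the \1 question itself: it is a backreference, meaning "whatever the first parenthesised group in the Find expression matched". A minimal Python sketch follows (calibre's regexes are Python; other tools may spell the replacement syntax slightly differently, and the sample strings are made up):

```python
import re

# Group 1 is the first parenthesised part of the pattern, group 2 the second;
# \1 and \2 in the replacement re-insert whatever those groups matched.
swapped = re.sub(r'(\w+), (\w+)', r'\2 \1', 'Doe, John')
print(swapped)

# The line-join expression from this thread: ([a-z]) captures the last letter
# of the broken line, and "\1 " writes it back followed by a space.
joined = re.sub(
    r'([a-z])</p>\s+<p class="calibre2">',
    r'\1 ',
    '<p class="calibre2">the quick</p>\n<p class="calibre2">brown fox</p>',
)
print(joined)

# Named groups, written (?P<name>...), are referenced as \g<name>.
named = re.sub(r'(?P<last>[a-z])</p>\s+<p>', r'\g<last> ', 'one</p>\n<p>two')
print(named)
```

 Without the \1, the letter matched by ([a-z]) would be deleted along with the tags, which is why the Replace field is not simply a space.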
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#27 | 
| 
			
			
			
			 Wizard 
			
			Posts: 3,720 
				Karma: 1759970 
				Join Date: Sep 2010 
				
				
				
				Device: none 
				
				
				 | 
	
	|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 