|  02-23-2012, 02:57 PM | #1 | 
| Wizard            Posts: 3,720 Karma: 1759970 Join Date: Sep 2010 Device: none | 
				
				another regex puzzle - detect capitalised phrases
			 
			
			I suspect there is no easy answer to this, but I will ask anyway. Given a book which uses capitalisation in lieu of scene breaks, with all paragraphs sharing the same CSS, i.e.
			THIS IS HOW THE 1st paragraph starts... blah blah
			but not the next paragraph...
			Or the one after that...
			YET SOME TIME LATER THERE is another instance...
			I want to pick out those capitalised starts in order to assign a unique CSS class, but devising a rule is very hard. Testing that the 2nd letter of a paragraph is capitalised works most times, but it will miss
			I CANNOT GET THIS one...
			and will miss
			A TOUGH ACT TO follow
			and will mis-classify
			"I don't want this one"
			Any better methods, anyone? | 
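To make the failure cases concrete, here is a minimal Python sketch of the "2nd letter is capitalised" test run against the examples from the post. The pattern `<p>.[A-Z]` and the bare `<p>` tags are assumptions about how the rule and the markup would look; the actual book may differ.

```python
import re

# Assumed form of the naive rule: any first character, then an uppercase letter.
second_upper = re.compile(r"<p>.[A-Z]")

samples = [
    "<p>THIS IS HOW THE 1st paragraph starts</p>",  # wanted, found
    "<p>I CANNOT GET THIS one</p>",                 # wanted, missed (2nd char is a space)
    "<p>A TOUGH ACT TO follow</p>",                 # wanted, missed (2nd char is a space)
    '<p>"I don\'t want this one"</p>',              # not wanted, found (quote then I)
    "<p>but not the next paragraph</p>",            # not wanted, not found
]

results = [bool(second_upper.match(s)) for s in samples]
for sample, hit in zip(samples, results):
    print(hit, sample)
```

The two misses come from one-letter words such as "I" and "A", and the false hit comes from an opening quotation mark sitting in the first position.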
|   |   | 
|  02-23-2012, 03:02 PM | #2 | 
| ♫            Posts: 661 Karma: 506380 Join Date: Aug 2010 Location: Germany Device: Kobo Aura / PB Lux 2 / Bookeen Frontlight / Kobo Mini / Nook Color | 
			
			<p>[^a-z]{4,} will find any paragraph whose first 4 characters are not lowercase letters. Of course, this will still find <p>U.S.A. is the country... I guess you need to check whether 4 is enough. | 
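A quick Python check of this pattern against the examples from the first post, plus the U.S.A. case mentioned above. This is only a sketch: the sample strings are made up from the thread, and the minimum length 4 is the value under discussion, not a tested recommendation.

```python
import re

# The pattern as posted: at least 4 characters after <p> that are not a-z.
four_non_lower = re.compile(r"<p>[^a-z]{4,}")

samples = [
    "<p>THIS IS HOW THE 1st paragraph starts</p>",
    "<p>I CANNOT GET THIS one</p>",
    "<p>A TOUGH ACT TO follow</p>",
    '<p>"I don\'t want this one"</p>',
    "<p>U.S.A. is the country</p>",
    "<p>but not the next paragraph</p>",
]

results = [bool(four_non_lower.match(s)) for s in samples]
for sample, hit in zip(samples, results):
    print(hit, sample)
```

Because spaces, digits and punctuation all count as "not lowercase", the one-letter openers are caught, but so is the abbreviation. Raising the count trades fewer false hits for more missed short phrases.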
|   |   | 
|  02-23-2012, 03:04 PM | #3 | 
| Berti            Posts: 1,197 Karma: 4985964 Join Date: Jan 2012 Location: Zischebattem Device: Acer Lumiread | 
			Code: <p>([A-Z])([A-Z|\s[A-Z])
			Actually:
			Code: <p>([A-Z])([\sA-Z])
			Last edited by mmat1; 02-23-2012 at 03:13 PM. | 
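For comparison, a small Python sketch of the second (corrected) pattern. It only looks at two characters, an uppercase letter followed by an uppercase letter or whitespace, so it handles the one-letter openers but cannot tell them apart from an ordinary sentence starting with "I" or "A". The sample "I went home early" is an invented example, not from the book.

```python
import re

# Corrected pattern from the post: uppercase, then uppercase or whitespace.
two_char = re.compile(r"<p>([A-Z])([\sA-Z])")

samples = [
    "<p>THIS IS HOW THE 1st paragraph starts</p>",
    "<p>I CANNOT GET THIS one</p>",
    "<p>A TOUGH ACT TO follow</p>",
    '<p>"I don\'t want this one"</p>',
    "<p>I went home early</p>",  # invented example: ordinary sentence
    "<p>but not the next paragraph</p>",
]

results = [bool(two_char.match(s)) for s in samples]
for sample, hit in zip(samples, results):
    print(hit, sample)
```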
|   |   | 
|  02-23-2012, 03:04 PM | #4 | |
| Well trained by Cats            Posts: 31,249 Karma: 61360164 Join Date: Aug 2009 Location: The Central Coast of California Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A | 
			Code: ([A-Z]* ){2,}
			Find 1 or more uppercase letters followed by a space, 2 or more times. | |
|   |   | 
|  02-24-2012, 02:15 AM | #5 | 
| Wizard            Posts: 3,720 Karma: 1759970 Join Date: Sep 2010 Device: none | 
			
			thanks for all the different solutions - you guys make it look so easy !
		 | 
|   |   | 
|  02-24-2012, 04:18 AM | #6 | 
| Guru            Posts: 657 Karma: 64171 Join Date: Sep 2010 Location: Kent, England, Sol 3, ZZ9 plural Z Alpha Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin) | 
			
			Shouldn't that be a + instead of the *? As written, * means 0 or more times, so it would also match paragraphs that have several spaces at the beginning, which are quite often (badly) used for indents.
		 | 
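The difference between * and + can be checked with a short Python sketch. The leading `<p>` on both patterns is an assumption added here so the match is tied to the paragraph start; the pattern as posted had no such prefix. The `tag_scene_starts` helper and the class name `scene-start` are hypothetical, shown only to illustrate how the + version might be used to assign the CSS class asked for in the first post.

```python
import re

star = re.compile(r"<p>([A-Z]* ){2,}")  # 0 or more uppercase letters, then a space
plus = re.compile(r"<p>([A-Z]+ ){2,}")  # 1 or more uppercase letters, then a space

indented = "<p>   some badly indented paragraph</p>"
scene = "<p>A TOUGH ACT TO follow</p>"

star_hits = [bool(star.match(s)) for s in (indented, scene)]
plus_hits = [bool(plus.match(s)) for s in (indented, scene)]


def tag_scene_starts(html, css_class="scene-start"):
    """Add a class to each bare <p> that opens with 2+ all-uppercase words.

    Hypothetical helper: it only handles <p> tags without attributes.
    """
    # The lookahead checks the text after <p> without consuming it.
    return re.sub(r"<p>(?=([A-Z]+ ){2,})", f'<p class="{css_class}">', html)


tagged = tag_scene_starts("<p>A TOUGH ACT TO follow</p><p>I went home</p>")
print(star_hits, plus_hits)
print(tagged)
```

With *, the three leading spaces count as three "empty word plus space" repeats, so the indented paragraph matches. With +, each repeat needs at least one uppercase letter, so only the capitalised opener matches.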
|   |   | 
|  02-24-2012, 09:04 AM | #7 | 
| Well trained by Cats            Posts: 31,249 Karma: 61360164 Join Date: Aug 2009 Location: The Central Coast of California Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A | |
|   |   | 