|  12-22-2010, 06:13 PM | #16 | |
| Enthusiast  Posts: 30 Karma: 10 Join Date: Dec 2010 Device: PRS-650 ... ipad |  unwanted linebreaks Quote: 
 If you live in a non-English-speaking country and are cleaning up text in your own language, don't forget to add to the formula any additional characters that words may end with. For example, for German (ß): ([a-zß])</p>\s+<p class="calibre2"> But if you understand what is happening here, you already know that, don't you? | |
|   |   | 
|  12-22-2010, 08:14 PM | #17 | ||
| Wizard            Posts: 1,337 Karma: 123457 Join Date: Apr 2009 Location: Malaysia Device: PRS-650, iPhone | Quote: 
 Quote: 
 Code: ([a-zäëïöüàèìòùáćéíóńśúâêîôûçąężı,:)\IA\u00DF]|(?<!\&\w{4});)
 Code: (?<=.{85}([a-zäëïöüàèìòùáćéíóńśúâêîôûçąężı,:)\IA\u00DF]|(?<!\&\w{4});))\s*</(span|p|div)>\s*(</(p|span|div)>)?\s*(?P<up2threeblanks><(p|span|div)[^>]*>\s*(<(p|span|div)[^>]*>\s*</(span|p|div)>\s*)</(span|p|div)>\s*){0,3}\s*<(span|div|p)[^>]*>\s*(<(span|div|p)[^>]*>)?\s*
 Last edited by ldolse; 12-22-2010 at 08:17 PM. | ||
|   |   | 
|  12-23-2010, 04:21 AM | #18 | 
| frumious Bandersnatch            Posts: 7,570 Karma: 20150435 Join Date: Jan 2008 Location: Spaniard in Sweden Device: Cybook Orizon, Kobo Aura | |
|   |   | 
|  12-23-2010, 06:13 AM | #19 | 
| Created Sigil, FlightCrew            Posts: 1,982 Karma: 350515 Join Date: Feb 2008 Device: Kobo Clara HD | 
			
			Instead of building a character class yourself, how about using "\w"? That will match any unicode letter, number and underscore.
		 | 
|   |   | 
|  12-23-2010, 07:14 AM | #20 | 
| Calibre Plugins Developer            Posts: 4,735 Karma: 2197770 Join Date: Oct 2010 Location: Australia Device: Kindle Oasis | 
			
			I'm not a regex guru by any means but a number of the expressions we have been looking at are intentionally excluding the uppercase versions of characters which \w would include.
		 | 
|   |   | 
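The trade-off between "\w" and a hand-built lowercase class can be sketched in Python, whose `re` module treats `\w` as Unicode-aware for `str` patterns. This is only an illustration: the sample HTML, the short character class, the helper name `unwrap`, and the replacement `\1 ` (captured letter plus a space) are assumptions, not code from the thread.

```python
import re

# Hand-built class: lowercase letters only (a shortened, illustrative class).
LOWER_ONLY = re.compile(r'([a-zäöüß])</p>\s+<p class="calibre2">')
# \w also matches uppercase letters, digits and the underscore.
ANY_WORD = re.compile(r'(\w)</p>\s+<p class="calibre2">')

# A sentence broken mid-line, and a heading that must stay on its own line.
wrapped = '<p class="calibre2">the line was cut</p>\n<p class="calibre2">in the middle</p>'
heading = '<p class="calibre2">CHAPTER ONE</p>\n<p class="calibre2">It began quietly.</p>'


def unwrap(pattern, html):
    """Join paragraphs wherever the pattern sees a break (hypothetical helper)."""
    return pattern.sub(r'\1 ', html)


print(unwrap(LOWER_ONLY, wrapped))  # the two paragraphs are joined
print(unwrap(LOWER_ONLY, heading))  # unchanged: "E" is not in the class
print(unwrap(ANY_WORD, heading))    # false positive: heading is merged into the text
```

The lowercase-only class leaves the all-caps heading alone, while `\w` merges it into the following paragraph, which is the kind of false positive described above.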
|  12-23-2010, 07:35 AM | #21 | 
| Created Sigil, FlightCrew            Posts: 1,982 Karma: 350515 Join Date: Feb 2008 Device: Kobo Clara HD | |
|   |   | 
|  12-23-2010, 09:47 AM | #22 | 
| Wizard            Posts: 1,337 Karma: 123457 Join Date: Apr 2009 Location: Malaysia Device: PRS-650, iPhone | |
|   |   | 
|  12-23-2010, 09:50 AM | #23 | 
| Wizard            Posts: 1,337 Karma: 123457 Join Date: Apr 2009 Location: Malaysia Device: PRS-650, iPhone | 
			
			"Requirement" might be a bit strong; it's just a balance between false positives and false negatives. I generally try to err on the side of false negatives, since they're easier to detect later. Sometimes I think it might be easier to use \w, since the line length is in there as an extra check, but I haven't made that jump.
		 | 
|   |   | 
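The idea of using line length as an extra check alongside "\w" could look roughly like the sketch below. It borrows the fixed-width lookbehind idea from the 85-character pattern quoted earlier in the thread, but everything else is an assumption: the helper name `join_long_lines`, the threshold of 40, the `calibre2` class and the sample text are made up for illustration. Note that the lookbehind counts every character on the line, markup included.

```python
import re


def join_long_lines(html, min_len=40):
    """Join a paragraph to the next one only if the text before </p>
    runs at least min_len + 1 characters on one line and ends in a
    word character. min_len is a hypothetical threshold."""
    # Python's re needs a fixed-width lookbehind; .{N}\w has width N + 1.
    pattern = re.compile(r'(?<=.{%d}\w)</p>\s+<p class="calibre2">' % min_len)
    return pattern.sub(' ', html)


long_line = ('<p class="calibre2">' + 'word ' * 10 + 'end</p>\n'
             '<p class="calibre2">continues here</p>')
short_line = ('<p class="calibre2">Chapter 1</p>\n'
              '<p class="calibre2">Text starts.</p>')

print(join_long_lines(long_line))   # joined: the line is long enough
print(join_long_lines(short_line))  # unchanged: too short to be a wrapped line
```

With the length test in place, a short line such as a chapter title is left alone even though it ends in a character that `\w` matches.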
|  12-26-2010, 11:04 AM | #24 | 
| Wizard            Posts: 3,720 Karma: 1759970 Join Date: Sep 2010 Device: none | 
			
			This thread is very useful. I'd been fixing up line breaks manually in Word, which took ages, and I am amazed how easy it can be via regex. I use variations of the code given earlier in this thread, i.e.:
			What you want to do is something like this:
			Find: ([a-z])</p>\s+<p class="calibre2">
			Replace: \1
			changing calibre2 as needed on a per-book basis; sometimes it needs a different one- or two-digit number, like calibre13.
			One case that still slips through the test, though, is when a sentence has split such that the new line starts with the one-letter word "I" (capital I). Can that also be caught via regex? | 
|   |   | 
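For readers who want to try the Find/Replace pair from the post above outside an editor, here is a rough Python sketch. In the replacement, `\1` is a backreference that re-inserts whatever the first parenthesised group captured, here the last lowercase letter of the broken line. Three things are assumptions rather than part of the thread: `calibre\d{1,2}` as a way to accept any one- or two-digit class number, the trailing space in the replacement, and the lookahead that only joins when the next paragraph starts with a lowercase letter or the standalone word "I". The lookahead is a guess at the kind of variation in which a leading "I" would otherwise slip through.

```python
import re

# Group 1 captures the final lowercase letter of the broken line.
# calibre\d{1,2} accepts calibre2, calibre13, ... (assumed generalisation).
# The lookahead (assumed) requires the next paragraph to begin with a
# lowercase letter or the one-letter word "I".
FIND = re.compile(r'([a-z])</p>\s+<p class="calibre\d{1,2}">(?=[a-z]|I\b)')
REPLACE = r'\1 '  # \1 = text captured by group 1, followed by a space


def fix_linebreaks(html):
    """Apply the find/replace pair to a string of HTML (hypothetical helper)."""
    return FIND.sub(REPLACE, html)


split_on_i = ('<p class="calibre13">he said that</p>\n'
              '<p class="calibre13">I would go</p>')
new_sentence = ('<p class="calibre2">the end</p>\n'
                '<p class="calibre2">It was morning.</p>')

print(fix_linebreaks(split_on_i))    # joined: next line starts with the word "I"
print(fix_linebreaks(new_sentence))  # unchanged: "It" is not the word "I"
```

The `\b` after `I` keeps words such as "It" or "If" from being treated as the pronoun, at the cost of still missing sentences that genuinely continue with a capitalised word.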
|  01-05-2011, 06:39 AM | #25 | 
| Wizard            Posts: 3,720 Karma: 1759970 Join Date: Sep 2010 Device: none | 
			
			 I looked at the regex syntax reference page here http://www.regular-expressions.info/reference.html but there's no definition of the \1 operation. Do I need a better reference page or book? What's a good book to learn from? (I intend to use more regex in both Sigil and calibre.) I'd prefer an ebook that I can put on my Kindle. | 
|   |   | 
|  01-05-2011, 07:34 AM | #26 | |
| Calibre Plugins Developer            Posts: 4,735 Karma: 2197770 Join Date: Oct 2010 Location: Australia Device: Kindle Oasis | Quote: 
 http://www.regular-expressions.info/refadv.html I can't name any books, perhaps someone else can. That website is very good, and between the basic and advanced page you have the "cheat sheet" for most of what you need to know to refresh your memory. | |
|   |   | 
|  01-07-2011, 03:48 AM | #27 | 
| Wizard            Posts: 3,720 Karma: 1759970 Join Date: Sep 2010 Device: none | |
|   |   | 