12-22-2010, 06:13 PM | #16 | |
Enthusiast
Posts: 30
Karma: 10
Join Date: Dec 2010
Device: PRS-650 ... ipad
|
unwanted linebreaks
Quote:
If you live in a non-English-speaking country and are cleaning up text in your own language, don't forget to add the extra characters that words may end with to the formula! For example, for German (ß): ([a-zß])</p>\s+<p class="calibre2"> But if you understand what is happening here, you know that already, don't you? |
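As a rough illustration of the point above, here is a minimal Python sketch comparing the ASCII-only class with one extended by ß. The sample text, the helper name `join_lines`, and the replacement string `\1 ` (captured letter plus a space) are assumptions made for the example, not something taken from the posts.

```python
import re

# Hypothetical sample: a German sentence broken across two paragraphs.
html = '<p class="calibre2">Er wusste, daß</p>\n<p class="calibre2">es regnete.</p>'

# The class from the thread, with and without the extra German character.
ascii_only = re.compile(r'([a-z])</p>\s+<p class="calibre2">')
with_eszett = re.compile(r'([a-zß])</p>\s+<p class="calibre2">')

def join_lines(pattern, text):
    # r'\1 ' keeps the captured letter and adds a space (assumed replacement).
    return pattern.sub(r'\1 ', text)

print(join_lines(ascii_only, html))   # break is left in place: ß is not in [a-z]
print(join_lines(with_eszett, html))  # break is removed
```

The same idea applies to any other letters a language uses at the end of words.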
|
12-22-2010, 08:14 PM | #17 | ||
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
Quote:
Quote:
Code:
([a-zäëïöüàèìòùáćéíóńśúâêîôûçąężı,:)\IA\u00DF]|(?<!\&\w{4});)
Code:
(?<=.{85}([a-zäëïöüàèìòùáćéíóńśúâêîôûçąężı,:)\IA\u00DF]|(?<!\&\w{4});))\s*</(span|p|div)>\s*(</(p|span|div)>)?\s*(?P<up2threeblanks><(p|span|div)[^>]*>\s*(<(p|span|div)[^>]*>\s*</(span|p|div)>\s*)</(span|p|div)>\s*){0,3}\s*<(span|div|p)[^>]*>\s*(<(span|div|p)[^>]*>)?\s*
Last edited by ldolse; 12-22-2010 at 08:17 PM. |
||
12-23-2010, 04:21 AM | #18 |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
|
12-23-2010, 06:13 AM | #19 |
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
|
Instead of building a character class yourself, how about using "\w"? That will match any Unicode letter, digit, or underscore.
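A quick way to see what "\w" covers is to test a few characters, as in the sketch below. It assumes Python 3, where "\w" on text patterns is Unicode-aware by default; under Python 2 the re.UNICODE flag was needed for the same behaviour. The list of test characters is arbitrary.

```python
import re

# In Python 3, \w on str patterns is Unicode-aware by default.
word_char = re.compile(r'\w')

# Arbitrary test characters: letters, a digit, an underscore, punctuation.
for ch in ['a', 'ß', 'é', 'ą', '7', '_', ',', ')']:
    print(repr(ch), bool(word_char.fullmatch(ch)))
```

Note that punctuation such as "," and ")" is not matched by "\w", so a pattern that needs those would still have to list them separately.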
|
12-23-2010, 07:14 AM | #20 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
I'm not a regex guru by any means, but a number of the expressions we have been looking at intentionally exclude the uppercase versions of characters, which \w would include.
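To make the uppercase concern concrete, here is a small hedged sketch in Python. The sample heading and the replacement string `\1 ` are invented for the illustration; only the `[a-z]` pattern comes from the thread.

```python
import re

# Hypothetical sample: a heading in capitals followed by body text.
html = '<p class="calibre2">CHAPTER ONE</p>\n<p class="calibre2">It was late.</p>'

lowercase_only = re.compile(r'([a-z])</p>\s+<p class="calibre2">')
any_word_char = re.compile(r'(\w)</p>\s+<p class="calibre2">')

print(lowercase_only.sub(r'\1 ', html))  # heading stays in its own paragraph
print(any_word_char.sub(r'\1 ', html))   # heading is merged into the body text
```

With "\w" the heading ends in a matching character ("E"), so the paragraph break after it is removed as well, which is the kind of false positive the lowercase-only class avoids.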
|
12-23-2010, 07:35 AM | #21 |
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
|
|
12-23-2010, 09:47 AM | #22 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
|
12-23-2010, 09:50 AM | #23 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
"Requirement" might be a bit strong; it's just a balance between false positives and false negatives. I generally try to err on the side of false negatives, since they're easier to detect later. Sometimes I think it might be easier to use \w, since line length is in there as an extra check, but I haven't made that jump.
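The combination mentioned here, "\w" plus a line-length check, could look roughly like the simplified Python sketch below. It is not the full expression posted earlier: the helper `unwrap_pattern`, the sample strings, and the single-space replacement are assumptions. The length is counted in raw characters, markup included, using a fixed-width lookbehind.

```python
import re

def unwrap_pattern(min_len):
    # Fixed-width lookbehind: min_len characters (no newline) followed by
    # one word character must come directly before the closing tag.
    return re.compile(r'(?<=.{%d}\w)</p>\s+<p class="calibre2">' % min_len)

# Hypothetical samples: a long wrapped line and a short heading.
long_line = '<p class="calibre2">' + 'word ' * 20 + 'broken</p>\n<p class="calibre2">here.</p>'
short_line = '<p class="calibre2">CHAPTER ONE</p>\n<p class="calibre2">It was late.</p>'

pattern = unwrap_pattern(85)
print(pattern.sub(' ', long_line))   # long line: break removed
print(pattern.sub(' ', short_line))  # short heading: left untouched
```

Here the length check rescues the short uppercase heading even though "\w" alone would have matched it, at the cost of missing genuinely wrapped lines that happen to be short.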
|
12-26-2010, 11:04 AM | #24 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
This thread is very useful. I'd been fixing up line breaks manually in Word, which took ages.
I am amazed how easy it can be via regex. I use variations of the code given earlier in this thread, i.e.:
What you want to do is something like this:
Find: ([a-z])</p>\s+<p class="calibre2">
Replace: \1
changing calibre2 as needed on a per-book basis; sometimes it needs a different one- or two-digit number, like calibre13.
One case that still slips through the test, though, is when a sentence has been split such that the new line starts with the one-letter word "I" (capital I). Can that also be caught via regex? |
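For anyone who wants to try the find/replace outside the editor, here is a minimal Python sketch. It makes two assumptions beyond the post: the class number is generalised with `\d{1,2}` so that calibre2 and calibre13 are both accepted, and the replacement is `\1 ` with a trailing space so the joined words do not run together. The sample text is made up.

```python
import re

# Hypothetical sample: one sentence split across two paragraphs.
html = ('<p class="calibre13">The quick brown fox jumped over the</p>\n'
        '<p class="calibre13">lazy dog.</p>')

# \d{1,2} accepts a one- or two-digit suffix (calibre2, calibre13, ...).
pattern = re.compile(r'([a-z])</p>\s+<p class="calibre\d{1,2}">')

# \1 puts back the letter captured by ([a-z]); the trailing space is an
# assumption so that "the" and "lazy" stay separate words.
fixed = pattern.sub(r'\1 ', html)
print(fixed)
```

The digit range is a convenience; checking the class name per book, as described above, is the safer habit.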
01-05-2011, 06:39 AM | #25 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
I looked at the regex syntax reference page here:
http://www.regular-expressions.info/reference.html
but there's no definition of the \1 operation. Do I need a better reference page or book? What's a good book to learn from? (I intend to use more regex in both Sigil and calibre.) I'd prefer an ebook that I can put on my Kindle. |
01-05-2011, 07:34 AM | #26 | |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
http://www.regular-expressions.info/refadv.html
I can't name any books; perhaps someone else can. That website is very good, and between the basic and advanced pages you have the "cheat sheet" for most of what you need to know to refresh your memory. |
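On the \1 question: it is a backreference, meaning "whatever the first pair of parentheses captured". A tiny Python sketch, with an invented sample sentence, shows it used both inside the pattern and in the replacement:

```python
import re

# (\w+) captures a word; \1 inside the pattern requires the same text again.
doubled = re.compile(r'\b(\w+) \1\b')

# In the replacement, \1 writes the captured word back once.
print(doubled.sub(r'\1', 'the the cat sat on on the mat'))
```

In the line-break expressions in this thread, \1 plays the second role: it puts the captured last letter back so that only the tags are removed.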
|
01-07-2011, 03:48 AM | #27 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
|