02-07-2012, 07:14 AM | #1 |
Connoisseur
Posts: 55
Karma: 10
Join Date: Jan 2011
Device: 7" Tablet - Aldiko Reader Premium
|
Unwanted Para Break
Can anyone advise/assist?
I have a number of EPUB books that have unwanted paragraph breaks in them, e.g.
<p class="calibre4">‘I do not mean to steal from them,’ he said at last,</p>
<p class="calibre4">‘although they have stolen life from some of us. But this time, if I am attacked... I shall fight back.’</p>
I can obviously close the gap manually when I spot them:
<p class="calibre4">‘I do not mean to steal from them,’ he said at last, ‘although they have stolen life from some of us. But this time, if I am attacked... I shall fight back.’</p>
Is there any method using regex that can identify lower-case characters at the beginning of a line and then move the line up? |
02-07-2012, 08:03 AM | #2 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
easy with regex
Find: </p>\s*<p class ="calibre4">([a-z])
Replace:  \1
NB: the replace line has a space in front of \1. This works with the v0.4.2 regex; I have not upgraded to 0.5.1 yet. I'm pretty sure there are other threads on how to do this. A variant, for example, is to test the end of the previous line for not full stop, not quote, not question mark. |
|
02-07-2012, 10:20 AM | #3 | |
Connoisseur
Posts: 55
Karma: 10
Join Date: Jan 2011
Device: 7" Tablet - Aldiko Reader Premium
|
Quote:
Tried this; unfortunately it didn't work. I got "the search term was replaced 0 time(s)", although I could see an example right in front of me. Specifically I put in:
Find: </p>\s*<p class ="calibre4">([a-z])
Replace:  \1
I am using Sigil V5.0. Is the regex version the same? As to your variant, I'm not even sure how to do that, as I am still virtually a novice as far as regex is concerned. I can copy someone else's suggestion (without a clue as to what it means), and if it works that is great; if it doesn't, I have no idea what to do. |
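A quick check outside Sigil hints at why the count was 0. The sketch below uses Python's `re` module, which is not Sigil's regex engine, so it is only indicative. It tries the pattern as posted against an abridged copy of the sample from the first post. Two things stand out: the posted pattern has a space in `class ="calibre4"` that the book's markup does not have, and the second paragraph starts with an opening quote (‘) rather than a lower-case letter. The `with_quote` pattern is a hypothetical adjustment for illustration, not something suggested in the thread.

```python
import re

# Abridged copy of the two paragraphs from the first post.
sample = (
    '<p class="calibre4">‘I do not mean to steal from them,’ he said at last,</p> '
    '<p class="calibre4">‘although they have stolen life from some of us.</p>'
)

patterns = {
    # Exactly as posted: note the space before "=".
    "as_posted": r'</p>\s*<p class ="calibre4">([a-z])',
    # Space removed, so the tag matches the markup in the book.
    "no_space": r'</p>\s*<p class="calibre4">([a-z])',
    # Hypothetical: also allow an optional opening single quote.
    "with_quote": r'</p>\s*<p class="calibre4">(‘?[a-z])',
}

# Count how many paragraph breaks each pattern finds in the sample.
counts = {name: len(re.findall(p, sample)) for name, p in patterns.items()}

# Apply the replacement (a space, then the captured text) with the last pattern.
fixed = re.sub(patterns["with_quote"], r' \1', sample)

print(counts)
print(fixed)
```

Only the third pattern finds the break in this sample, which is consistent with the "replaced 0 time(s)" message.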
|
02-07-2012, 10:32 AM | #4 | |
Well trained by Cats
Posts: 30,355
Karma: 58032210
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
My wrap-line regex lightly tests the last character in the first line:
Find: ([\w",])</p>\s+<p class="calibre4">([\w"“…])
Replace: \1 \2
(there is a space between the \1 and \2)
This still misses:
...Dr. Smith said...
...Mike German ... |
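For anyone who wants to try this pattern on a test string before running it on a book, here is a minimal sketch in Python's `re` module (an assumption: Sigil's engine may differ in details). The function name `wrap_lines` and the sample strings are made up for illustration.

```python
import re

# The find pattern from the post above: group 1 is the last character of the
# first paragraph, group 2 is the first character of the next one.
WRAP = re.compile(r'([\w",])</p>\s+<p class="calibre4">([\w"“…])')

def wrap_lines(html: str) -> str:
    """Join two paragraphs when the break looks like a mid-sentence wrap."""
    return WRAP.sub(r'\1 \2', html)

# Ends with a comma and continues with an opening double quote: joined.
wrapped = (
    '<p class="calibre4">he said at last,</p>\n'
    '<p class="calibre4">“although they have.”</p>'
)

# Ends with the full stop of "Dr.": left alone, i.e. one of the missed cases.
missed = (
    '<p class="calibre4">Then Dr.</p>\n'
    '<p class="calibre4">Smith said hello.</p>'
)

print(wrap_lines(wrapped))
print(wrap_lines(missed) == missed)
```

Note that `\s+` needs at least one whitespace character between the two tags, and that the opening single quote (‘) used in the first post's sample is not in the second character class.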
|
02-07-2012, 10:35 AM | #5 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
In 0.4.2 you also have to tick the box that says "regular expression" in the find/replace options.
As to what it means: the find is looking for </p>, followed by any amount of whitespace (which includes new lines), followed by the opening tag of the next paragraph with its style, followed by a lower-case letter. The replace \1 says "remember" that lower-case letter you found earlier in the first set of ( ) and now reuse it after a space. |
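The same breakdown can be written as an annotated pattern. This is a sketch in Python's `re` module with the `re.VERBOSE` flag, which allows comments inside the pattern; it assumes the tag is written `class="calibre4"` with no space before the `=`, and the sample text is made up.

```python
import re

EXPLAINED = re.compile(r'''
    </p>                    # the closing tag of the first paragraph
    \s*                     # any amount of whitespace, including new lines
    <p[ ]class="calibre4">  # the opening tag of the next paragraph, with its style
    ([a-z])                 # group 1: the lower-case letter that starts it
''', re.VERBOSE)

text = '<p class="calibre4">one,</p>\n\n  <p class="calibre4">two.</p>'

match = EXPLAINED.search(text)
first_letter = match.group(1)          # the letter "remembered" by the ( )
result = EXPLAINED.sub(r' \1', text)   # a space, then the remembered letter

print(first_letter)
print(result)
```

The `[ ]` in the tag is only there because verbose mode ignores plain spaces in the pattern; in Sigil's find box an ordinary space does the same job.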
|
02-07-2012, 10:37 AM | #6 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
The belt & braces approach (& there's definitely a big thread on this somewhere) is to run 2 tests:
1. for lines that begin with lower case
2. for lines which end with "not the end of a sentence"
Tricky cases are e.g. Dr. Roberts left the room and then I followed him
Also, beware of mangling any poetry or book extracts which are not using normal punctuation rules.
See also https://www.mobileread.com/forums/sho...d.php?t=114931
Last edited by cybmole; 02-07-2012 at 10:41 AM. |
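The two tests can be sketched as a single pass in Python (again, not Sigil's own engine). Everything named here is hypothetical: the function `join_breaks`, the set of characters treated as ending a sentence, and the sample strings. A break is joined if the next paragraph starts with a lower-case letter, or if the previous one does not end with a sentence-ending character.

```python
import re

# One character either side of a paragraph break.
BREAK = re.compile(r'(.)</p>\s*<p class="calibre4">(.)', re.DOTALL)

# Assumption: characters that count as "the end of a sentence".
SENTENCE_END = set('.?!"’”')

def join_breaks(html: str) -> str:
    def repl(m: re.Match) -> str:
        last, first = m.group(1), m.group(2)
        starts_lower = first.islower()            # test 1
        ends_open = last not in SENTENCE_END      # test 2
        if starts_lower or ends_open:
            return f'{last} {first}'
        return m.group(0)                         # leave the break alone
    return BREAK.sub(repl, html)

def para(text: str) -> str:
    return f'<p class="calibre4">{text}</p>'

lower_start = para('he said at last,') + '\n' + para('although they had.')
open_end = para('He turned and') + '\n' + para('Roberts followed.')
real_break = para('He left.') + '\n' + para('She stayed.')
dr_case = para('Then Dr.') + '\n' + para('Roberts left the room.')

for sample in (lower_start, open_end, real_break, dr_case):
    print(join_breaks(sample))
```

The "Dr." sample is left unjoined by both tests, which matches the tricky case mentioned above. Test 2 would also join lines of verse that have no end punctuation, so it is worth running on a copy first.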
02-07-2012, 11:25 AM | #7 |
Connoisseur
Posts: 55
Karma: 10
Join Date: Jan 2011
Device: 7" Tablet - Aldiko Reader Premium
|
Thanks guys for all your advice - I am slowly getting to grips with it
I have now found some other threads relating to this (thanks cybmole for the link). I really should have looked harder, but when you put a term in the search box and it comes up with 500 hits it kinda puts you off. I am trying each regex on a test file to see the results and will then use the ones that work best for me. But again, many thanks; it has been a great help. |