Old 02-07-2012, 07:14 AM   #1
Paxman53
Connoisseur
Paxman53 began at the beginning.
 
Posts: 55
Karma: 10
Join Date: Jan 2011
Device: 7" Tablet - Aldiko Reader Premium
Unwanted Para Break

Can anyone advise/assist?

I have a number of EPUB books that have unwanted paragraph breaks in them, e.g.:

<p class="calibre4">‘I do not mean to steal from them,’ he said at last,</p>

<p class="calibre4">‘although they have stolen life from some of us. But this time, if I am attacked... I shall fight back.’</p>

I can obviously close the gap manually when I spot one, giving:

<p class="calibre4">‘I do not mean to steal from them,’ he said at last, ‘although they have stolen life from some of us. But this time, if I am attacked... I shall fight back.’</p>

Is there any method using regex that can identify a lower-case character at the beginning of a line and then move that line up to join the previous one?
Old 02-07-2012, 08:03 AM   #2
cybmole
Wizard
cybmole ought to be getting tired of karma fortunes by now.
 
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
easy with regex

find </p>\s*<p class ="calibre4">([a-z])
replace \1

NB the replace line has a space in front of \1

Works with the regex in Sigil 0.4.2; I have not upgraded to 0.5.1 yet.

Pretty sure there are other threads on how to do this. A variant, for example, is to test that the end of the previous line is not a full stop, quote or question mark.
Old 02-07-2012, 10:20 AM   #3
Paxman53
Tried this, but unfortunately it didn't work. I got "the search term was replaced 0 time(s)", although I could see an example right in front of me.

Specifically I put in:

Find:</p>\s*<p class ="calibre4">([a-z])
Replace: \1

I am using Sigil 0.5.0 - is the regex version the same?

As to your variant, I'm not even sure how to do that, as I am still virtually a novice as far as regex is concerned. I can copy someone else's suggestion (without a clue as to what it means) and if it works, that is great; if it doesn't, I have no idea what to do.
Old 02-07-2012, 10:32 AM   #4
theducks
Well trained by Cats
theducks ought to be getting tired of karma fortunes by now.
 
Posts: 29,773
Karma: 54401244
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
This will catch about 98% of the ones that do not end in punctuation.

My wrap-line regex lightly tests the last character of the first line:

Find: ([\w",])</p>\s+<p class="calibre4">([\w"“…])

Replace: \1 \2

(there is a space between the \1 and \2)

This still misses:
...Dr.
Smith said...

...Mike
German ...
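
For anyone who wants to try the wrap-line pattern outside Sigil, here is a minimal sketch using Python's re module. This is an assumption: Sigil has its own regex engine, so behaviour may differ in places. The helper names (join_wrapped, para) and the sample sentences are made up for illustration. It shows a wrapped pair being joined, a complete sentence being left alone, and the "Dr." case being missed. It also shows that a paragraph opening with a single curly quote (‘), as in the first post's example, is not in the second character set and so is not joined either.

```python
import re

# Pattern and replacement as posted above.
WRAP = re.compile(r'([\w",])</p>\s+<p class="calibre4">([\w"“…])')
REPLACEMENT = r'\1 \2'


def join_wrapped(html: str) -> str:
    """Merge paragraph pairs that the pattern treats as one broken line."""
    return WRAP.sub(REPLACEMENT, html)


def para(text: str) -> str:
    """Wrap text in the paragraph markup used in this thread."""
    return f'<p class="calibre4">{text}</p>'


# Hypothetical test cases, not taken from a real book.
broken = para('He said at last,') + '\n\n' + para('although it was late.')
complete = para('He left.') + '\n\n' + para('She stayed.')
abbreviation = para('Then Dr.') + '\n\n' + para('Smith said no.')
single_quote = para('he said at last,') + '\n\n' + para('‘although it was late.’')

print(join_wrapped(broken))                        # one merged paragraph
print(join_wrapped(complete) == complete)          # True: correctly left alone
print(join_wrapped(abbreviation) == abbreviation)  # True: break after "Dr." is missed
print(join_wrapped(single_quote) == single_quote)  # True: ‘ is not in the second set
```

Adding ‘ to the second character set is one way to cover the single-quote case, at the cost of a few more false joins.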
Old 02-07-2012, 10:35 AM   #5
cybmole
In 0.4.2 you also have to tick the box that says "regular expression" in the find/replace options.

As to what it means: the find is looking for </p>, followed by any amount of whitespace (which includes new lines), followed by the opening tag of the next paragraph with its style, followed by a lower-case letter.

The replace, \1 with a space in front, says: "remember" the lower-case letter you found earlier in the first set of ( ), and now reuse it after a space.
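
The explanation above can be tried out with a minimal sketch in Python's re module (an assumption: Sigil uses its own regex engine, so details may differ). The sample text and variable names are made up. The sketch also shows one possible cause of a "replaced 0 time(s)" result: in the find string as posted there is a space in class =, and a space in a regex is literal, so it cannot match markup written as class="calibre4".

```python
import re

# Hypothetical sample: one sentence wrongly split across two paragraphs.
sample = (
    '<p class="calibre4">He walked to the door and</p>\n'
    '\n'
    '<p class="calibre4">opened it slowly.</p>'
)

# Find string exactly as posted: note the literal space in 'class ='.
as_posted = r'</p>\s*<p class ="calibre4">([a-z])'

# Same idea, but tolerant of optional whitespace around '='.
tolerant = r'</p>\s*<p class\s*=\s*"calibre4">([a-z])'

# Replacement: a space, then the lower-case letter remembered in group 1.
replacement = r' \1'

# re.subn returns the new text and the number of replacements made.
_, count_posted = re.subn(as_posted, replacement, sample)
joined, count_tolerant = re.subn(tolerant, replacement, sample)

print(count_posted)    # 0
print(count_tolerant)  # 1
print(joined)
```

Separately, in the example from the first post the second paragraph starts with ‘ rather than a letter, so ([a-z]) on its own would not match it. A group such as (‘?[a-z]) is one way to allow for an opening quote.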
Old 02-07-2012, 10:37 AM   #6
cybmole
The belt & braces approach (& there's definitely a big thread on this somewhere) is to run 2 tests:
1. for lines that begin with lower case
2. for lines which end with something that is not the end of a sentence

tricky cases are e.g.
Dr.
Roberts left the room and then I
followed him

Also, beware of mangling any poetry or book extracts which do not use normal punctuation rules.
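
A rough sketch of the two-test idea, written with Python's re module rather than Sigil's own engine (so treat it as an approximation). The function names, the sample text and the exact set of "sentence-ending" characters in test 2 are assumptions. It shows the tricky case being only half fixed, and a verse being mangled by test 2.

```python
import re

P_OPEN = '<p class="calibre4">'

# Test 1: the next paragraph begins with a lower-case letter.
STARTS_LOWER = re.compile(r'</p>\s*' + P_OPEN + r'([a-z])')

# Test 2: the previous paragraph does not end like a sentence.
# The excluded set (. ? ! and closing quotes) is an assumption.
NOT_SENTENCE_END = re.compile(r'([^.?!"’”])</p>\s*' + P_OPEN)


def join_breaks(html: str) -> str:
    """Run both tests in turn, joining paragraphs with a single space."""
    html = STARTS_LOWER.sub(r' \1', html)
    return NOT_SENTENCE_END.sub(r'\1 ', html)


def para(text: str) -> str:
    """Wrap text in the paragraph markup used in this thread."""
    return P_OPEN + text + '</p>'


# The tricky case: "Dr." ends in a full stop and "Roberts" starts with a
# capital, so neither test joins them; only the last break is repaired.
tricky = '\n'.join([
    para('Dr.'),
    para('Roberts left the room and then I'),
    para('followed him'),
])
print(join_breaks(tricky))

# Verse without end punctuation: test 2 merges the lines, which is wrong.
verse = para('Roses are red') + '\n' + para('Violets are blue')
print(join_breaks(verse))
```

Because test 2 is the aggressive one, it is probably safer to step through its matches one at a time rather than using replace all.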

see also
https://www.mobileread.com/forums/sho...d.php?t=114931

Last edited by cybmole; 02-07-2012 at 10:41 AM.
Old 02-07-2012, 11:25 AM   #7
Paxman53
Thanks guys for all your advice - I am slowly getting to grips with it.

I have now found some other threads relating to this (thanks cybmole for the link).

I really should have looked harder, but when you put a term in the search box and it comes up with 500 hits it kinda puts you off.

I am trying each regex on a test file to see the results and will then use the ones that work best for me.

But again, many thanks; it has been a great help.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.