Old 04-15-2015, 10:04 AM   #1
ColMac
Connoisseur
Posts: 59
Karma: 10
Join Date: Apr 2012
Device: Kindle Fire
Search regex problem

My wife has a number of books that have masses of extra carriage returns inserted in them, which makes reading difficult.

A sample is shown below.


Code:
<p class="calibre1">And  Alandra  looking  at  him  as  he  came  into  the  room,  found  that </p>
<p class="calibre1">although  she  had  not  wanted  to  allow  her  grandfather  one  courtesy, </p>
<p class="calibre1">she was getting to her feet. </p>
<p class="calibre1">Silently,  she  watched  and  waited  as  he  came  closer.  And  when  he </p>
<p class="calibre1">stopped and for long seconds stared at her, she saw deep frown lines </p>
<p class="calibre1">groove  on  his  forehead.  But  she  had  no  word  to  say  to  him,  and  he </p>
<p class="calibre1">none for her as he turned to the man who, keeping his eyes steady on </p>
<p class="calibre1">the two of them, had now moved from his position by the door, and </p>
<p class="calibre1">was coming in their direction. </p>
<p class="calibre1">And  it  was  left  to  Matt  Carstairs  to  introduce  the  two—the  elderly </p>
<p class="calibre1">man  who  still  had  the  gait  of  a  man  years  younger,  and  the  young </p>
<p class="calibre1">woman whose solemn face was giving nothing away of the very low </p>
<p class="calibre1">regard in which she held the other. </p>
<p class="calibre1">'This,'  said  Matt  Carstairs,  pausing  only  marginally  as  if  to  assess </p>
<p class="calibre1">how  the  older  man  would  take  it,  'this  woman  claims  to  be  your </p>
<p class="calibre1">granddaughter, sir—she says she is Edward's child.' </p>
I managed to find a saved search from Zajora that partly solves the problem

Code:
(?<![".!?>*”“…~’])</(?P<tag>\w+)>\s*<(?P=tag) [^/>]+>

Replaced with "Null" (i.e. nothing).
However, it also removes the genuine end-of-paragraph returns (in the example above, the ones ending with a full stop or a single quote).

Is there any way to amend this search so that it excludes the more obvious genuine carriage returns? I know that there are ways to end a sentence other than a full stop, and that any such search will not catch everything and will get some wrong, but it would be better than what she currently has.
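A likely reason the saved search also hits the genuine paragraph ends: in the sample every line has a space before </p>, so the lookbehind sees that space rather than the full stop or quote. The sketch below reproduces this, assuming Python's re module behaves like the editor's regex engine for these patterns. VARIANT is a hypothetical amendment, not Zajora's search: it adds whitespace and the straight apostrophe to the excluded characters and consumes the trailing spaces, so the replacement becomes a single space instead of nothing.

```python
import re

# Saved search exactly as quoted in the post above.
ORIGINAL = r'(?<![".!?>*”“…~’])</(?P<tag>\w+)>\s*<(?P=tag) [^/>]+>'

# Hypothetical variant (an assumption, not from the thread): \s and the
# straight apostrophe join the lookbehind class, and \s* swallows any
# spaces that sit between the last word and the closing tag.
VARIANT = r'(?<![".!?>*”“…~’\'\s])\s*</(?P<tag>\w+)>\s*<(?P=tag) [^/>]+>'

sample = (
    '<p class="calibre1">she was getting to her feet. </p>\n'
    '<p class="calibre1">Silently,  she  watched  and  waited  as  he  came  closer.  And  when  he </p>\n'
    '<p class="calibre1">stopped and for long seconds stared at her, she saw deep frown lines </p>'
)

# The original matches both boundaries, including the one after "feet. ".
original_hits = len(list(re.finditer(ORIGINAL, sample)))

# The variant only matches the boundary after "when  he ".
variant_hits = len(list(re.finditer(VARIANT, sample)))

# Replace with one space, because the variant has eaten the trailing space.
joined = re.sub(VARIANT, ' ', sample)
print(original_hits, variant_hits)
print(joined)
```

This is only as good as the list of characters in the lookbehind, so it should be tried on a copy of the book first.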

Thanks

Last edited by ColMac; 04-15-2015 at 10:12 AM. Reason: spelling
Old 04-15-2015, 10:57 AM   #2
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Search instead for lines that begin with a lower case character and strip the opening tag.
Old 04-15-2015, 10:59 AM   #3
cybmole
And also the previous closing tag. I can't face typing the code via my tablet, but there will be examples in older threads.
Old 04-15-2015, 11:16 AM   #4
cybmole
OK, at my PC now. Here is how I'd do it with Sigil:
1. Highlight a relevant fragment, e.g. this from your example:
</p>
<p class="calibre1">h

2. Paste that into the Find field with regex mode on. (Pasting it as-is takes care of the whitespace between the lines.)
3. Now replace that closing h with ([a-z]) so that it matches any lower-case letter.
The replace string for the regex is (blank space)\1

What that does is remove the closing tag, the next opening tag and the first letter, and then put the first letter back, preceded by a blank space.
Test it carefully and make a backup before you "Replace All"!
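The steps above can be sketched outside the editor as follows. This is a rough illustration that assumes Python's re module as a stand-in for Sigil's regex engine; the sample text is taken from the first post and the variable names are made up for the example.

```python
import re

# The pasted fragment with its final "h" swapped for a capture group.
# The fragment contains no regex metacharacters, so it can be used as-is.
FIND = '</p>\n<p class="calibre1">([a-z])'

# A blank space, then the captured first letter.
REPLACE = r' \1'

sample = (
    '<p class="calibre1">stopped and for long seconds stared at her, she saw deep frown lines </p>\n'
    '<p class="calibre1">groove  on  his  forehead. </p>\n'
    '<p class="calibre1">And  it  was  left  to  Matt  Carstairs</p>'
)

# Only the boundary followed by a lower-case letter is joined; the paragraph
# that starts with a capital "A" is left alone.
joined = re.sub(FIND, REPLACE, sample)
print(joined)
```

Because the sample lines already end in a space before </p>, the join produces a double space here; that is harmless in HTML, where runs of whitespace are normally collapsed when rendered.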

outside of poetry and titles, there's no valid reason for a paragraph to begin with lower case, so that will fix most issues.

you can still get awkward cases like
then he said,
"this"


but you can extrapolate the code as needed for those
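One possible extrapolation for the awkward case above, offered only as an assumption about what might suit a given book: treat any line that ends in a comma as unfinished, whatever character starts the next paragraph. The pattern and names below are hypothetical and use Python's re module for illustration.

```python
import re

# Hypothetical rule: comma, optional spaces, closing tag, whitespace,
# then an opening <p> with or without attributes.
FIND = r',\s*</p>\s*<p(?: [^>]+)?>'

# Put the comma back, followed by a single space.
REPLACE = ', '

sample = (
    '<p class="calibre1">then he said, </p>\n'
    '<p class="calibre1">"this" </p>\n'
    '<p class="calibre1">Next paragraph. </p>'
)

joined = re.sub(FIND, REPLACE, sample)
print(joined)
```

A rule like this will also join lines of verse or lists that legitimately end in a comma, so it is best stepped through match by match rather than applied with "Replace All".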
Old 04-15-2015, 11:37 AM   #5
theducks
Well trained by Cats
Posts: 30,913
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
</p>\s+<p
will take care of any indentation or CR (which may vary within the document because of other tags such as blockquote and div)
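To see the difference between a pasted literal line break and \s+, here is a small comparison using Python's re module (an assumption; the editor's engine should treat \s the same way for these cases). The boundary strings are invented examples. Note that </p>\s+<p on its own matches every paragraph boundary, so it is a building block for the gap, not a complete search.

```python
import re

boundaries = [
    '</p>\n<p class="calibre1">',        # plain newline
    '</p>\r\n    <p class="calibre1">',  # CRLF plus indentation
    '</p>\n\t<p>',                       # tab indentation, no class
    '</p><p>',                           # no whitespace at all
]

literal = re.compile('</p>\n<p')      # pasted fragment: exactly one newline
flexible = re.compile(r'</p>\s+<p')   # one or more whitespace characters
any_gap = re.compile(r'</p>\s*<p')    # zero or more whitespace characters

literal_hits = [bool(literal.search(b)) for b in boundaries]
flexible_hits = [bool(flexible.search(b)) for b in boundaries]
any_gap_hits = [bool(any_gap.search(b)) for b in boundaries]

print(literal_hits)
print(flexible_hits)
print(any_gap_hits)
```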
Old 04-15-2015, 12:15 PM   #6
ColMac

Quote:
Originally Posted by cybmole View Post
ok at my PC now here is how I'd do it with Sigil
1. highlight a relevant fragment e.g. this from your example
</p>
<p class="calibre1">h

2. paste that into the find part of the regex. (pasting it as-is will take care of the space between lines issue. )
3. now replace that closing h with ([a-z]) so that it matches any lower case
the replace string for the regex is (blank space)\1
That works perfectly.

First one I tried found almost 4,000 occurrences. I'm pretty sure that there may be an error in that 4,000, but it is a massive improvement on what I had before.

Thanks for the help.
Old 04-15-2015, 12:18 PM   #7
ColMac

Quote:
Originally Posted by theducks View Post
</p>\s+<p
will take care of any indentation or CR
Not sure how this one is meant. Is it an addition to the code from cybmole, or a replacement for it?

I tried using it as is, but wasn't sure it was giving me valid results.

Colin
Old 04-15-2015, 01:15 PM   #8
cybmole
Quote:
Originally Posted by ColMac View Post
Not sure how this one is meant. Is it an addition to the code from cybmole, or a replacement for it?

I tried using it as is, but wasn't sure it was giving me valid results

Colin
That last one is the "proper" way to detect and remove the blank stuff between the end of one line and the start of the next. My lazy way is to copy and paste the whole shebang into the Find field, and it usually works!

I used to have a file with several of these, all tested and vetted, but I can't find it, and I could not face re-reading all of the regex examples in the Sigil forum stickies.

I'm sure there are detailed old threads, but writing the appropriate search expression for the forum search engine has me beat.
Old 04-15-2015, 02:00 PM   #9
theducks
\s is whitespace of any kind (space, return, tab); the + means one or more in a row of whatever precedes it.
Old 04-15-2015, 04:22 PM   #10
phossler
Wizard
Posts: 1,087
Karma: 447222
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Sorry

OBE

Last edited by phossler; 04-15-2015 at 04:32 PM.
Old 04-15-2015, 05:36 PM   #11
gbm
Wizard
Posts: 2,171
Karma: 8800000
Join Date: Jun 2010
Device: Kobo Clara HD,Hisence Sero 7 Pro RIP, Nook STR, jetbook lite
This is one I think is from the Sigil forums, used to find and join split paragraphs:

Search
Code:
</p>\s*<p[^>]+>([a-z])
replace
Code:
\1
I have it as a saved search in the ebook editor.
Using the OP's example, this is the result:
Spoiler:
<p class="calibre1">And Alandra looking at him as he came into the room, found that although she had not wanted to allow her grandfather one courtesy, she was getting to her feet. </p>
<p class="calibre1">Silently, she watched and waited as he came closer. And when he stopped and for long seconds stared at her, she saw deep frown lines groove on his forehead. But she had no word to say to him, and he none for her as he turned to the man who, keeping his eyes steady on the two of them, had now moved from his position by the door, and was coming in their direction. </p>
<p class="calibre1">And it was left to Matt Carstairs to introduce the two—the elderly man who still had the gait of a man years younger, and the young woman whose solemn face was giving nothing away of the very low regard in which she held the other. </p>
<p class="calibre1">'This,' said Matt Carstairs, pausing only marginally as if to assess how the older man would take it, 'this woman claims to be your granddaughter, sir—she says she is Edward's child.' </p>
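For anyone who wants to check the result outside the editor, the same search and replace can be run with Python's re module (assumed here to match the editor's behaviour for this pattern). The sample is a shortened copy of the OP's text with single spaces.

```python
import re

FIND = r'</p>\s*<p[^>]+>([a-z])'   # search string from this post
REPLACE = r'\1'                    # replace string from this post

sample = (
    '<p class="calibre1">And Alandra looking at him as he came into the room, found that </p>\n'
    '<p class="calibre1">although she had not wanted to allow her grandfather one courtesy, </p>\n'
    '<p class="calibre1">she was getting to her feet. </p>\n'
    '<p class="calibre1">Silently, she watched and waited as he came closer. And when he </p>\n'
    '<p class="calibre1">stopped and for long seconds stared at her</p>'
)

joined = re.sub(FIND, REPLACE, sample)
paragraphs = joined.split('\n')
for p in paragraphs:
    print(p)
```

The replacement is just \1 with no added space, which works because each line in this book already ends with a space before </p>. If the lines in another book do not, words would run together and a replacement of a space followed by \1 would be the safer choice.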

bernie
Quote:
Originally Posted by cybmole View Post
that last one is the "proper" way to detect & remove the blank stuff between end of one line and start of next. my lazy way is to copy paste the whole shebang into the find field & it usually works!



I used to have a file with several of these all tested and vetted but I can't find it. and I could not face re-reading all of the regex examples in the sigil forum stickies

I'm sure there are detailed old threads but writing the appropriate search expression for the forum search engine has me beat

Last edited by gbm; 04-15-2015 at 05:49 PM.
Old 04-15-2015, 08:34 PM   #12
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
And if there is no class?
Old 04-15-2015, 08:47 PM   #13
gbm
Quote:
Originally Posted by eschwartz View Post
And if there is no class?
Code:
</p>\s*<p>+([a-z])
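A quick check of what this pattern does and does not cover, using Python's re module as an assumed stand-in for the editor. Here >+ means "one or more > characters", so the pattern handles a bare <p> but no longer matches a <p> that carries a class, while the earlier [^>]+ version does the opposite. The test strings are made up for the comparison.

```python
import re

NO_CLASS = re.compile(r'</p>\s*<p>+([a-z])')        # pattern from this post
WITH_ATTRS = re.compile(r'</p>\s*<p[^>]+>([a-z])')  # earlier pattern

bare = 'found that </p>\n<p>although she'
classed = 'found that </p>\n<p class="calibre1">although she'

# (matches bare <p>, matches <p class="...">) for each pattern
no_class_result = (bool(NO_CLASS.search(bare)), bool(NO_CLASS.search(classed)))
with_attrs_result = (bool(WITH_ATTRS.search(bare)), bool(WITH_ATTRS.search(classed)))

print(no_class_result)
print(with_attrs_result)
```

So with these two patterns a book that mixes both kinds of paragraph would need two passes.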
bernie
Old 04-15-2015, 11:37 PM   #14
eschwartz
Au contraire.

Code:
</p>\s*<p(?: [^>]+)?>([a-z])
I was hoping you'd notice the previous regex didn't cover all cases... and fix that, rather than create another problem.
The whole idea behind regex is to, you know, create one pattern to rule them all.
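As a sketch of why the optional group helps, the pattern above can be run against a mix of bare and classed paragraphs with Python's re module (an assumption about engine behaviour; the sample text is invented). LOOSE is a hypothetical shortcut, not from this thread, included only to show that requiring a space before the attributes keeps tags such as <pre> from being mistaken for <p>.

```python
import re

UNIFIED = re.compile(r'</p>\s*<p(?: [^>]+)?>([a-z])')  # pattern from this post
LOOSE = re.compile(r'</p>\s*<p[^>]*>([a-z])')          # hypothetical looser form

sample = (
    '<p>found that </p>\n'
    '<p>although she had not wanted </p>\n'
    '<p class="calibre1">to allow her grandfather one courtesy. </p>\n'
    '<p class="calibre1">Silently, she watched </p>\n'
    '<pre>code</pre>'
)

unified_letters = UNIFIED.findall(sample)  # first letters of joined lines
loose_letters = LOOSE.findall(sample)      # also catches the <pre> block

joined = UNIFIED.sub(r'\1', sample)
print(unified_letters, loose_letters)
print(joined)
```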
Old 04-15-2015, 11:43 PM   #15
PeterT
Grand Sorcerer
Posts: 13,316
Karma: 78876004
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
Personally I'd head back to whoever supplied me with those books and ask for clean versions...