#1 |
Connoisseur
Posts: 59
Karma: 10
Join Date: Apr 2012
Device: Kindle Fire
|
Search regex problem
My wife has a number of books that have masses of "Extra" carriage returns inserted in them making reading difficult.
A sample is shown below.
Code:
<p class="calibre1">And Alandra looking at him as he came into the room, found that </p>
<p class="calibre1">although she had not wanted to allow her grandfather one courtesy, </p>
<p class="calibre1">she was getting to her feet. </p>
<p class="calibre1">Silently, she watched and waited as he came closer. And when he </p>
<p class="calibre1">stopped and for long seconds stared at her, she saw deep frown lines </p>
<p class="calibre1">groove on his forehead. But she had no word to say to him, and he </p>
<p class="calibre1">none for her as he turned to the man who, keeping his eyes steady on </p>
<p class="calibre1">the two of them, had now moved from his position by the door, and </p>
<p class="calibre1">was coming in their direction. </p>
<p class="calibre1">And it was left to Matt Carstairs to introduce the two—the elderly </p>
<p class="calibre1">man who still had the gait of a man years younger, and the young </p>
<p class="calibre1">woman whose solemn face was giving nothing away of the very low </p>
<p class="calibre1">regard in which she held the other. </p>
<p class="calibre1">'This,' said Matt Carstairs, pausing only marginally as if to assess </p>
<p class="calibre1">how the older man would take it, 'this woman claims to be your </p>
<p class="calibre1">granddaughter, sir—she says she is Edward's child.' </p>
Code:
(?<![".!?>*”“…~’])</(?P<tag>\w+)>\s*<(?P=tag) [^/>]+>
Replaced with "Null".
Is there any way to amend this search to exclude the more obvious "genuine" carriage returns? I know that there are other ways to end a sentence than a full stop, and that any such search will not catch everything and will get some wrong. But it would be better than what she currently has.
Thanks
Last edited by ColMac; 04-15-2015 at 10:12 AM. Reason: spelling
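For anyone who wants to try the posted search outside the editor, here is a minimal Python sketch (the pattern already uses Python-style named groups, so it can be passed to `re` unchanged). It assumes the sample is exactly as pasted, with a space before each `</p>`. Under that assumption the lookbehind only ever sees a space, which may be why sentence-ending paragraphs get merged as well. `VARIANT` is a hypothetical tweak, not something taken from this thread, and the three-paragraph text is a shortened stand-in for the sample.

```python
import re

# The search exactly as posted (Python-style named group and backreference).
POSTED = re.compile(r'(?<![".!?>*”“…~’])</(?P<tag>\w+)>\s*<(?P=tag) [^/>]+>')

# Hypothetical variant (an assumption, not from the thread): add \s to the
# lookbehind class and consume the whitespace before the closing tag, so the
# character being tested is the last visible one rather than a trailing space.
VARIANT = re.compile(r'(?<![".!?>*”“…~’\s])\s*</(?P<tag>\w+)>\s*<(?P=tag) [^/>]+>')

# Shortened stand-in for the sample; note the space before two of the </p> tags.
sample = (
    '<p class="calibre1">she was getting to her feet. </p> '
    '<p class="calibre1">Silently, she </p> '
    '<p class="calibre1">watched.</p>'
)

# Posted search, replaced with nothing: both breaks are removed,
# including the genuine one after "feet."
posted_result = POSTED.sub("", sample)

# Variant, replaced with a single space: only the mid-sentence break is removed.
variant_result = VARIANT.sub(" ", sample)

print(posted_result)
print(variant_result)
```

The variant replaces with a single space because it swallows the trailing space along with the tags. It still relies on punctuation to decide what a genuine paragraph end is, so dialogue and other edge cases would need checking by hand.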
#2 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Search instead for lines that begin with a lower case character and strip the opening tag.
|
#3 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
And also the previous closing tag. I can't face typing the code on my tablet, but there will be examples in older threads.
|
#4 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
OK, at my PC now. Here is how I'd do it with Sigil:
1. Highlight a relevant fragment, e.g. this from your example: </p> <p class="calibre1">h
2. Paste that into the Find part of the regex. (Pasting it as-is will take care of the space-between-lines issue.)
3. Now replace that closing h with ([a-z]) so that it matches any lower-case letter.
The Replace string for the regex is (blank space)\1
What that all does is remove the closing tag, the next opening tag and the first letter, and then put back that first letter, preceded by a blank space.
Test it carefully and make a backup before you "Replace All"!
Outside of poetry and titles, there's no valid reason for a paragraph to begin with lower case, so that will fix most issues. You can still get awkward cases like: then he said, "this" but you can extrapolate code as needed for those.
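The steps above can be tried on a scrap of text before touching the book. This is a rough Python sketch rather than Sigil itself: `FIND` and `REPLACE` are the two strings from the steps, and the four-paragraph sample is made up from bits of the original post plus the awkward quoted case.

```python
import re

# Find/Replace strings from the steps above, written for Python's re module.
FIND = r'</p> <p class="calibre1">([a-z])'
REPLACE = r' \1'  # blank space, then the captured lower-case letter

# Made-up sample: one split sentence, one genuine paragraph break,
# and one continuation that starts with a quotation mark.
sample = (
    '<p class="calibre1">found that </p> '
    '<p class="calibre1">although she was getting to her feet. </p> '
    '<p class="calibre1">Silently, she watched. Then he said, </p> '
    '<p class="calibre1">"this"</p>'
)

# subn returns the new text and the number of replacements made.
joined, count = re.subn(FIND, REPLACE, sample)

print(count)
print(joined)
```

Only the first break is joined here. The paragraph starting with "Silently" is left alone, as intended, but so is the one starting with a quotation mark, which is the awkward case mentioned above. When the text already has a space before `</p>`, the join leaves two spaces in the source; a reader normally displays those as one.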
#5 |
Well trained by Cats
Posts: 30,913
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
</p>\s+<p
will take care of any indentation or CR (which may vary within the document because of other tags such as blockquote and div).
#6 | |
Connoisseur
Posts: 59
Karma: 10
Join Date: Apr 2012
Device: Kindle Fire
|
Search regex problem
The first one I tried found almost 4,000 occurrences. I'm pretty sure that there may be an error in that 4,000, but it is a massive improvement on what I had before. Thanks for the help.
|
#7 |
Connoisseur
Posts: 59
Karma: 10
Join Date: Apr 2012
Device: Kindle Fire
|
Search regex problem
|
#8 | |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
I used to have a file with several of these, all tested and vetted, but I can't find it, and I could not face re-reading all of the regex examples in the Sigil forum stickies. I'm sure there are detailed old threads, but writing the appropriate search expression for the forum search engine has me beat.
|
#9 |
Well trained by Cats
Posts: 30,913
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
\s matches whitespace of any kind (space, return, tab); the + means one or more of the preceding item in a row.
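A quick way to see the difference is to count matches on a scrap of text. This is a small Python sketch with made-up paragraphs; the `\s*` line is included only for comparison with the searches posted elsewhere in the thread.

```python
import re

# Three paragraph boundaries (made-up text for illustration):
# a single space, a newline plus indentation, and tags that touch.
text = '<p>one</p> <p>two</p>\n    <p>three</p><p>four</p>'

literal_space = re.findall(r'</p> <p', text)   # exactly one space
one_or_more = re.findall(r'</p>\s+<p', text)   # any run of whitespace
zero_or_more = re.findall(r'</p>\s*<p', text)  # whitespace optional

print(len(literal_space), len(one_or_more), len(zero_or_more))  # prints: 1 2 3
```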
|
#10 |
Wizard
Posts: 1,087
Karma: 447222
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
Sorry
OBE
Last edited by phossler; 04-15-2015 at 04:32 PM.
#11 | |
Wizard
Posts: 2,171
Karma: 8800000
Join Date: Jun 2010
Device: Kobo Clara HD,Hisence Sero 7 Pro RIP, Nook STR, jetbook lite
|
This is one I think is from the Sigil forums, used to find and join split paras:
Search
Code:
</p>\s*<p[^>]+>([a-z])
Replace
Code:
\1
Using the OP example this is the result.
Spoiler:
bernie
Last edited by gbm; 04-15-2015 at 05:49 PM.
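Here is a minimal Python sketch of the same search and replace, run on the first five lines of the sample from the opening post. It assumes the book's source looks like the pasted sample, one paragraph per line with a space before each `</p>`. That trailing space is why a bare `\1` is enough to keep the words apart.

```python
import re

# Search and replace strings as posted above.
SEARCH = r'</p>\s*<p[^>]+>([a-z])'
REPLACE = r'\1'

# First five paragraphs of the sample from the opening post.
lines = [
    '<p class="calibre1">And Alandra looking at him as he came into the room, found that </p>',
    '<p class="calibre1">although she had not wanted to allow her grandfather one courtesy, </p>',
    '<p class="calibre1">she was getting to her feet. </p>',
    '<p class="calibre1">Silently, she watched and waited as he came closer. And when he </p>',
    '<p class="calibre1">stopped and for long seconds stared at her, she saw deep frown lines </p>',
]
sample = "\n".join(lines)

# subn returns the new text and the number of joins made.
joined, count = re.subn(SEARCH, REPLACE, sample)

print(count)  # number of paragraph breaks removed
print(joined)
```

Three of the four breaks are removed. The one before "Silently" stays because that paragraph starts with a capital letter.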
|
#12 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
And if there is no class?
|
#13 |
Wizard
Posts: 2,171
Karma: 8800000
Join Date: Jun 2010
Device: Kobo Clara HD,Hisence Sero 7 Pro RIP, Nook STR, jetbook lite
|
|
#14 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Au contraire.
Code:
</p>\s*<p(?: [^>]+)?>([a-z])
The whole idea behind regex is to, you know, create one pattern to rule them all.
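To see what the optional group changes, the two searches can be compared side by side. This is a small Python sketch; the names `ATTRS_REQUIRED` and `ATTRS_OPTIONAL` and the three test strings are made up for illustration.

```python
import re

# The earlier search (attributes required) and the one above (attributes optional).
ATTRS_REQUIRED = re.compile(r'</p>\s*<p[^>]+>([a-z])')
ATTRS_OPTIONAL = re.compile(r'</p>\s*<p(?: [^>]+)?>([a-z])')

# Made-up boundaries: a <p> with a class, a bare <p>, and a <pre> tag.
cases = {
    "with class": 'found that </p>\n<p class="calibre1">although',
    "bare <p>": 'found that </p>\n<p>although',
    "<pre> follows": 'found that </p>\n<pre>although',
}

# For each case, record whether each pattern finds a match.
results = {
    name: (bool(ATTRS_REQUIRED.search(text)), bool(ATTRS_OPTIONAL.search(text)))
    for name, text in cases.items()
}

for name, (required, optional) in results.items():
    print(f"{name}: attrs-required={required}, attrs-optional={optional}")
```

Both match a `<p>` with a class. Only the optional version matches a bare `<p>`, and only the earlier one matches `<pre>`, because `[^>]+` directly after `<p` will accept the rest of a longer tag name.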
#15 |
Grand Sorcerer
Posts: 13,316
Karma: 78876004
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
|
Personally I'd head back to whoever supplied me with those books and ask for clean versions...
|