#1 |
e-Bibliophile
Posts: 60
Karma: 10
Join Date: Jun 2009
Location: California
Device: Paperwhite 1-3, Kobo AuraHD, Boox Afterglow2
Using search&replace for blank lines.
There's an issue I've run into that I can't understand. It *may* be a bug, but it may be something else.
I am trying to remove blank lines from a document using the Search & Replace function. I did some research in the forums and online and found an expression that does what I want, with a few minor modifications on my part. I even used the wizard to make sure it was working. The code is as follows:
Code:
<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|"[^"]*"|[\w\-.:]+))?)*\s*\/?>\s*<\/\1\s*>
or
Code:
<(p[^>]*|span[^>]*)>(\s|&nbsp;|</?\s?br\s?/?>)*</?(p|span)>
The code it should replace is:
Code:
<p class="calibre5"> </p>
or
<p class="calibre5"></p>
or
<p class="calibre5"><span class="calibre4"> </span></p>
Of note, just in case, Heuristic Processing is *not* on.
Using Heuristic Processing
For the first two examples, it works fine. However, if there's a span tag (as in the third example), it doesn't strip out the 'blank' line. The regex should get rid of all of the above. In the case of the last one, it should at minimum remove the span tag, and then I could rerun it or set up a second pass to remove the empty p tag. (I've tried it both ways.)
The replace doesn't seem to work at all. Although the expression matches them when tested, when I look at the code of the new epub file, nothing has changed: the tags all remain the same.
Is there something I'm doing wrong? If needed I can provide an example epub. It was initially downloaded with Fanficfare, and it has already been converted epub to epub once, which is why it has the calibre tags. I've made a test epub by stripping the chapters down to almost nothing, just enough to test the regex. I can provide it, if needed.
Last edited by mehetabelo; 04-14-2017 at 07:06 PM. Reason: fixed some possible misunderstandings.
#2 |
creator of calibre
Posts: 45,359
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
A test file is always helpful.
#3 |
e-Bibliophile
I uploaded it to zippyshare if that's acceptable.
Zippyshare
I included both the current epub and the original (the one I stripped down prior to the test run on this particular file). So the .epub is the one I ran with the regex mentioned previously.
#4 |
creator of calibre
Your problems are almost certainly caused by the non-breaking space -- you cannot match it with &nbsp; as the processing pipeline converts it to the unicode character. Use \u00a0 instead.
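To illustrate the point outside calibre, here is a minimal Python sketch. It assumes the markup has already had its entity decoded to U+00A0, which is what the reply above describes; the sample string and the variable names are made up for the illustration and are not taken from the uploaded file.

```python
import re

# Hypothetical sample: the paragraph as it would look once the
# entity has been decoded to the U+00A0 character.
decoded = '<p class="calibre5"><span class="calibre4">\u00a0</span></p>'

# Pattern that looks for the entity text itself.
entity_pattern = re.compile(r'<span[^>]*>&nbsp;</span>')

# Pattern that looks for the code point, written as a regex escape.
codepoint_pattern = re.compile(r'<span[^>]*>\u00a0</span>')

entity_hit = entity_pattern.search(decoded) is not None
codepoint_hit = codepoint_pattern.search(decoded) is not None

print('entity pattern matches:', entity_hit)
print('code point pattern matches:', codepoint_hit)
```

The entity pattern finds nothing because the six characters of the entity are no longer in the text, while the code point pattern matches the single character that replaced them.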
#5 |
e-Bibliophile
I just tried it with both:
Code:
<(p[^>]*|span[^>]*)>(\s|\u00a0|</?\s?br\s?/?>)*</?(p|span)>
then
Code:
<(p[^>]*|span[^>]*)>(\s|</?\s?br\s?/?>)*</?(p|span)>
Last edited by mehetabelo; 04-15-2017 at 10:13 PM.
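For anyone who wants to see how the first expression behaves on its own, here is a rough sketch using Python's `re` module. It says nothing about what calibre's conversion pipeline does with the expression; the sample markup and the helper name `strip_blank_lines` are assumptions made for the illustration. It shows why the nested span case needs a second pass: the first pass only removes the inner span, leaving an empty p tag behind.

```python
import re

# The first expression from the post above, unchanged.
BLANK = re.compile(
    r'<(p[^>]*|span[^>]*)>(\s|\u00a0|</?\s?br\s?/?>)*</?(p|span)>'
)

def strip_blank_lines(html):
    """Apply the expression repeatedly until nothing more is removed.

    Returns the cleaned markup and the number of passes that
    actually removed something.
    """
    passes = 0
    while True:
        new_html, count = BLANK.subn('', html)
        if count == 0:
            return html, passes
        html = new_html
        passes += 1

# Hypothetical sample with the three kinds of 'blank' line.
sample = (
    '<p class="calibre1">Before</p>'
    '<p class="calibre5">\u00a0</p>'
    '<p class="calibre5"></p>'
    '<p class="calibre5"><span class="calibre4">\u00a0</span></p>'
    '<p class="calibre1">After</p>'
)

cleaned, passes = strip_blank_lines(sample)
print(cleaned)
print('passes needed:', passes)
```

On this sample the loop needs two passes: the first removes the two simple blank paragraphs and the inner span, and the second removes the now empty p tag.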
#6 |
creator of calibre
Works for me with
Code:
<p[^>]*><span[^>]*>.</span></p>
or to match only a nbsp
Code:
<p[^>]*><span[^>]*> </span></p>
where there is a literal nbsp between the span tags (don't copy-paste the expression above, as MR has trouble with literal nbsp characters).
#7 |
creator of calibre
Oh and if you want to use \s to match nbsp characters, use
Code:
(?u)<p[^>]*><span[^>]*>\s</span></p>
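A small Python sketch of what the flag changes, under some assumptions: it runs on Python 3, where `\s` in a text pattern is already Unicode-aware and `(?u)` is accepted but redundant, so the ASCII-only behaviour is reproduced here with `(?a)` for comparison. As far as I know, calibre ran on Python 2 at the time of this thread, where `(?u)` was what made `\s` cover the nbsp. The sample markup is made up.

```python
import re

# Hypothetical sample: one 'blank' paragraph between two real ones.
html = (
    '<p class="calibre1">First paragraph.</p>'
    '<p class="calibre5"><span class="calibre4">\u00a0</span></p>'
    '<p class="calibre1">Second paragraph.</p>'
)

# Unicode-aware \s, as in the expression above: U+00A0 counts as whitespace.
unicode_pattern = r'(?u)<p[^>]*><span[^>]*>\s</span></p>'

# ASCII-only \s for comparison: U+00A0 is not in [ \t\n\r\f\v].
ascii_pattern = r'(?a)<p[^>]*><span[^>]*>\s</span></p>'

cleaned = re.sub(unicode_pattern, '', html)
untouched = re.sub(ascii_pattern, '', html)

print(cleaned)
print('ASCII-only pattern changed anything:', untouched != html)
```

With the Unicode-aware pattern the blank paragraph is removed; with the ASCII-only one the markup comes back unchanged.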
#8 |
e-Bibliophile
That worked well... I made a few adaptations, but it was close enough to get me where I wanted to be. I wonder why the initial regex didn't work, even though it matched when I checked it?
Anyway, I know you have a busy schedule. I didn't actually expect you to be the one answering the questions the whole time. I truly appreciate the time you spend helping, and the enormous amount of time you've spent working on the program. It is an amazing piece of work and is software I literally use daily.