MobileRead Forums > E-Book Software > Calibre
04-14-2017, 08:04 PM #1
mehetabelo
e-Bibliophile
Posts: 60
Karma: 10
Join Date: Jun 2009
Location: California
Device: Paperwhite 1-3, Kobo AuraHD, Boox Afterglow2
Using Search & Replace to remove blank lines

There's an issue I've run into that I can't understand. It *may* be a bug, but it may be something else.

I am trying to remove blank lines from a document using the Search & Replace function. I did some research in the forums and online and found a regular expression that does what I want, with a few minor modifications on my part. I also used the wizard to make sure it was working. The code is as follows:

Code:
<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[\w\-.:]+))?)*\s*\/?>\s*<\/\1\s*>
or
<(p[^>]*|span[^>]*)>(\s|&nbsp;|</?\s?br\s?/?>)*</?(p|span)>
I entered it and then ran a 'Test' in the wizard.

The markup it should be replacing is:
Code:
<p class="calibre5"> </p>
or 
<p class="calibre5"></p>
or
<p class="calibre5"><span class="calibre4"> </span></p>
Both expressions work and find numerous matches in the document I'm looking at. The first one also caught metadata tags, so I abandoned it for the second; I'm only including it for reference.

Note, just in case, that Heuristic Processing is *not* on. With Heuristic Processing enabled, the first two examples are removed fine. However, if there's a span tag (as in the third example), it doesn't strip out the 'blank' line.

Either way, the regex should get rid of all of the above. In the case of the last one, it should at minimum remove the span tag; then I could rerun it, or set up a second pass to remove the empty p tag. (I've tried it both ways.)
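For illustration, here is a minimal sketch of the two-pass approach described above. It uses Python's `re` module as a stand-in for calibre's regex engine (an assumption; engines can differ in details), assumes the three sample lines contain ordinary spaces, and applies the second expression repeatedly until nothing changes. `EMPTY_TAG` and `strip_empty` are hypothetical names.

```python
import re

# The second expression from the post, unchanged.
EMPTY_TAG = re.compile(r'<(p[^>]*|span[^>]*)>(\s|&nbsp;|</?\s?br\s?/?>)*</?(p|span)>')


def strip_empty(html: str) -> str:
    """Apply the substitution until the text stops changing.

    The first pass removes the inner <span>, leaving an empty <p>;
    the next pass removes that <p>. This mimics rerunning the replace.
    """
    while True:
        new = EMPTY_TAG.sub('', html)
        if new == html:
            return new
        html = new


sample = (
    '<p class="calibre5"> </p>\n'
    '<p class="calibre5"></p>\n'
    '<p class="calibre5"><span class="calibre4"> </span></p>\n'
)

print(EMPTY_TAG.sub('', sample))  # single pass: one empty <p> is left behind
print(repr(strip_empty(sample)))  # '\n\n\n'
```

In a single pass the outer `<p>` cannot match, because its opening tag is followed by `<span ...>` rather than whitespace or a closing tag; once the span is gone, the next pass sees an empty `<p>` and removes it.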

The replace doesn't seem to work at all. Even though the expression matches these tags when tested, when I look at the code of the newly created EPUB, nothing has changed: the tags all remain the same. Is there something I'm doing wrong? If needed, I can provide an example EPUB. It was originally downloaded with FanFicFare and has already been converted once (EPUB to EPUB), which is why it has the calibre classes.

I've made a test EPUB by stripping the chapters down to almost nothing, just enough to test the regex. I can provide it if needed.

Last edited by mehetabelo; 04-14-2017 at 08:06 PM. Reason: fixed some possible misunderstandings.
04-14-2017, 11:03 PM #2
kovidgoyal
creator of calibre
Posts: 31,854
Karma: 8697710
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
A test file is always helpful.
04-15-2017, 12:04 AM #3
mehetabelo
I uploaded it to Zippyshare, if that's acceptable.

Zippyshare

I included both the current EPUB and the original, i.e. the stripped-down version of this particular file as it was before the test run. The .epub is the one I ran with the regex mentioned above.
04-15-2017, 12:14 AM #4
kovidgoyal
Your problem is almost certainly caused by the non-breaking space. You cannot match it with &nbsp;, because the processing pipeline converts the entity to the Unicode character. Use \u00a0 instead.
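A small sketch of the point above, using Python's `html.unescape` as a stand-in for the entity decoding done by calibre's pipeline and Python's `re` for its regex engine (both assumptions). Once the entity has been decoded, a pattern that spells out `&nbsp;` has nothing to match, while `\u00a0` does.

```python
import html
import re

raw = '<p class="calibre5"><span class="calibre4">&nbsp;</span></p>'

# Stand-in for the conversion pipeline (assumption): the entity is
# decoded to the actual U+00A0 character before the regex is applied.
decoded = html.unescape(raw)
assert '\u00a0' in decoded and '&nbsp;' not in decoded

entity_pattern = re.compile(r'<span[^>]*>&nbsp;</span>')  # looks for the literal text "&nbsp;"
char_pattern = re.compile(r'<span[^>]*>\u00a0</span>')    # looks for the U+00A0 character

print(bool(entity_pattern.search(decoded)))  # False
print(bool(char_pattern.search(decoded)))    # True
```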
04-15-2017, 11:09 PM #5
mehetabelo
I just tried it with both:
Code:
<(p[^>]*|span[^>]*)>(\s|\u00a0|</?\s?br\s?/?>)*</?(p|span)>
then
<(p[^>]*|span[^>]*)>(\s|</?\s?br\s?/?>)*</?(p|span)>
The second one drops the nbsp alternative entirely. Neither one worked; the empty tags still remain in the EPUB after conversion.

Last edited by mehetabelo; 04-15-2017 at 11:13 PM.
04-16-2017, 03:17 AM #6
kovidgoyal
Works for me with:
Code:
<p[^>]*><span[^>]*>.</span></p>
or, to match only a nbsp:
Code:
<p[^>]*><span[^>]*> </span></p>
where there is a literal nbsp between the span tags. (Don't copy and paste the expression above, as MobileRead has trouble with literal nbsp characters.)
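A sketch comparing the two patterns above with Python's `re` (a stand-in for calibre's engine; an assumption). The nbsp is written as the escape `\u00a0`, as suggested earlier in the thread, so no literal nbsp has to be pasted.

```python
import re

nbsp_line = '<p class="calibre5"><span class="calibre4">\u00a0</span></p>'
letter_line = '<p class="calibre5"><span class="calibre4">I</span></p>'

# "." matches any single character except a newline, so this pattern
# also hits a span holding one real character, such as a lone "I".
any_char = re.compile(r'<p[^>]*><span[^>]*>.</span></p>')

# The nbsp written as an escape instead of a pasted literal character.
only_nbsp = re.compile(r'<p[^>]*><span[^>]*>\u00a0</span></p>')

for line in (nbsp_line, letter_line):
    print(bool(any_char.fullmatch(line)), bool(only_nbsp.fullmatch(line)))
# True True
# True False
```

Because `.` matches any single character, the first pattern would also remove a paragraph whose span holds one real character; the second is the safer choice when only nbsp-only lines should go.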
04-16-2017, 03:34 AM #7
kovidgoyal
Oh, and if you want \s to match nbsp characters, use:

Code:
(?u)<p[^>]*><span[^>]*>\s</span></p>
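A sketch of what `(?u)` changes, using Python 3's `re`. In Python 3, `str` patterns already use Unicode matching, so the non-Unicode behaviour is emulated with the `re.ASCII` flag; treating that as equivalent to calibre's default at the time is an assumption.

```python
import re

line = '<p class="calibre5"><span class="calibre4">\u00a0</span></p>'

# re.ASCII restricts \s to ASCII whitespace ([ \t\n\r\f\v]),
# so U+00A0 is not treated as whitespace.
ascii_ws = re.compile(r'<p[^>]*><span[^>]*>\s</span></p>', re.ASCII)

# (?u) requests Unicode matching, where \s also covers U+00A0.
# In Python 3 this is already the default for str patterns.
unicode_ws = re.compile(r'(?u)<p[^>]*><span[^>]*>\s</span></p>')

print(bool(ascii_ws.fullmatch(line)))    # False
print(bool(unicode_ws.fullmatch(line)))  # True
```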
04-16-2017, 05:58 PM #8
mehetabelo
That worked well. I made a few adaptations, but it was close enough to get me where I wanted to be. I still wonder why the initial regex didn't work, even though it matched when I checked it.

Anyway, I know you have a busy schedule. I didn't actually expect you to be the one to answer the questions the whole time. I truly appreciate the time you spend helping, and the enormous amount of time you've spent working on the program. It is an amazing piece of work and is software I literally use daily.

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Regex in search problems (NOT Search&Replace; the search bar) | lairdb | Calibre | 3 | 03-15-2017 08:10 PM
Aura One: *#&^%B blank lines between paragraphs | franklekens | Kobo Reader | 14 | 09-14-2016 04:28 PM
Search & Replace Help | paulfiera | Conversion | 7 | 08-06-2015 04:52 AM
Blank lines & top margins | travger | Kindle Formats | 11 | 10-08-2012 09:35 AM
FB Reader version & blank lines | franklekens | PocketBook | 2 | 03-01-2010 05:38 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.