Old 08-09-2014, 01:51 AM   #16
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Quote:
Originally Posted by DiapDealer View Post
I don't follow you. The href="whatever" bit is part of the 'a' tag. you can't strip the 'a' tags without removing the href. Just try the regex and you'll see exactly what it will remove. There's no need to wonder.
OK, let's see. I created a one-line book in Sigil using this previously posted example:
<h1 class="calibre10" id="rw-h1_319849-00001"><a class="calibre7" href="../Text/9780857900135_toc.html">4</a></h1>

Now I run your (Sigil-flavored) regex. You are right, it works!

So can you walk me through HOW it works, please, using the above example?

I am impressed that it zaps both the opening and the closing tag in a single pass, without needing a \1 replace anywhere.

PS: re the concern that I may over-zealously zap too much stuff:
My usual precaution in Sigil is to run Count All first; if that returns a count that matches the number of chapters, then clearly I have no instances outside of chapter headers to worry about, and I can run Replace All.
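
The count-first precaution can also be tried outside Sigil. What follows is only a minimal Python sketch, not Sigil's own Count All: the sample markup and variable names are invented for illustration, and the pattern is the `</?a ?([^>]+)?>` regex DiapDealer posted in this thread. One wrinkle worth noting is that this pattern matches opening and closing tags separately, so a book whose only links sit in chapter headings should give two matches per heading rather than one.

```python
import re

# Hypothetical sample: two chapter headings that wrap a link,
# plus one ordinary link in body text that should NOT be removed.
book = (
    '<h1 id="c1"><a href="../Text/toc.html">1</a></h1>\n'
    '<p>See <a href="notes.html">the notes</a>.</p>\n'
    '<h1 id="c2"><a href="../Text/toc.html">2</a></h1>\n'
)

anchor_tag = re.compile(r'</?a ?([^>]+)?>')

# Count chapter headings (opening <h1 tags only).
chapters = len(re.findall(r'<h1\b', book))

# Count every opening or closing 'a' tag the pattern would remove.
matches = sum(1 for _ in anchor_tag.finditer(book))

# Each linked heading contributes one opening and one closing tag.
safe_to_replace_all = matches == 2 * chapters

print(chapters, matches, safe_to_replace_all)
```

Here the stray link in the paragraph pushes the count above twice the number of chapters, which is the signal to review matches one by one instead of running Replace All.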

Last edited by cybmole; 08-09-2014 at 01:54 AM.
Old 08-09-2014, 02:54 AM   #17
DiapDealer
Grand Sorcerer
Posts: 28,400
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by cybmole View Post
I am impressed that it zaps both the opening and the closing tag in a single pass, without needing a \1 replace anywhere.
That's part of its simplicity. It's designed to match both the opening and closing 'a' tags themselves, rather than capturing the text in between them and trying to separate that and put it back with a \1.

Quote:
Originally Posted by cybmole View Post
So can you walk me through HOW it works, please, using the above example?

Certainly. It's all about the optional elements (indicated by the '?'s).
Code:
</?a ?([^>]+)?>
Take the opening portion:
Code:
</?a
The /? makes the slash optional. So that means it matches both the <a of the opening tag and the </a of the closing tag.

The space that follows is meant as demarcation, so that it doesn't match other tags that start with the letter 'a' (address, abbr, area, etc.). It's made optional with the following '?' because the space won't exist in the closing tag.
(NOTE: Because that space is optional, the pattern can still match tags like address, abbr, or area; the [^>]+ that follows simply swallows the rest of the tag name. Those tags are pretty rare. Still ... that's why I prefer the \M approach instead of the " ?". "a\M" matches the letter a at the "end of a word." But \M won't work in all flavors of regex.)

That takes us through
Code:
</?a ?
The [^>] part just means "any character that's not (^) a closing angle brace (>)". The '+' is to indicate one or more repetitions of that "any character that's not a closing angle brace". It's basically a way to capture anything up to the closing angle brace, while ensuring it doesn't get "greedy" and go beyond the next closest angle brace.
Code:
[^>]+
Wrapping the [^>]+ in parentheses just groups it together so that the following question mark makes the entire construct optional (because it won't exist in the closing tag).
Code:
([^>]+)?


So put it all together and it will match </a> as well as:
Code:
<a id="blah" class="blahdeblah" href="blahdedblahdeblah.html#doohickey">
Basically anything that starts with '<a' or '</a' and everything else that may be present, up to and including the next '>'.
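
For anyone who wants to poke at the walkthrough outside the editor, here is a small sketch using Python's built-in `re` module (an assumption on my part; Sigil and calibre use their own regex engines, so treat this as an approximation). The `ANCHOR_STRICT` pattern is a hypothetical variant, not something posted in this thread: it uses `\b` where the post mentions `\M`, since `\M` is not available in Python's `re`.

```python
import re

# DiapDealer's pattern, unchanged.
ANCHOR = re.compile(r'</?a ?([^>]+)?>')

# Hypothetical stricter variant: \b requires the tag name to end right after "a".
ANCHOR_STRICT = re.compile(r'</?a\b[^>]*>')

# cybmole's one-line example from earlier in the thread.
heading = ('<h1 class="calibre10" id="rw-h1_319849-00001">'
           '<a class="calibre7" href="../Text/9780857900135_toc.html">4</a></h1>')

# Both the opening and the closing 'a' tag are removed; the link text stays.
stripped = ANCHOR.sub('', heading)
print(stripped)  # <h1 class="calibre10" id="rw-h1_319849-00001">4</h1>

# A tag that merely starts with the letter 'a'.
other = '<abbr title="et cetera">etc.</abbr>'
loose = ANCHOR.sub('', other)          # the optional space lets this match too
strict = ANCHOR_STRICT.sub('', other)  # left untouched
print(loose)   # etc.
print(strict)  # <abbr title="et cetera">etc.</abbr>
```

The second half shows why the word-boundary form is the safer choice when a book contains tags such as abbr or area.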

Last edited by DiapDealer; 08-09-2014 at 03:03 AM.
Old 08-09-2014, 03:14 AM   #18
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
so simple when you know how

many thanks for that excellent walkthrough
Old 08-09-2014, 03:27 AM   #19
kovidgoyal
creator of calibre
Posts: 45,223
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by cybmole View Post
Been there, tried that.
For your viewer, yes, but for ADE-based readers:
They decide that they know best, and that if it's a link it's going to render as blue + underlined, no matter what you put in the CSS.
The only place you have to use ADE-based readers is on e-ink devices, and they don't support colors anyway.
Old 08-09-2014, 03:28 AM   #20
DiapDealer
Grand Sorcerer
Posts: 28,400
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by cybmole View Post
so simple when you know how

many thanks for that excellent walkthrough
You're quite welcome.

Last edited by DiapDealer; 08-09-2014 at 03:31 AM.
Old 08-09-2014, 03:36 AM   #21
kovidgoyal
creator of calibre
Posts: 45,223
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
And I just tried overriding the styles like this

a:link { color: magenta; text-decoration: none }

and it worked fine in my copy of ADE 1.7
Old 08-09-2014, 04:02 AM   #22
JSWolf
Resident Curmudgeon
Posts: 79,259
Karma: 145488788
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
If the code is <a href="page_10"/> then that's easy to remove.

Search for <a href="page_[0-9]*"/> and replace with nothing.
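
As a quick illustration of that search, here is a sketch in Python's `re` module (an assumption; the editor's own regex engine may differ in details). The sample paragraph is invented. Only the self-closing page-marker anchors are removed, and the ordinary link is left alone.

```python
import re

# Hypothetical sample: two empty page-marker anchors and one real link.
sample = ('<p><a href="page_10"/>Text<a href="page_11"/> more '
          '<a href="ch2.html">link</a></p>')

# JSWolf's search pattern, replaced with nothing.
cleaned = re.sub(r'<a href="page_[0-9]*"/>', '', sample)

print(cleaned)  # <p>Text more <a href="ch2.html">link</a></p>
```

Note that `[0-9]*` also accepts zero digits, so a bare `page_` would match as well; `[0-9]+` would be the stricter form if that matters.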
Old 08-09-2014, 04:35 AM   #23
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Quote:
Originally Posted by kovidgoyal View Post
And I just tried overriding the styles like this

a:link { color: magenta; text-decoration: none }

and it worked fine in my copy of ADE 1.7
I think I said ADE-based readers, not the ADE PC version?

For sure, on the Sony readers, I still saw blue + underlined.

But I did not do it your way; I just styled the h2 tag, or whatever tag contained the <a bit. Maybe that's why.

I still prefer to remove them, though, so that I do not create dead links by removing an unwanted HTML TOC page, which I consider to be a redundant item.
On any reader I'd use the "show me the chapters" feature, and that would refer to the toc.ncx file. I don't see any added value in keeping an active-links HTML contents page either at the start or at the end of an epub.
Old 08-09-2014, 05:36 AM   #24
kovidgoyal
creator of calibre
Posts: 45,223
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
ADE 1.7 is what is used on Sony readers. And styling the element surrounding a link will not work, because the link's CSS will override it.
Old 08-09-2014, 06:15 AM   #25
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Quote:
Originally Posted by kovidgoyal View Post
ADE 1.7 is what is used on Sony readers. And styling the element surrounding a link will not work, because the link's CSS will override it.
OK, thanks for clarifying. I have moved on from Sony hardware now anyway. It is possible that I'll see something similar on my Kobo reader, but now I know what to style if need be.

Given the easy-to-use tag removal regex solutions posted here by others, I guess there's less of a case for wanting an editor or a conversion feature to do the job for me.
Old 08-09-2014, 08:05 AM   #26
timberbeast
stumblebum
Posts: 29
Karma: 10
Join Date: Nov 2013
Location: Roseburg, OR
Device: kindle2
Quote:
so simple when you know how

many thanks for that excellent walkthrough
@DiapDealer Yes, I found the walk through to be quite helpful also.

Back to lurking.

larry

Last edited by timberbeast; 08-09-2014 at 08:17 AM. Reason: giving credit
Old 08-10-2014, 12:14 AM   #27
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by DiapDealer View Post
The tags you can change stuff TO are configurable, but only by editing the JSON settings file manually. Otherwise you're right... the original tag you're looking for is not configurable.

I've had another request along the same lines (<a> tags). I'm thinking about adding the ability to remove/modify 'a' tags, but haven't put the time in yet. I figure it's not that critical since non-nestable tags (of which the anchor tag is one [99.9 percent of the time anyway]) are pretty trivial to regex away in well-formed (x)html.
Well, the only reason nested anything got in this discussion, I think, is because I happened to recycle some code for nuking nested tags. Your solution looks quite nice too.

I was kinda viewing your plugin as a way to remove extraneous elements, not nested per se. So it might be nice to have a plugin that does all the heavy lifting (thinking) for you.

EDIT: And I see you added <a> .

Last edited by eschwartz; 08-10-2014 at 12:19 AM.
Old 08-10-2014, 01:04 AM   #28
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Another particular horror, sometimes seen in old mobi books, is nested blockquotes.
I've seen them about 6 layers deep in some free Amazon books! By the time you are at the innermost nest, the text has almost been pushed off the screen!

Those are a nightmare to remove, and what makes it even trickier is that I usually want to keep the outer layer and then just have a sensible blockquote margin set in CSS.

So if you guys are looking at tools for nested tags, the general challenge is for some code that locates nested tags and then removes all but the outer layer. Is that possible?
The same code would sometimes be helpful for simplifying spans.

To be fair though, it's been a while since I saw one of those blockquote horrors. I think they were a way of overcoming mobi format limitations, and are unlikely to be nested so badly if folks work in epub or azw.
Old 08-10-2014, 03:54 AM   #29
theducks
Well trained by Cats
Posts: 30,917
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
@cybmole

I use:
Code:
(?sm)(<blockquote class="\w">\s+){2,}(.+?)(</blockquote>\s+){2,}
and
Code:
\2
In 'Current File' mode, click Replace All 'N' times (until there are no more replacements).

It is not a perfect solution; you may have to fix (debug) some now-broken code.

(IIRC Mobi has no 'margin-left, margin-right' support, thus the use of BQ)
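
The "replace until nothing changes" routine can be sketched in Python's `re` module (an assumption; the editor's regex engine may behave differently in edge cases). The sample markup and the `unnest` helper are invented for illustration. The class name is a single letter because `\w` in the posted pattern matches exactly one character.

```python
import re

# theducks's pattern, copied as posted; the replacement is \2.
NESTED = re.compile(
    r'(?sm)(<blockquote class="\w">\s+){2,}(.+?)(</blockquote>\s+){2,}'
)

# Hypothetical sample: three nested layers, each tag followed by a newline.
sample = (
    '<blockquote class="a">\n'
    '<blockquote class="a">\n'
    '<blockquote class="a">\n'
    '<p>text</p>\n'
    '</blockquote>\n'
    '</blockquote>\n'
    '</blockquote>\n'
)

def unnest(text):
    """Apply the replacement repeatedly until a pass changes nothing."""
    passes = 0
    while True:
        text, count = NESTED.subn(r'\2', text)
        if count == 0:
            return text, passes
        passes += 1

result, passes = unnest(sample)
print(repr(result), passes)
```

On this sample the whole stack of blockquotes is removed in one pass, outermost layer included, leaving only the paragraph. Keeping a single layer would mean wrapping `\2` in one blockquote in the replacement, which is an untested variation rather than something from the post.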
Old 08-10-2014, 05:22 AM   #30
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
thanks - now I'll have to remember where I might have saved a test case

does that code strip the nested tags from inner to outer, as usually the outermost one is the best candidate for keeping ?