#1 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Has there been any discussion of, or any requests for, a tool to remove all <a href= type links from within a book (excluding the ones which have to be there in XHTML headers)?
A quick "search this forum" did not find anything. I find these a pain to remove, as they are often entangled within spaghetti-like chapter header code, where they link back to an HTML TOC page, or they are arbitrarily added to place names, addresses etc. in a story. In both cases, seeing blue underlined stuff is intrusive, and as I usually remove any HTML TOC page from my personal reading copies, I end up with broken links which upset my e-reader software if I tap them accidentally. So it would be great for me if there was an editor function or a calibre conversion option that just zapped them all away.
Last edited by cybmole; 08-08-2014 at 02:25 PM.
#2 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
You could do it with a regex, but I think DiapDealer's sample editor plugin might do a better job. Will require manually configuring the config file. Nope, doesn't seem to be configurable. He might add in link tag support, but in the meantime...
Hmmm. From a suggestion of mine in the Modify EPUB expansion discussion: https://www.mobileread.com/forums/sho...83#post2801083
Search: Code:
<a href="[^<>]*">((?:(?!<(?:a|/a)).)*)</a>
Replace: Code:
\1
Should handle nested tags. (And a byproduct is that if, for some godawful reason, there are nested link tags, which should NEVER happen, it'd still work.) I could probably do this the short way, then, with lazy searching, but I like this masterpiece, plus I like copy-pasting previous solutions.
Last edited by eschwartz; 08-08-2014 at 02:40 PM.
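For anyone who wants to try the search/replace pair outside an editor, here is a minimal Python sketch. It uses the standard `re` module rather than the calibre or Sigil search box, and the `re.DOTALL` flag, the function name `strip_links` and the sample string are assumptions added for illustration, not part of the post above.

```python
import re

# The search pattern from the post, unchanged. The inner group
# (?:(?!<(?:a|/a)).)* consumes one character at a time, but refuses to
# step onto anything that starts with "<a" or "</a".
# re.DOTALL is an assumption: it lets a link that spans lines match.
LINK = re.compile(r'<a href="[^<>]*">((?:(?!<(?:a|/a)).)*)</a>', re.DOTALL)


def strip_links(html: str) -> str:
    """Replace each <a href="...">content</a> with its content (the \\1 replace)."""
    return LINK.sub(r"\1", html)


sample = (
    '<p>See <a href="toc.html#c1"><b>Chapter 1</b></a> '
    'and <a href="http://example.com">this</a>.</p>'
)
print(strip_links(sample))
```

One thing to watch: the lookahead `<(?:a|/a)` has no word boundary, so link content that contains a tag such as `<abbr>` stops the match and that link is left in place. A `<link href=...>` in the header is never touched, because the pattern only starts at `<a href=`.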
#3 |
Resident Curmudgeon
Posts: 79,249
Karma: 145488788
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Before you remove all <a href= code, take a look to make sure that you are not removing any that matter.
A lot of them are used as filler to show where the page number is in a paper book. And then there are the ones that link back to the HTML ToC. Those can go too. And they look awful.
#4 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
@cybmole, if it's in the header it would be a <link> to attach resources, not an <a> to create a clickable hyperlink.
|
#5 |
Resident Curmudgeon
Posts: 79,249
Karma: 145488788
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
But, it can be rather annoying to put in all the work to modify the eBook and then find out you goofed and have to start over.
|
#6 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
I know from bitter experience that I must not zap these:
Code:
<link href="../Styles/stylesheet.css" rel="stylesheet" type="text/css" />
<link href="../Styles/page_styles.css" rel="stylesheet" type="text/css" />
Where I have previously gone wrong is to tackle the header cleanup in chunks, so I've tried to zap the href bit only, with a view to then losing the <a> tags but preserving what is inside of them. But that takes out the vital links code. Getting the <a href... and the closing </a> tag out of retail code like this example (which I quoted in the Sigil forum) is no fun!
Code:
<h2 class="chp"><a href="../Text/wizardandglass_con01.html#TOCC-6"><span class="chapnum"><b>CHAPTER I</b></span><br /> B<span class="largecap">ENEATH THE</span> D<span class="largecap">EMON</span> M<span class="largecap">OON</span> (I)</a></h2>
My 2nd is to go to my backup copy of the calibre library, restore the backup of that book into my main library, and start over. I needed both cards, more than once, when tweaking the above code!
->eschwartz: I'll give your code a try-out on the next nasty I come across. Thanks.
Last edited by cybmole; 08-08-2014 at 04:33 PM.
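The pitfall described here (zapping "the href bit" and losing the stylesheet links with it) can be reproduced in a few lines of Python. The exact search that was used is not given in the post, so the "too broad" pattern below is only a guess at that kind of search; the narrower pattern and the shortened sample markup are likewise illustrative assumptions.

```python
import re

head = '<link href="../Styles/stylesheet.css" rel="stylesheet" type="text/css" />'
body = (
    '<h2 class="chp"><a href="../Text/wizardandglass_con01.html#TOCC-6">'
    '<span class="chapnum"><b>CHAPTER I</b></span></a></h2>'
)
page = head + "\n" + body

# Too broad (assumed): removes every href attribute, including the one on <link>.
too_broad = re.sub(r'\s+href="[^"]*"', "", page)

# Narrower: only remove href when it sits inside an opening <a ...> tag.
# \b after "a" keeps the pattern from starting at other tags such as <abbr>.
only_anchors = re.sub(r'(<a\b[^<>]*?)\s+href="[^"]*"', r"\1", page)

print(head in too_broad)     # the stylesheet link has been damaged
print(head in only_anchors)  # the stylesheet link survives
```

After the narrower pass the anchors are bare `<a>...</a>` pairs, which can then be removed in a second step without going near the header.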
#7 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Did you give my code a try? It should nuke <a href="link-location">content</a> pairs -- all of them -- while preserving any markup on the "content". If you need to fine-tune it any more, the important bit is the bit in multiple layers of parentheses. It uses the power of negative lookaheads to match any string except ones that include the excluded stuff in red.
Nuke tag sets while preserving nested instances and other markup:
Code:
<tag-to-nuke(?: optional-attribute(s)="[^<>]*")?>((?:(?!<(?:tag-to-nuke|/tag-to-nuke)).)*)</tag-to-nuke>
Replace: Code:
\1
Last edited by eschwartz; 08-08-2014 at 04:31 PM.
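The generic template can be turned into a small helper. This is a sketch under stated assumptions rather than the exact template: it runs on Python's standard `re` module, it accepts any attributes on the opening tag instead of one named attribute, and it adds a `\b` word boundary in the lookahead. The names `unwrap_pattern` and `unwrap` are hypothetical.

```python
import re


def unwrap_pattern(tag: str) -> re.Pattern:
    """Build a variant of the template above for `tag` (assumed generalisation)."""
    t = re.escape(tag)
    return re.compile(
        rf"<{t}(?:\s[^<>]*)?>"     # opening tag, attributes optional
        rf"((?:(?!</?{t}\b).)*)"   # content that never starts another <tag or </tag
        rf"</{t}>",                # the matching close tag
        re.DOTALL,
    )


def unwrap(tag: str, html: str) -> str:
    """Remove <tag ...> and </tag> pairs, keeping what is between them."""
    return unwrap_pattern(tag).sub(r"\1", html)


print(unwrap("span", '<p><span class="x">a<b>b</b></span>c</p>'))

# Nested instances are peeled from the inside out, one level per pass.
nested = "<span>x<span>y</span>z</span>"
once = unwrap("span", nested)
twice = unwrap("span", once)
print(once, twice)
```

Because the content group refuses to cross another opening or closing tag of the same name, a single pass only removes the innermost pair; repeating the replace until nothing changes clears the outer ones too.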
#8 | ||
Grand Sorcerer
Posts: 28,393
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I've had another request along the same lines (<a> tags). I'm thinking about adding the ability to remove/modify 'a' tags, but haven't put the time in yet. I figure it's not that critical since non-nestable tags (of which the anchor tag is one [99.9 percent of the time anyway]) are pretty trivial to regex away in well-formed (x)html.
Search for: Code:
</?a\M([^>]+)?>
That should remove opening/closing 'a' tags leaving the text between them alone (as well as removing self-closing <a id="blah" /> entries). Or something like this (in calibre OR Sigil): Code:
</?a ?([^>]+)?>
Last edited by DiapDealer; 08-08-2014 at 06:46 PM.
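A rough Python equivalent of the first search, for testing on a string. Python's standard `re` module has no `\M`, so this sketch assumes `\b` is a fair stand-in for the end-of-word check; the function name and sample markup are made up for the example.

```python
import re

# \b after "a" plays the role of \M here: the tag name must end right after
# the "a", so <abbr>, <area> or <article> are not treated as anchors.
ANCHOR_TAG = re.compile(r"</?a\b[^>]*>")


def drop_anchor_tags(html: str) -> str:
    """Delete opening, closing and self-closing <a> tags, keeping the text."""
    return ANCHOR_TAG.sub("", html)


sample = '<p><a id="p1" /><a class="c" href="t.html">4</a> <abbr>e.g.</abbr></p>'
print(drop_anchor_tags(sample))

# The second pattern from the post, for comparison: the optional space means
# it also matches tags whose names merely begin with "a".
LOOSE = re.compile(r"</?a ?([^>]+)?>")
print(LOOSE.sub("", "<abbr>x</abbr>"))
```

The comparison at the end suggests the looser pattern is best stepped through one match at a time in books that use `<abbr>`, `<address>` or similar tags.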
||
#9 |
Grand Sorcerer
Posts: 28,393
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I realize my approach might seem a bit "nuclear" (removing all opening/closing/self-closing anchor tags in a document), but if you step through one at a time and make sure you don't delete an open tag but skip the close-tag (or vice-versa), it's not so bad. And it certainly leaves "link" stuff in the header alone.
Besides, I think we can all agree that the mass removal of <a> tags is fraught with peril to begin with. Even if you only focus on the seemingly innocuous ones with href attributes, that doesn't mean the "id"s of those anchors aren't the targets of nav elements in the ncx file, or spine/guide elements in the opf. In fact, that's especially likely in the case of some of those chapter-header monstrosity structures you pointed out. So with all that in mind ... it seemed like you were in a bit of the "Damn the torpedoes! ... I've got checkpoints set" frame of mind anyway.
Last edited by DiapDealer; 08-08-2014 at 07:46 PM.
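The risk described here can be checked before deleting anything. The sketch below is one possible approach, not a feature of any plugin: it scans the ncx and opf text for `src=`/`href=` values with a `#fragment`, then reports which `<a id="...">` anchors in a page are targets. The function names and sample snippets are hypothetical, files are matched by base name only (an assumption that breaks if two folders hold files with the same name), and ids sitting on other elements such as `<h1>` are not looked at.

```python
import re


def fragment_targets(*metadata_texts: str) -> set[tuple[str, str]]:
    """Collect (file name, fragment) pairs from src= and href= attributes."""
    targets = set()
    for text in metadata_texts:
        for path, frag in re.findall(r'(?:src|href)="([^"#]*)#([^"]+)"', text):
            targets.add((path.rsplit("/", 1)[-1], frag))
    return targets


def anchors_in_use(filename: str, xhtml: str, targets: set) -> list[str]:
    """Return ids on <a> tags in `xhtml` that the ncx/opf point at."""
    ids = re.findall(r'<a\b[^<>]*\sid="([^"]+)"', xhtml)
    return [i for i in ids if (filename, i) in targets]


ncx = '<content src="Text/ch01.html#start"/>'
opf = '<reference type="toc" href="Text/toc.html#toc"/>'
page = (
    '<h1><a id="start" href="../Text/toc.html#c1">One</a></h1>'
    '<p><a id="unused" href="x.html">x</a></p>'
)
print(anchors_in_use("ch01.html", page, fragment_targets(ncx, opf)))
```

Any id the check reports is one whose anchor should be kept (or whose id should be moved to a surrounding element) before the links are stripped.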
#10 |
creator of calibre
Posts: 45,219
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If you don't like blue underlined stuff, simply add a couple of CSS rules to make links not show up as blue and underlined, instead of removing them.
|
#11 | |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
|
|
#12 | |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
For your viewer, yes, but not for ADE-based readers. They decide that they know best, and that if it's a link it's going to render as blue and underlined, no matter what you put in the CSS.
|
#13 | |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
The basic concept of a story is that you begin at the beginning and read through to the end! You don't want to be thrown off course by tapping some blue bit by accident or out of curiosity, especially if it's going to throw up some crap about needing to turn your Wi-Fi back on! And I've not looked deeply into footnote coding techniques, but does best practice for those need the a href constructs?
|
#14 | |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Your code did not fix the example below, because there's a class after the <a. That's always going to be the case if the book has gone through a calibre EPUB-to-EPUB conversion, isn't it, because calibre will add classes to every tag? The previous example I gave was from a completely unedited, unconverted retail book, but my usual workflow for making a personal reading version is to load the original into calibre and immediately convert it EPUB to EPUB, then tweak only within the resulting copy and never touch the original_epub backup. I use the conversion to add extra CSS so as to zap hyphenation and zap widows and orphans at the same time.
Code:
<h1 class="calibre10" id="rw-h1_319849-00001"><a class="calibre7" href="../Text/9780857900135_toc.html">4</a></h1>
<h1 class="calibre10">4</h1>
Is the ID tag redundant, i.e. does it not impact the reading experience in any way? This find worked OK though:
Code:
<a class="calibre\d" href="[^<>]*">((?:(?!<(?:a|/a)).)*)</a>
Last edited by cybmole; 08-09-2014 at 01:20 AM.
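The `calibre\d` find only covers a single-digit class sitting directly before href. A more tolerant variant, sketched here in Python's `re` as an assumption rather than anything tested in calibre or Sigil, allows any attributes before or after href while keeping the same lookahead idea for the content. The second sample (with a hypothetical `calibre12` class and a `title` attribute) is invented for the test.

```python
import re

# Any attributes may precede or follow href. The content group is the same
# tempered-dot idea, with \b added so only real <a> / </a> tags stop it.
LINK_ANY_ATTRS = re.compile(
    r'<a\b[^<>]*\shref="[^<>"]*"[^<>]*>((?:(?!</?a\b).)*)</a>',
    re.DOTALL,
)

sample = (
    '<h1 class="calibre10" id="rw-h1_319849-00001">'
    '<a class="calibre7" href="../Text/9780857900135_toc.html">4</a></h1>'
)
print(LINK_ANY_ATTRS.sub(r"\1", sample))

# A two-digit class, which class="calibre\d" would miss.
other = '<a class="calibre12" href="toc.html" title="back">Contents</a>'
print(LINK_ANY_ATTRS.sub(r"\1", other))
```

Note that this leaves the id on the `<h1>` alone; whether that id can go depends on whether anything in the ncx or opf points at it.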
|
#15 | ||
Grand Sorcerer
Posts: 28,393
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
As for why you want to remove them (or why you don't think there's a 'normal' reason to want one in a typical novel), I don't really care. The fact is: nav entries in the ncx file quite often point to those anchors, as do the spine/guide elements of the opf. That makes blindly removing them quite risky, in my opinion. If you're 100% certain the nav entries and the spine/guide elements of your ebook's ncx/opf all point to HTML files directly (no URL fragments representing the ids of those 'a' tags), then of course the peril I spoke of is less. It has nothing to do with being a "basic storytelling novel", and everything to do with "that's just how some ebooks (even commercial ones) are constructed sometimes."
Last edited by DiapDealer; 08-09-2014 at 01:49 AM.
||