MobileRead Forums - View Single Post

Tex2002ans · 10-23-2017, 02:52 PM

I recently came across this article referenced by The Digital Reader, "When Nothing Ever Goes Out of Print: Maintaining Backlist Ebooks" written by an editor from Houghton Mifflin Harcourt.

The entire article was a great read, but here is the part on Hyperlinks in backlist books + Link Rot:

Quote:

Hyperlinks

Many of our books have URLs in them. Particularly our adult nonfiction books, which often have endnotes with lots of URLs. And in ebooks, we make these URLs into hyperlinks.

And as we know, URLs can stop working.

The web community has gotten good about talking about this problem, and they call it link rot.

There’s scholarly research on the prevalence of link rot—when URLs stop working—and of reference rot—when the information at a given URL changes from what it was when the author cited it. This study found that more than half of the URLs cited in US Supreme Court decisions suffered from one or the other. Which is not a great thing for the history of American jurisprudence.

Other studies have looked at the half-life of a URL, suggesting that it might be about two years. (The concept of half-life here is just like that for radioactivity, the amount of time it takes for half of something to decay, or in this case, for half of the URLs in a set to stop working.)

What does this look like in an ebook? Powers of Two is a pretty typical nonfiction book from our backlist. It has 275 URLs in it, mostly in the endnotes. It was published in August 2014.

So if we use that half-life of two years model, this summer, in August 2016, we would expect that only 50% of those URLs would still be good.

By August 2018, only 25% of the URLs would still be good.

By August 2024, ten years after pub, this model would predict that only about 3% of the URLs would still be working. And unfortunately, I don’t plan to be retired yet by then. So this is a real problem for us.
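The arithmetic behind those three figures can be reproduced with a short sketch. This is not from the article: the function name `surviving_fraction` is made up for illustration, and it assumes simple exponential decay with the two-year half-life quoted above.

```python
def surviving_fraction(years_elapsed: float, half_life_years: float = 2.0) -> float:
    """Fraction of URLs expected to still work under a simple half-life model.

    Assumption: pure exponential decay, i.e. every elapsed half-life
    halves the number of working URLs.
    """
    return 0.5 ** (years_elapsed / half_life_years)


PUB_YEAR = 2014    # Powers of Two was published in August 2014
TOTAL_URLS = 275   # URL count given in the article

for year in (2016, 2018, 2024):
    frac = surviving_fraction(year - PUB_YEAR)
    print(f"{year}: {frac:.1%} expected to work, roughly {TOTAL_URLS * frac:.0f} of {TOTAL_URLS}")
```

Ten years is five half-lives, so the model gives 0.5 to the fifth power, or 1/32, which is where the "about 3%" comes from.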

When I actually tested the URLs in Powers of Two with the W3C Link Checker, I discovered that about 47 (or 17%) of the URLs were not working.

On the bright side, 83% were still working. So we’re running a bit ahead of the curve there. (Or perhaps the websites our authors choose to cite are more reliable than the average website.)
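For anyone who wants to try a similar check on their own files, here is a rough sketch using only the Python standard library. It is not the W3C Link Checker and does not reproduce its behavior. The names `LinkCollector`, `extract_links`, `check_url`, and `broken_links` are hypothetical, and it assumes the links are ordinary `<a href="...">` tags in the ebook's XHTML.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag that points at an http(s) URL."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Internal links (footnote anchors, other files in the book) are skipped.
        if href and href.startswith(("http://", "https://")):
            self.links.append(href)


def extract_links(xhtml: str) -> list:
    """Returns the external URLs found in one XHTML document, in order."""
    collector = LinkCollector()
    collector.feed(xhtml)
    return collector.links


def check_url(url: str, timeout: float = 10.0) -> str:
    """Returns 'ok', or a short description of why the URL failed."""
    request = Request(url, method="HEAD", headers={"User-Agent": "link-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return "ok" if response.status < 400 else f"HTTP {response.status}"
    except HTTPError as err:      # 4xx and 5xx responses land here
        return f"HTTP {err.code}"
    except (URLError, OSError) as err:  # DNS failures, refused connections, timeouts
        return f"unreachable: {err}"


def broken_links(links, checker=check_url):
    """Maps each failing URL to its failure description."""
    results = {url: checker(url) for url in links}
    return {url: status for url, status in results.items() if status != "ok"}
```

Usage would be something like `broken_links(extract_links(chapter_text))` for each XHTML file in the EPUB. Two caveats: some servers reject HEAD requests, so a real tool would probably retry with GET, and a check like this only catches link rot. A page that still loads but no longer says what the author cited (reference rot) passes as "ok".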

As usual, there’s a “do it right from the beginning” solution, which certain parts of the web and scholarly communities are embracing (using things like DOI to identify electronic documents or a service like Perma.cc that archives cited online content). I haven’t seen trade publishing take this problem on.

But because we’re talking about backlist, and we don’t have the advantage of going back in time to ask our authors to archive their links correctly, we’re talking about inheriting tens of thousands of old URLs that were not deliberately preserved in any way. (Not all our books have 275 URLs, and some have none, so what if we say 20 on average? Hypothetical 5000 book backlist at 20 URLs per book = 100,000 slowly decaying URLs.)

And this is a problem for print books as well as for the ebooks of course, but I think we’re more content to let the URLs in print books function essentially as decoration—as signs that there is scholarship underlying their claims. And we also assume that a very motivated reader who types out an entire URL from a print book to try to get to a source document will also have some basic web literacy around using Google to search for an alternative.

The URLs in ebooks, however, we transform into hyperlinks, which imply that the information is just a click away, and when it isn’t, it doesn’t just seem like the link is broken. It seems like your ebook is broken.

The hyperlinks in ebooks can also be checked in automated fashion, and then we get this:

https://cdn-images-1.medium.com/max/...TG45dAIy0g.png

So this is the scale at which we’re dealing with this problem right now—our backlist has a hundred thousand slowly decaying URLs, little time bombs embedded in our backlist—and the retailer response is to send me an individual email every time they notice one. Fortunately, they don’t seem to be looking very hard right now.

And then, the interesting part: what do we do to fix this? There are a few options, but they all have some authorial implications:

If the site’s just been reorganized, find the new correct URL.
If the URL is just meant to point to general further information, find a new site that contains that information.
Remove the URL altogether. We’ll do this if a citation given is complete enough that the reader can probably find the print version of the material cited, even though the online version seems to be gone.
Leave the URL, but remove the hyperlink. This looks like a mistake in the ebook to me, but sometimes you just don’t have any good options. This seems to satisfy retailer complaints (or at least evade their automated link checking).
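The last option (keep the URL text, drop the hyperlink) is the easiest one to automate, so here is a minimal sketch of it. The article does not say how the publisher actually does this. The function name `unlink` is hypothetical, and the regex assumes simple anchors with double-quoted `href` values, with the URLs in `dead_urls` written exactly as they appear in the markup.

```python
import re

# Matches <a ... href="URL" ...>inner</a>; assumes double-quoted href values.
ANCHOR = re.compile(r'<a\b[^>]*\bhref="([^"]*)"[^>]*>(.*?)</a>',
                    re.DOTALL | re.IGNORECASE)


def unlink(xhtml: str, dead_urls: set) -> str:
    """Replaces anchors whose href is in dead_urls with their inner text."""

    def replace(match):
        url, inner = match.group(1), match.group(2)
        # Keep live links untouched; strip only the wrapper of dead ones.
        return inner if url in dead_urls else match.group(0)

    return ANCHOR.sub(replace, xhtml)


sample = ('<p>See <a href="http://dead.example/x">http://dead.example/x</a> '
          'and <a href="http://live.example/">this page</a>.</p>')
print(unlink(sample, {"http://dead.example/x"}))
```

For production files a real XML parser would be safer than a regex, since attribute order, quoting, and escaped characters such as `&amp;` in URLs can all vary.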

Quote:

Originally Posted by Hitch

Honestly, NJ, I kind of disagree that Location #'s are worthless. Why do you say that? At least they're consistent, unlike "page numbers" as created by ePUB devices, each to their own calculation.

*Casually cites one of my ginormous posts in the "Sick of Amazon Kindle books with Page Numbers" thread, which categorized the Pros/Cons of all the different location methods*