I'm not sure that you can actually take the size of the book collections at face value to determine the value to the downloader. Maybe you want one book, and you know the torrent contains that one book (or more likely *hope* that it contains it). You still have to download the whole thing. The rest of the books are probably valueless to you; in fact, they get in the way of finding that book.
Why keep multiple collections then? Reasons that come to mind include:
(a) There might be an unlisted book there you are looking for (more likely there is no book list at all).
(b) You might come across a book in the future that you need, that might be in one of the collections. No way to predict it now, the collections are volatile, better cache a private copy.
(c) You don't know what you're getting. 90% of the scanned books out there require significant manual work to get into a usable format for ebook readers (examples: PDF files with tiny text, headers, and footers; wildly varying font sizes and formats, such as chapter headings in 50 point and text in 8 point; clipping because of margins; hard returns at the end of lines but no indents or blank lines between paragraphs; etc.). At least 20% are completely unusable (bad scans, PDF files with clipping, unconvertible formats, etc.). So to get one good book to convert, you want multiple independent copies of the source in the hope that one will be easily convertible. Well-formatted commercial ebooks, combined with readers where text fonts and formats can be changed, have value when available and not DRM'd to uselessness.
(d) It's a lot of work sorting through collections. Sometimes the collection compiler has sorted the books by category and author, but the file naming format varies (ISBN, author--title, title only, and so on). So finding that one book could take hours. It gets ridiculous when you are trying to match up 100 pbooks you own against the flood of junk.
An analogy (without any attempt to reflect on the legal side of things, just the motivational one) would be being given management of a really low-end used book shop that gets its stock from recycling bins etc. To one reader (you), 99.999% of the stock is junk. The books are soiled, musty, and torn, and most of them are "remaindered". (At one time publishers shipped books to newsstands on consignment, and expected either payment if the book sold or the front cover sent back if it didn't, and some newsstands would tear off the cover and sell the remainder to used book sellers. This was of course illegal, thus the notice on the copyright page of older paperbacks: "If you purchased this book without a cover, be aware that it is stolen material".)
So, having inherited this bookstore, do you call a junkman to clear it out, take a month off work to sort through the piles for what you want and then clear it out, or let it continue operation (adding more books) in the hope of finding desirable books when you browse the shelves every few weeks? If there's no financial loss (and no real advantage to clearing it out), and some realistic chance of finding something valuable, most people will choose the last option.
If there were a large open ebook archive where books were well indexed, very inexpensive, open source, and with some guarantee that it would be there for the foreseeable future, then "hoarding" would be irrational. It'd be like downloading the entire Baen Free Library or the Project Gutenberg collection. (Of course you might if you knew these were about to vanish.) Motivationally (not legally, of course) "hoarding" is more like collecting all the ebooks Tor was giving away. Maybe you won't read them now, but your tastes might change later, and these books *will* disappear (and probably already have).
At present, the motivation for keeping a large collection is like Google's Web page caching--there's good reason for it because of source volatility, but to any individual most of it is a waste of space. I expect the motivation to cache increases with the volatility of the source material and decreases with the perceived cost of caching the material (acquiring it and storing it). How valuable the material (potentially) is to the cacher is a huge factor too. This changes with time so old cached material gets discarded eventually.
(Interestingly, many Web pages have a copyright notice--is Google pirating the pages it caches? I haven't heard of anything so far, so there must be some legal loophole for them.) I'm not saying you're entitled to the material you've cached, just that the value of that material (books actually of interest or that are ultimately usable) is a tiny fraction of face value (number of files).
If the bookshop analogy is too vague (or maybe you haven't been to some of the stores I patronized 25 years ago to know how low you can go), here's a smaller-scale analogy. Downloading book collections is like collecting out-of-print titles by buying big boxes of remaindered, used books at an auction, without being able to open the boxes first. (I suspect this was how some of these bookshops actually got their stock.) If the boxes cost a dollar each, you might buy several boxes. Maybe you don't get around to sorting through them for a few years and so have a bunch of boxes in the basement. That isn't a book collection; it's a junk collection until you sort through the boxes or discard them unopened.
You'd really prefer borrowing from a friend with similar tastes (limiting but you know the books are all good) or visiting the library (more trouble and more work searching for books that match your tastes). This is why purchased ebooks (especially DRM-free ones) are more worthwhile (by a factor of 100 or more) than book collections as measured by the ratio ("books read" / "books on hard disk or shelf").
When I shop at Amazon for books (using their reviews, book lists, links, etc.), I end up actually reading and enjoying about 70% of what I buy. I doubt that any 3000+ book downloaded collection would have even 5 books I was looking for, let alone read. But that's 5 books that didn't have an e-copy (to match my p-copy) before, so there is still some value. It would be worth real money to get those 5 books in a form that was immediately usable or properly indexed, and portable to future reading devices.
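For anyone who wants to check the "factor of 100 or more" claim against the numbers above, here is a rough back-of-the-envelope sketch. It uses only the figures from this post (about 70% of purchased books read, at most 5 useful books in a 3000-book collection). The function name and the "70 out of 100" scaling are just illustrative assumptions, and treating all 5 found books as read is generous to the collection.

```python
def read_ratio(books_read, books_held):
    """The "books read" / "books on hard disk or shelf" ratio."""
    return books_read / books_held

# About 70% of purchased books get read (scaled here to 70 out of 100).
purchased = read_ratio(70, 100)

# Generous assumption: all 5 books found in a 3000-book collection get read.
collection = read_ratio(5, 3000)

# How many times more worthwhile a purchased book is by this measure.
factor = purchased / collection
print(f"purchased: {purchased:.2%}, collection: {collection:.2%}")
print(f"factor: roughly {factor:.0f}x")
```

With these inputs the factor comes out to roughly 420, so "100 or more" is, if anything, conservative.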