One of the commenters on the article had an interesting anecdote:
Quote:
My first job out of college was working IT at Questia, which at the time was in start-up mode. The company was building a digital research library with a launch goal of having 50-60k digitized books and another 100-200k digitized magazines and scholarly journals. The books would be scanned and OCR'd and XML tagged, with the pagination and images preserved, and would be full-text searchable.
The thing is, Questia had about 300 people JUST doing copyright research. There were a large number of public domain books that they included, but they employed a set of professional librarians to do the book curation, and the vast majority of the books to be included were under copyright. Those 300 copyright researchers worked 10-12 hours a day tracking down who held the copyright for each individual book and then attempting to negotiate with the copyright holder. I believe when we actually hit launch, there were about 30k books digitized and ready to go.
Let me say that again: 300 people working 10+ hours a day, for almost two years, managed to only secure the rights to ~30,000 books.
When Google announced their book scanning project, the first thing I thought was that they were entering into a world of pain with the copyright negotiations--every publisher wants its own set of terms and few want the same things. I remembered those poor researchers at Questia, and wondered how Google was going to do it all.
Turns out Google went with the path of least resistance: "Fuck it, we're Google, just start scanning." Blows my mind.