Quote:
Originally Posted by cybmole
ok but
1. isn't that what google is for ?
2. when & if 200,000 books can be fully indexed, the metadata becomes pretty redundant, for research purposes?
We are talking upwards of 200 GB of compressed text here, so indexing that lot is not trivial. I don't think Windows 7 is able to find occurrences of a word or phrase within books that are in my much more modest calibre library - certainly not on default indexing settings
still curious as to how 200,000 fiction or non-fiction texts (published ones with ISBNs) could be legally accumulated by someone who is only just starting grad school
1. No, since Google hasn't scanned & indexed even 10% of the books that I have access to. And Google sucks even more when it comes to journals & papers. My university & my research lab have accumulated 200+ gigs of papers in PDF format, and Google has only a tiny sliver of them indexed (I'm at a CS/AI lab at a large uni). Amazon's better when it comes to books, but they don't have sci papers scanned/indexed. You can build your own mini-Google-like search system.
2. Correct. I find metadata pretty much useless for scientific purposes. Full-text search is vastly superior and I rely on it daily. I'm a grad student and I've converted almost every researcher in my lab to my method of searching and using all of this data (it wasn’t hard + it's nothing super-advanced). We're all AI people, so it's natural for us to find ways to harness all this knowledge.
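To give a rough idea of what full-text search means in practice, here is a minimal Python sketch of a positional inverted index with phrase lookup. This is only an illustration of the general technique, not the setup my lab actually runs; the function names (tokenize, build_index, search_phrase) and the sample file names are made up for the example, and it assumes the text has already been extracted from the PDFs or ebooks. A real collection of this size would want a proper search engine rather than an in-memory dict.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to {doc_id: [positions]} across all documents.

    docs is a dict of doc_id -> plain text (hypothetical input format).
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def search_phrase(index, phrase):
    """Return sorted ids of documents that contain the phrase's words consecutively."""
    words = tokenize(phrase)
    if not words:
        return []
    hits = []
    # Start from every occurrence of the first word, then check that the
    # remaining words follow at the next positions in the same document.
    for doc_id, positions in index.get(words[0], {}).items():
        for start in positions:
            if all(
                start + i in index.get(word, {}).get(doc_id, ())
                for i, word in enumerate(words)
            ):
                hits.append(doc_id)
                break
    return sorted(hits)


# Hypothetical sample documents, standing in for extracted book/paper text.
docs = {
    "paper_a.txt": "Neural networks learn useful representations.",
    "paper_b.txt": "Networks of neural cells behave differently.",
}
index = build_index(docs)
print(search_phrase(index, "neural networks"))
```

The point is that once the index is built, a word or phrase lookup doesn't need to re-read the documents at all, which is why metadata matters much less once everything is indexed.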
As for Win7, I don't know. I can't really help you there. My lab uses OS X and Linux for everything. Linux powers a large number of servers (100+) and racks of Nvidia and ATI GPUs (probably around 200 GPUs in total) that we use for simulations. For everything else, it's Macs (mostly laptops). I can go into detail on how we index all this stuff, but it's probably not helpful to you since you're on Win.
As for your last question, I'm not the OP (woodapple), so I have no idea how he got 200k books. As for us, we got ours legally, from publishers, libraries & other researchers.