03-22-2007, 08:31 PM | #1
Fully Converged
Posts: 18,170
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
Google scanning 27'000 books per day - at least, says Economist
The Economist runs an interesting story according to which Google is scanning the staggering number of 27'000 books on average per day:
03-22-2007, 10:19 PM | #2
Grand Sorcerer
Posts: 8,478
Karma: 5171130
Join Date: Jan 2006
Device: none
"What is a book, anyway?"
Does anybody really know what a book is? Does anybody really care? If so, I can't imagine why... we've all read enough to cry. Anyway... maybe if everyone thought less about replacing print books and more about augmenting literature, we wouldn't be debating the emergence of e-books at all.
03-23-2007, 01:36 AM | #3
Cache Ninja!
Posts: 643
Karma: 1002300
Join Date: Jan 2007
Location: Tokyo, Japan
Device: PRS-500, HTC Shift, iPod Touch, iPaq 4150, TC1100, Panasonic WordsGear
That's a staggering number of books to digitize per day; I wonder what their costs are, especially once they get into scanning really old books that have to be handled delicately. Now let's just hope that the "books unbound" stay that way and are updated to new formats as those evolve, i.e., that they don't get stuck in a format that becomes unusable or inaccessible in the future.
There are quite a few issues I can think of off the top of my head that will keep books in print long after new formats emerge and evolve. What would be nice is if they printed a set number of "master" copies of a given book on some highly rugged or nigh-indestructible medium to ensure it will be around for generations to come. I mention this because I'm a mild book collector, and I have quite a few books in my library that it wouldn't be feasible to leave out for the young 'uns to play with. It would be a shame to see a work disappear due to aging... and, back to the thread, Google will at least keep most of this accessible to people with online access and the right credentials.
03-23-2007, 07:58 AM | #4
Grand Sorcerer
Posts: 8,478
Karma: 5171130
Join Date: Jan 2006
Device: none
I'm guessing that, because they have college libraries (and therefore probably students) doing the work, the cost to Google must be minimal. Still, if one library is digitizing 3,000 books a day, even with the latest and fastest scanning equipment (the slowest part of the process), that has to mean multiple workstations and quite a few students working on the project!
The article suggests they are simply making images of each page ("fingers are visible in the corners of many pages on books.google.com"), and that's a shame. If the pages aren't being text-reco'd, they're missing a great opportunity. Rereading the article, I realize again that they should have given that story to someone who actually knows something about e-books. Comparing e-books to CDs, the author says "The simplest difference is that transferring one's old music CDs onto iPods is easy, whereas transferring one's old books onto an e-book is impossible." Really?
Last edited by Steven Lyle Jordan; 03-23-2007 at 08:07 AM.
03-23-2007, 09:04 AM | #5
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
Storing images of pages does not prevent them from being OCRed later, whereas storing just the text loses quite a bit of the original presentation, not to mention possible OCR errors, which can be hard to correct without checking the originals.
Anyway, if you do a search on books.google.com, you will see the search result highlighted on the image of the page, so I guess they store both the image and its text.
03-23-2007, 09:06 AM | #6
Gizmologist
Posts: 11,615
Karma: 929550
Join Date: Jan 2006
Location: Republic of Texas Embassy at Jackson, TN
Device: Pocketbook Touch HD3
03-23-2007, 09:53 AM | #7
Old Yeller
Posts: 180
Karma: 67
Join Date: May 2006
Device: Iliad & Kindle - The Best of Both Worlds
...and how many of those books are actually scanned WELL?
I seem to run across quite a lot that have pages missing or so skewed that text is cut off; some contain colored pages instead of black-and-white text; some files won't open or display correctly... I mean, this is almost fully automated scanning we're talking about, right? Somebody jamming a bunch of pages into a sheet-fed scanner and uploading the result within minutes? I've all but given up on Google Books...
03-25-2007, 08:43 AM | #8
eink fanatic
Posts: 2,022
Karma: 4924
Join Date: Mar 2006
Location: Germany
Device: STAReBOOK, iRex Iliad, Sony 505, Kindle 2
I hope they really get serious about this project. Scanning all the books, correcting errors, and then doing OCR and proofreading is a lot of work, and only a really big company like Google could even hope to manage that...
03-26-2007, 10:15 PM | #9
Grand Sorcerer
Posts: 8,478
Karma: 5171130
Join Date: Jan 2006
Device: none
Actually, I never thought Google was right for this job. Given the apparent inconsistencies of the results, there should be a dedicated organization doing this... a group that will put more effort into properly scanned, reco'd and saved texts.
Don't ask me what organization. It should be the Library of Congress, but I don't think we can expect them to do it.