11-05-2011, 06:05 AM   #1
howyoudoin
how YOU doin?
Posts: 1,100
Karma: 7371047
Join Date: Feb 2009
Location: India
Device: Kindle Keyboard, iPad Pro 10.5”, Kobo Aura H2O, Kobo Libra 2
reCAPTCHA - Did You Know?

How many times have you been waiting to download a file and run into a 'toll booth' in the form of a reCAPTCHA window?



What you probably didn't realise, as you keyed in the letters with one hand, was that you were actually a tiny component in a grand design to OCR old texts for archival purposes.

Every now and then, automated OCR scripts come across words on scanned pages that they simply can't decipher, for various reasons. These words end up in those tiny reCAPTCHA windows, waiting for the ultimate OCR machine (you!) to transcribe them! The reason file-hosting sites and sign-up pages are confident that spambots can't get through that barrier is that these are words that stumped automated text recognition programs to begin with!


From Wikipedia:

reCAPTCHA is a system originally developed at Carnegie Mellon University that uses CAPTCHA to help digitize the text of books while protecting websites from bots attempting to access restricted areas.[1] On September 16, 2009, Google acquired reCAPTCHA.[2] reCAPTCHA is currently digitizing the archives of The New York Times and books from Google Books.[3] Twenty years of The New York Times have been digitized and the project planned to have completed the remaining years by the end of 2010.[4]

reCAPTCHA supplies subscribing websites with images of words that optical character recognition (OCR) software has been unable to read. The subscribing websites (whose purposes are generally unrelated to the book digitization project) present these images for humans to decipher as CAPTCHA words, as part of their normal validation procedures. They then return the results to the reCAPTCHA service, which sends the results to the digitization projects.

The system is reported to display over 200 million CAPTCHAs every day,[5] and among its subscribers are such popular sites as Facebook, TicketMaster, Twitter, 4chan, CNN.com, and StumbleUpon.[6] Craigslist began using reCAPTCHA in June 2008.[7] The U.S. National Telecommunications and Information Administration also used reCAPTCHA for its digital TV converter box coupon program website as part of the US DTV transition.[8]
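
For the programmers among you, here is a rough Python sketch of the flow the quote describes: a site shows a word image, a human types what they see, and the answer goes back to the digitization side. Everything in it is made up for illustration (ToyDigitizer, submit, resolved, min_votes, the image ids); it is not the real reCAPTCHA API. It also assumes something the quote doesn't spell out: that each unreadable word is shown alongside a "control" word whose answer is already known, and that a reading only counts once several people agree on it.

```python
from collections import Counter, defaultdict


class ToyDigitizer:
    """Toy model of crowd-transcribing words that OCR could not read.

    Hypothetical names throughout; this is not the real reCAPTCHA API.
    """

    def __init__(self, min_votes=3):
        # Assumption: a reading is accepted once min_votes humans agree on it.
        self.min_votes = min_votes
        self.votes = defaultdict(Counter)  # image_id -> Counter of readings

    def submit(self, control_answer, control_truth, image_id, unknown_answer):
        """Return True if the user passes the CAPTCHA.

        The user passes by typing the known control word correctly; only
        then is their reading of the unknown word counted as a vote.
        """
        passed = control_answer.strip().lower() == control_truth.strip().lower()
        if passed:
            self.votes[image_id][unknown_answer.strip().lower()] += 1
        return passed

    def resolved(self, image_id):
        """Return the agreed transcription, or None if agreement is too low."""
        counts = self.votes.get(image_id)
        if not counts:
            return None
        word, count = counts.most_common(1)[0]
        return word if count >= self.min_votes else None


d = ToyDigitizer(min_votes=2)
d.submit("morning", "morning", "img-1", "upon")   # human, passes
d.submit("morning", "morning", "img-1", "Upon")   # human, passes
d.submit("m0rn1ng", "morning", "img-1", "spam")   # fails, vote ignored
print(d.resolved("img-1"))
```

The point of the control word in this sketch is that a bot that fails it never gets to pollute the transcription, while every human who passes chips in one vote towards the unread word.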



NOW YOU KNOW!

Last edited by howyoudoin; 11-12-2011 at 12:53 PM.