MobileRead Forums - Sony Partners With Google To Bring More Than 500,000 Books To The Reader

Elsi · 03-19-2009, 12:38 AM

Quote:

Originally Posted by RWood

Since I was on a roll I tried to load the file to my Reader. I put the file (along with some regular files for the Reader) on a Memory Stick, put the Memory Stick in the Reader, and the file did not show up in the index.

Did you put the ePub file itself on the Memory Stick, or an .LRF file you created from the ePub?

By the way, I also downloaded a book I hadn't heard of before, The Facetious Nights of Straparola. Before the title page, I found text explaining the terms and conditions, including this:

Quote:

Google Book Search has digitized millions of physical books and made them available online at http://books.google.com

The digitization at the most basic level is based on page images of the physical books. To make this book available as an ePub formatted file we have taken those page images and extracted the text using Optical Character Recognition (or OCR for short) technology. The extraction of text from page images is a difficult engineering task. Smudges on the physical books' pages, fancy fonts, old fonts, torn pages, etc. can all lead to errors in the extracted text. Imperfect OCR is only the first challenge in the ultimate goal of moving from collections of page images to extracted-text based books. Our computer algorithms also have to automatically determine the structure of the book (what are the headers and footers, where images are placed, whether text is verse or prose, and so forth).

Getting this right allows us to render the book in a way that follows the format of the original book.

Despite our best efforts you may see spelling mistakes, garbage characters, extraneous images, or missing pages in this book. Based on our estimates, these errors should not prevent you from enjoying the content of the book. The technical challenges of automatically constructing a perfect book are daunting, but we continue to make enhancements to our OCR and book structure extraction technologies.

For the most part, the OCR and presentation are quite good. But when they fail, they fail spectacularly!