Old 03-19-2009, 12:38 AM   #6
Elsi
Wizard
Posts: 2,366
Karma: 12000
Join Date: Jan 2008
Location: Texas, USA
Device: Kindle; Sony PRS 505; Blackberry 8700C
Quote:
Originally Posted by RWood View Post
Since I was on a roll I tried to load the file to my Reader. I put the file (along with some regular files for the Reader) on a Memory Stick, put the Memory Stick in the Reader, and the file did not show up in the index.
Did you put the ePub file on the Memory Stick, or an .LRF file you created from the ePub?

By the way, I also downloaded a book -- one I hadn't heard of before, The Facetious Nights of Straparola. Before the title page, I found text explaining the terms and conditions, including this:
Quote:
Google Book Search has digitized millions of physical books and made them available online at http://books.google.com

The digitization at the most basic level is based on page images of the physical books. To make this book available as an ePub formatted file we have taken those page images and extracted the text using Optical Character Recognition (or OCR for short) technology. The extraction of text from page images is a difficult engineering task. Smudges on the physical books' pages, fancy fonts, old fonts, torn pages, etc. can all lead to errors in the extracted text. Imperfect OCR is only the first challenge in the ultimate goal of moving from collections of page images to extracted-text based books. Our computer algorithms also have to automatically determine the structure of the book (what are the headers and footers, where images are placed, whether text is verse or prose, and so forth).

Getting this right allows us to render the book in a way that follows the format of the original book.

Despite our best efforts you may see spelling mistakes, garbage characters, extraneous images, or missing pages in this book. Based on our estimates, these errors should not prevent you from enjoying the content of the book. The technical challenges of automatically constructing a perfect book are daunting, but we continue to make enhancements to our OCR and book structure extraction technologies.
For the most part, the OCR and presentation are quite good. But when they fail, they fail spectacularly!
Attached Thumbnails: Image1.gif (43.5 KB)

Last edited by Elsi; 03-19-2009 at 12:50 AM.