09-05-2007, 08:45 AM #9
vivaldirules
When's Doughnut Day?
vivaldirules ought to be getting tired of karma fortunes by now.
Posts: 10,059
Karma: 13675475
Join Date: Jul 2007
Location: Houston, TX, US
Device: Sony PRS-505, iPad
Quote:
Originally Posted by DMcCunney
I assume you are aware of Project Gutenberg, which has been doing precisely that, and may have some of what you want done already?

If not, go here: http://www.gutenberg.org/wiki/Main_Page

You also might wish to sign up for Distributed Proofreaders, which is the main feed of clean copy to be included in the Gutenberg archives.

Go here: http://www.pgdp.net/c/

As for scanning, it's labor-intensive. What scanner do you have?

Welcome to MobileRead!
______
Dennis
Thanks. Yes, I'm aware of these. I have a small collection of 19th-century science books that are not available from these or any other sites, and I'd like to make them available, so I thought I'd scan them in. My biggest problem is the time involved. My HP 5400c scanner and HP OCR software take a minimum of about two minutes per page of pure text. I tested the setup on a chapter of clean, large-type text and was thoroughly disappointed. Once I add handling figures, tables, equations, etc., and correcting the OCR output, I'm looking at too many long days to do a good job on, say, William Thomas Brande's "A Manual of Chemistry" (1828), which is 640 pages with lots of non-text content. It seems like it should go much more quickly than that, and I'm wondering whether there are hardware and software tools that can speed it up. Any good ideas would be appreciated.
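To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. The 640 pages and the minimum of about two minutes per page come from the post. The `total_hours` function and its `fix_min_per_page` parameter (time spent correcting OCR, figures, tables, and equations) are hypothetical illustrations; no particular correction rate is assumed by default.

```python
def total_hours(pages: int, scan_min_per_page: float,
                fix_min_per_page: float = 0.0) -> float:
    """Estimated hours to process a book.

    scan_min_per_page: scan + OCR time per page (from the post: ~2 min minimum).
    fix_min_per_page: hypothetical correction time per page (defaults to 0,
    i.e. scanning and OCR only).
    """
    return pages * (scan_min_per_page + fix_min_per_page) / 60.0


if __name__ == "__main__":
    # Brande's "A Manual of Chemistry" (1828): 640 pages at ~2 minutes per page.
    scan_only = total_hours(640, 2.0)
    print(f"Scan + OCR only: {scan_only:.1f} hours")  # 1280 min / 60 -> 21.3 hours
```

At the stated minimum, scanning and OCR alone come to 1,280 minutes, about 21.3 hours, before any correction time. Every additional minute of correction per page adds roughly another 10.7 hours across the whole book.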

Last edited by vivaldirules; 09-05-2007 at 01:59 PM.