Scanning paper (out of copyright) books.


Charles Gray
06-14-2006, 09:58 PM
I have many, MANY books. Some of them are out of copyright, and for others I was able to get permission to make an e-book copy so long as it isn't distributed and the original copy is destroyed.
But that leaves the question: how do I do it? Flatbed scanners seem destructive, and although I have a very good OCR program (ABBYY FineReader), the "lift" in the spine seems to cause problems. That's not a problem for the "scan and destroy" books, but my out-of-copyright pulps from the 1920s are a different matter (and a rather important one, as I'd like to read them, but too much reading will also destroy them). I didn't see any other place here to ask this question, so I was wondering if I could receive any help.

ath
06-16-2006, 01:38 AM
But that leaves the question: how do I do it? Flatbed scanners seem destructive, and although I have a very good OCR program (ABBYY FineReader), the "lift" in the spine seems to cause problems.

Unless you have access to an overhead scanner, scanning is very probably going to be destructive to some extent.

Scanning books quickly means, unfortunately, cutting them up, and running them through a page-fed scanner.

You can scan page spreads with a flat-bed scanner, but it will stress the spine and the hinges of the book in a way that doesn't happen with ordinary reading. I've done several late 19th century books on a largish flatbed, and if the books don't break up entirely, the back cover is usually ripped afterwards, and some of the sections are starting. There is also some risk of ripping or folding a page due to clumsy handling.

There are scanners where the scanning area extends to the edge of the device (see the Plustek OpticBook 3600 (http://www.plustek.com/products/book.htm), or the 3600 Plus if you're going for PDF; I think Xerox has or had a similar scanner). This lessens the stress on the spine, but because you scan one page at a time instead of a spread, it doubles the effort and time, and doubles the risk of damaging a page.

I know of some experiments with a camera (a digital camera is a kind of overhead scanner, and with a film camera you can often get decent scans made from the film), but it definitely requires more than just point-and-click. At the very least you will need a good camera stand and good, even lighting. See Project Runeberg (http://runeberg.org/admin/camera.html) for more info.

tribble
06-16-2006, 01:46 AM
What about taking high-resolution photos of the pages, like the professional book scanners do? Then run a batch transform over the images to remove the page distortion, and do the OCR after that.
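To make the "batch transform" idea a bit more concrete, here is a minimal sketch in plain Python of the geometry behind it. It is only an illustration under some assumptions: it corrects perspective (keystone) distortion only, not the curl of the page near the spine, and the corner coordinates and the names `solve_linear`, `homography` and `warp_point` are made up for the example. With the camera fixed on a stand, the same matrix could in principle be reused for every photo in the batch; actually resampling the pixels would need an imaging library, which is not shown here.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] as floats.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corners (three on one line?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 projective transform taking four src points to four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h, point):
    """Map one (x, y) position in the photo to its position on the flat page."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical page corners as they appear in a photo (pixels), listed as
# top-left, top-right, bottom-right, bottom-left.
photo_corners = [(120, 80), (1850, 140), (1900, 2600), (60, 2500)]
# The upright rectangle we want the page to fill (an assumed 1600 x 2400 image).
flat_corners = [(0, 0), (1600, 0), (1600, 2400), (0, 2400)]

H = homography(photo_corners, flat_corners)
```

Measuring the four corners once, on a test shot, and keeping `H` for the whole session is what would make this a batch step rather than a per-page chore, provided neither the camera nor the cradle moves between shots.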

DTM
06-22-2006, 07:16 AM
I'm sure you could get some help in the forum at the Distributed Proofreaders website. You may even want to run your projects through them, getting you an entire network of proofreaders.

Check it out at: www.pgdp.net

ereszet
09-28-2007, 07:34 AM
See my thread "do-it-yourself repro v-cradle for paper books" in Reader Accessories.

RWood
09-28-2007, 08:44 AM
There was a thread by Bob Russell about a scanner that was designed for bound books and had them over the corner of the scanner so a page would lie flat. It seemed to work well. I will look again for the article.

ricdiogo
09-28-2007, 11:45 AM
I'm sure you could get some help in the forum at the Distributed Proofreaders website. You may even want to run your projects through them, getting you an entire network of proofreaders.

Check it out at: www.pgdp.net

Charles Gray, DTM has given you great advice. You would also be helping to make more public domain ebooks freely available online at Project Gutenberg.

I also suggest you read Project Gutenberg's Scanning FAQ (http://www.gutenberg.org/wiki/Gutenberg:Scanning FAQ).

Studio717
10-19-2007, 05:30 PM
There was a thread by Bob Russell about a scanner that was designed for bound books and had them over the corner of the scanner so a page would lie flat. It seemed to work well. I will look again for the article.

This is the OpticBook 3600. I have one and it does a great job with scanning. The edge of the glass is almost at the very edge of the scanner, so except for too-tightly bound (or usually, ime, rebound) books, it does a beautiful job of capturing all the text.

Any flatbed scanner is going to take longer to scan than an overhead setup like ereszet's (which is a setup I'm trying to recreate myself for a large book I have), but the Opticbook is the best out there as far as I've found for a low-cost flatbed solution.