Old 01-09-2008, 02:03 PM   #119
megalomania
Junior Member
Posts: 1
Karma: 10
Join Date: Jan 2008
Device: none

Greetings MobileRead community, and to ereszet in particular. I found this forum and thread via one of my periodic searches of the net for anything related to scanning books with digital cameras. For over 4 years now I have been digitizing my books on and off, and every few months I like to see if anyone else is doing the same thing. It is only within the past 1.5 years that this kind of information has started to proliferate on the Internet.

A little over a year ago I built my own V cradle based on a picture of the Atiz BookDrive DIY. It does not look as nice as the one ereszet built, but the principle is identical. I have images of the cradle and building instructions in a thread at my website: http://roguesci.org/theforum/showthread.php?t=1232 (my posts about the book cradle start on page 5 of the thread).

The important innovation of my cradle is a horizontally sliding base that moves the book with every turn of the page to maintain the exact same distance from camera lens to paper surface. In close-up (macro) photography, moving an inch closer or farther away makes a very significant difference in the area photographed. Imagine a 1 inch thick book, where page 1 is the base of a triangle and the camera is its peak. Now picture the sides of the triangle extending down to the bottom of the book, to the last page. If your camera is very close, this new triangle will be 1-2 inches wider on the last page than on the first page. If you do not readjust your camera after each picture is taken, every time you turn a page the newly exposed surface of paper is farther away from the lens, and the resulting text is smaller. The distance increase per page is only equal to the thickness of the book’s paper, but with hundreds of sheets of paper it adds up to a big difference.
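To put rough numbers on the triangle picture, here is a minimal sketch that assumes an idealized pinhole camera, where the width of the photographed area grows in direct proportion to the lens-to-paper distance. The function names, the 10 inch starting distance, the 7 inch frame, and the 200-sheet count are all illustrative assumptions, not measurements from the cradle described here.

```python
def field_width(width_at_first_page, first_distance, extra_distance):
    """Width of the photographed area once the paper surface is
    extra_distance farther from the lens (idealized pinhole model)."""
    return width_at_first_page * (first_distance + extra_distance) / first_distance


def sheet_thickness_mm(book_thickness_in, sheets):
    """Average thickness of one sheet in mm. The slide travel needed per
    turned page is of this order; the exact value depends on cradle geometry."""
    return book_thickness_in * 25.4 / sheets


# Assumed setup: 7 inch wide frame on page 1, lens 10 inches from the paper,
# 1 inch thick book made of 200 sheets.
print(f"frame width on last page: {field_width(7.0, 10.0, 1.0):.2f} in")
print(f"thickness per sheet:      {sheet_thickness_mm(1.0, 200):.3f} mm")
```

With these assumed numbers the frame grows from 7 to 7.7 inches over the book, while each sheet accounts for only about an eighth of a millimeter, which is why the per-page correction has to be so fine.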

For all practical purposes you can manually adjust your camera every 50-100 pages or so, but if you try to compile an ebook to look nice, you will discover that batch cropping the pages is difficult because you have slightly more margins to crop with every page. If you rapidly scroll down such a PDF document you can see the text get smaller, and smaller, and smaller. If you try to OCR the pages the problem really gets bad. Page 1 is perfectly centered and cropped, and it might be 305 dpi, but page 100 has wasted margins because it was much farther away, so it is only 260 dpi. That is quite a significant difference in resolution.
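The resolution drift can be estimated the same way. This is a sketch under the assumption that effective DPI is inversely proportional to the lens-to-paper distance (a pinhole approximation); the function names and the 10 inch starting distance are hypothetical, and only the 305 and 260 dpi figures come from the example above.

```python
def dpi_at_distance(dpi_first, first_distance, extra_distance):
    """Effective DPI once the page is extra_distance farther from the lens.
    Assumes DPI is inversely proportional to distance."""
    return dpi_first * first_distance / (first_distance + extra_distance)


def distance_ratio(dpi_first, dpi_later):
    """How many times farther away the later page must be to explain the drop."""
    return dpi_first / dpi_later


# Ratio of distances implied by a drop from 305 dpi to 260 dpi.
print(round(distance_ratio(305, 260), 3))

# Assumed example: lens starts 10 inches away, page surface ends 1 inch deeper.
print(round(dpi_at_distance(305, 10.0, 1.0), 1))
```

Under this model a drop from 305 to 260 dpi means the later page sits roughly 17% farther from the lens than the first one.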

My original cradle used a drawer slide to move from side to side, but the sliding action is too jerky; it does not allow for the fine fraction-of-a-mm movement needed for each turned page. I bought a linear slide, which is a rather expensive device that moves from side to side with minimal friction. Linear slides are made for precision fractional mm movements. There are usually a few being sold on ebay for around $50, which is a good deal considering they typically cost $200-$300 new from the manufacturers. Once I finish wiring up my lighting, I will finish the copy stand and camera mount, complete with clear glass platen. The final device should be a close approximation of planetary scanners like the BookDrive, but not for $4500.

On the link to the thread I gave above I believe I posted about calculating DPI from a digital camera. I am writing an ebook about camera scanning, so I have a considerably more thorough treatise on the meanings of DPI, PPI, megapixels, etc., and how they relate to a final image. In brief, if your digital camera has a paltry resolution of 1600 x 1200, which equals 1.92 million pixels (it would be labeled a 2 MP camera), the 1600 pixels along the long side are what determine your DPI. If you photograph a 1 inch wide piece of paper, you would have a resolution of 1600 DPI. This might work for a newspaper article, but books are not 1 inch wide; they typically range from 6 x 8 to 8 x 11 inches including margins. Since cameras have a ratio of 4 x 3 (1600/4 * 3 = 1200, hence 1600 x 1200), but books have a ratio somewhat less, you only need to consider the long side of the camera frame (1600 in my example) when you photograph books sideways, with the long side running from the top to the bottom of the page. An approximately 6 x 8 inch book with half inch margins can be zoomed in and close cropped to 7 inches, which still leaves a little safety room in the margins. Our 7 inch picture only has a resolution of 1600/7 = 228.6 DPI. This is good enough to read, not so good for OCR. If you fail to adjust your camera every so often and the later pages are farther away from the lens, say we are now capturing the full 8 inches of the book by page 400, our resolution would be 1600/8 = 200 DPI.
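The arithmetic in this paragraph can be written out as a short sketch. The function names are made up for illustration; the pixel counts and page sizes are the ones used in the example.

```python
def megapixels(width_px, height_px):
    """Total pixel count in millions."""
    return width_px * height_px / 1e6


def scan_dpi(pixels_long_side, inches_covered):
    """Effective DPI when the long side of the frame covers inches_covered."""
    return pixels_long_side / inches_covered


print(megapixels(1600, 1200))        # 1.92
print(scan_dpi(1600, 1))             # 1600.0
print(round(scan_dpi(1600, 7), 1))   # 228.6
print(scan_dpi(1600, 8))             # 200.0
```

The same two functions work for any camera: plug in the long-side pixel count and the number of inches the frame actually covers after cropping.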

Figuring out what resolution camera you need to scan books is entirely dependent on how big the book is. Indeed, the more megapixels you have, the higher your resolution will be when photographing the same area. However, digital cameras do not technically correspond exactly to DPI like a scanner does. The most important factor of a digital camera is the lens. The lens makes or breaks the book. When I first started digitizing books I bought a 6 MP handheld point-and-shoot camera that had settings for close-up shooting. I scanned a few books for a local bookstore that deals with very rare 19th century local history and genealogical books. Demand for these books was high because people wanted to look up their family history, but few of these books were ever published, and their condition does not allow them to be manhandled. Flatbed scanning was out of the question because the books would be damaged by laying them flat and pressing down.

A bit more about old books: overhead scanning with a digital camera is the best solution for books that are in bad condition or that are very valuable. It goes without saying you cannot razor valuable books and scan them in a sheetfed scanner, and getting a bound book flat on a flatbed scanner requires a great deal of spine crushing pressure. Books with a damaged spine should never be opened more than 100 degrees. As I was designing my book cradle I found that professionally manufactured book cradles for libraries invariably have 90-100 degree angles. At 90-100 degrees the pages on one side of the book will be absolutely flat. Open a book and see for yourself; if it’s a hardbound book or thick paperback, open the left side to page 20 and the right side will be flat until you open the book past 90 degrees.

I am digressing from my point about lenses. My little digital camera did a very good job of digitizing the pages, until I went to OCR the pages. Much to my dismay I discovered the corners of nearly all the pictures were slightly blurry. As it turns out this is a lens aberration (soft corners with a sharp center usually point to field curvature rather than spherical aberration, which blurs the whole frame), and it is a common defect of cheap lenses. Consumer model handheld cameras are designed for casual portraits at distances of 5 feet and greater; a zoom lens is not made for precise corner to corner focusing because it is a general use component. These lenses are a Jack of all trades, master of none. My camera, like any consumer camera, is unable to keep both the center area and the corners focused at the same time.

My solution was to move the camera back farther to increase the amount of wasted margin space, but this also lowers the effective DPI. Then I bit the bullet and bought a professional DSLR camera with a detachable lens, specifically a Rebel XT with a macro lens. Macro lenses are designed for close up corner to corner focusing. To put the importance of the lens into perspective, my macro lens costs more than your entire handheld digital camera. Even a low quality lens for an SLR camera costs as much as a cheap digital camera. The lenses on any handheld camera are only realistically able to utilize a maximum of 10 megapixels. I recommend everyone read about the megapixel myth before buying one of those 12+ megapixel cameras. Without a high quality lens, such as those made for SLR cameras, imaging sensors beyond 10 megapixels are just useless marketing fluff that tricks consumers into thinking “my camera is better than your camera.”

Commercial units like the Atiz BookDrive and their new BookSnap, and industrial models like the Kirtas book scanners, all use quality cameras with quality lenses. The BookDrive uses Rebel XT and XTi cameras with macro lenses as a matter of fact. This does not mean your little digital camera is useless, far from it; just don’t expect perfect ebooks as good as the publisher can make. My top priority in digitizing books is OCR; I digitize scientific reference works so they can be searched for the rare snippet. I rather dislike reading ebooks of any kind, on any reader; I would much rather have a paper copy. I realize this position may not be popular with the members of this website. If human readability is your top priority, and OCR accuracy is only secondary, then a handheld digital camera is more than adequate.

In my early experiments with digital cameras I found 3 megapixels to be the minimum resolution to digitize 6 x 8 inch books and still be able to read the text. Because the lenses are so bad on low resolution cameras, like camera phones, even a smaller digitized area can still be difficult to read. According to my calculations I will need a 70 megapixel camera to do 600 DPI scans, two pages at once, of 8 x 11 books. I won’t be holding my breath for one of those anytime soon, not an affordable camera anyway.
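One way to arrive at a figure near 70 megapixels is sketched below. It assumes the two 8 x 11 pages are captured side by side as a single 16 x 11 inch spread on a 4 x 3 sensor; that layout and the function name are assumptions for illustration, not necessarily how the original calculation was done.

```python
import math


def required_sensor(width_in, height_in, dpi, aspect=(4, 3)):
    """Smallest sensor of the given aspect ratio that covers a
    width_in x height_in area at the requested DPI.
    Returns (width_px, height_px, megapixels)."""
    need_w = width_in * dpi
    need_h = height_in * dpi
    aspect_w, aspect_h = aspect
    # Scale the aspect ratio up until both dimensions are covered.
    unit = max(need_w / aspect_w, need_h / aspect_h)
    sensor_w = math.ceil(unit * aspect_w)
    sensor_h = math.ceil(unit * aspect_h)
    return sensor_w, sensor_h, sensor_w * sensor_h / 1e6


# Two 8 x 11 inch pages side by side at 600 DPI.
w, h, mp = required_sensor(16, 11, 600)
print(w, h, mp)
```

Under these assumptions the sensor comes out at 9600 x 7200 pixels, or 69.12 megapixels, of which only 9600 x 6600 actually land on paper because the spread is less square than the sensor.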

I can go on and on about methods to digitize books with a camera, but I will spare everyone further reading. I reserve the numerous fine details for the ebook I am writing on the subject. It will be free and not copyrighted, and I suppose I will include an online version on my website as well. Taking the pictures is actually the easy part; post processing the images is the challenge. Post processing has to be fast and automated, otherwise you lose the time saving advantage that using a digital camera gives you.

Incidentally, I read in another thread here, https://www.mobileread.com/forums/showthread.php?t=14475 , that some people were questioning why using a digital camera is a superior method. Overhead scanning is absolutely the single fastest method of scanning a book. I hate scanning books; it is the most boring, tedious, mind numbing task ever devised. Being able to digitize books in less time was just the motivator I needed to actually start scanning. A digital camera scans a page in a fraction of a second, compared to the 45-60 seconds of a scanner, or 3-4 seconds of a copy machine. However, there is more to scanning a page than just acquiring the image. On either a copy machine or flatbed scanner you have to flip the book over, turn the page, flip it over again, align the book on the glass, and then scan the page. This adds another 20-30 seconds per page, maybe more if you are not so agile. With a digital camera I can scan a book almost as fast as I can turn the pages. It takes me an average of 5 seconds to lift my glass cover, turn the page, lower the glass, and press the shutter release on my remote control. I will put a movie on and get into the zone, just turn the page and click. Using the camera is such a passive activity that it can become an easy routine, and I like to believe I am not wasting my life watching TV if I am scanning books at the same time. The constant flipping and aligning of books on a flatbed scanner requires more attention to what you are doing, and this is more like work. On a copy machine I doubt you can have the luxury of TV or music to distract you.
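To see what those per-page times mean for a whole book, here is a rough sketch. The 400 page count is an assumed example, the per-page ranges simply add the capture and handling times quoted above, and the names are hypothetical.

```python
def hours_per_book(pages, seconds_per_page):
    """Total capture time in hours for a book of the given length."""
    return pages * seconds_per_page / 3600


PAGES = 400  # assumed book length

# (fastest, slowest) seconds per page: capture time plus handling time.
METHODS = {
    "flatbed scanner": (45 + 20, 60 + 30),
    "copy machine": (3 + 20, 4 + 30),
    "digital camera": (5, 5),
}

for name, (fast, slow) in METHODS.items():
    print(f"{name:16s} {hours_per_book(PAGES, fast):5.2f} "
          f"to {hours_per_book(PAGES, slow):5.2f} hours")
```

Even with generous rounding, the camera finishes an assumed 400 page book in well under an hour, while the flatbed takes most of a working day.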