
MobileRead Forums > E-Book General > Deals and Resources (No Self-Promotion or Affiliate Links)

02-02-2009, 04:55 PM   #1
latchkeyed
Member
latchkeyed began at the beginning.
 
Posts: 20
Karma: 10
Join Date: Jun 2008
Device: Cybook Gen3
Post bkrpr: book ripper/non-destructive scanner

If you have more paper books than ebooks for your new ebook reader, and want to find the easiest way to get text from the one to the other, a friend and I have started bkrpr.org as a place for people to share experiences and tools for converting their books into a digital form.

Currently we have a design, and video of the design actually in use, for a non-destructive camera mount/book stand that lets you photograph your books quickly and fairly effectively. I'm getting between 600 and 1,000 pages an hour with it, depending on the book. Resulting images and OCR (using the free tesseract program) are available here: http://bkrpr.org/doku.php?id=results. The whole device is built from ~$65 USD in parts, plus two ~$100 USD consumer cameras. If you already have a camera or two, they should work just fine, though some camera features will make things faster.
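To put the 600 to 1,000 pages/hour figure in perspective, the arithmetic below converts it into time per page and time per book. The 300-page book length is a hypothetical example, not a figure from the thread.

```python
def seconds_per_page(pages_per_hour: float) -> float:
    """Average time available per page at a given hourly rate."""
    return 3600.0 / pages_per_hour


def minutes_for_book(page_count: int, pages_per_hour: float) -> float:
    """Estimated capture time for a whole book, in minutes."""
    return page_count / pages_per_hour * 60.0


if __name__ == "__main__":
    # Rates from the post; the 300-page book is an assumed example.
    for rate in (600, 1000):
        print(f"{rate} pages/hour: {seconds_per_page(rate):.1f} s/page, "
              f"{minutes_for_book(300, rate):.0f} min for a 300-page book")
```

At the low end that is about 6 seconds per page; at the high end, under 4.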

We are also building free software for picture processing, so you don't have to rotate and crop all the pictures manually. Version 0.1 is out today and can rotate and crop pretty well. If you want to see some before and after images, we've got a flickr stream with real samples here: http://www.flickr.com/photos/bkrpr/s...7613057232393/. We will also be adding some functionality for OCR and PDF generation as we develop further.
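The bkrpr 0.1 code isn't shown in this thread, so the following is only a hypothetical pure-Python sketch of the two operations it automates: rotating a page photographed sideways and cropping away the empty background. It treats an image as a list of rows of 0-255 greyscale values instead of using a real image library, and the darkness threshold of 128 is an arbitrary assumption.

```python
from typing import List

# A greyscale image as a list of rows; 0 is black, 255 is white.
Image = List[List[int]]


def rotate_clockwise(img: Image) -> Image:
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotate_counterclockwise(img: Image) -> Image:
    """Rotate an image 90 degrees counterclockwise."""
    return [list(row) for row in zip(*img)][::-1]


def autocrop(img: Image, threshold: int = 128, margin: int = 0) -> Image:
    """Crop to the bounding box of pixels darker than `threshold`,
    padded by `margin` pixels and clamped to the image edges."""
    if not img or not img[0]:
        return []
    rows = [y for y, row in enumerate(img) if any(p < threshold for p in row)]
    cols = [x for x in range(len(img[0]))
            if any(row[x] < threshold for row in img)]
    if not rows:  # blank page: nothing to crop to, return a copy
        return [row[:] for row in img]
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + margin, len(img) - 1)
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + margin, len(img[0]) - 1)
    return [row[left:right + 1] for row in img[top:bottom + 1]]


if __name__ == "__main__":
    # A tiny 4x5 "photo": white background with two dark "ink" pixels.
    page = [[255] * 5 for _ in range(4)]
    page[1][2] = 0
    page[2][3] = 0
    cropped = autocrop(page)
    print(cropped)
    print(rotate_clockwise(cropped))
```

A real tool would work on full-size photos through an imaging library and would also need to correct small skew angles, which this sketch does not attempt.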

Currently the software is Linux-only, but it is all written in Python, and others are welcome to help port it to other operating systems. Windows and OS X compatibility is the main target for the 0.2 release. The device for taking the pictures doesn't rely on your computer at all, so you can use it to digitize regardless of what operating system you use.

We've got some other device designs in the works, and would love to hear what people are doing with their own books. Anyone who is interested should come over and take a look. Comment here or there, wherever works for you.
02-02-2009, 05:10 PM   #2
chorpler
Zealot
chorpler has a complete set of Star Wars action figures.
 
Posts: 128
Karma: 278
Join Date: Jun 2008
Device: Kindle; PRS-500; MobiPocket on Windows Mobile
That is the coolest thing ever. But I wonder -- is it any faster than scanning a book with a flatbed scanner? That's what I've been doing when converting my books to digital format (for personal use only, of course!) for the last 12 years. It takes about a half-hour per 100 pages of book, I would say, for black and white 300DPI scanning. Greyscale ups the time to about 1.5 hours per 100 pages. Naturally, color considerations are an area where a setup like yours, with digital cameras, would be hugely advantageous.
02-02-2009, 05:25 PM   #3
latchkeyed
Member
latchkeyed began at the beginning.
 
Posts: 20
Karma: 10
Join Date: Jun 2008
Device: Cybook Gen3
Faster than flatbed

Quote:
Originally Posted by chorpler
That is the coolest thing ever. But I wonder -- is it any faster than scanning a book with a flatbed scanner? That's what I've been doing when converting my books to digital format (for personal use only, of course!) for the last 12 years. It takes about a half-hour per 100 pages of book, I would say, for black and white 300DPI scanning. Greyscale ups the time to about 1.5 hours per 100 pages. Naturally, color considerations are an area where a setup like yours, with digital cameras, would be hugely advantageous.
Thanks!

Actually, I came up with the idea that something like this must be possible while stuck scanning a book on a flatbed scanner, which took a number of hours. Photographing the pages takes substantially less time than scanning, because the cameras capture the whole page at once, whereas the scanner has to sweep across it. Even if turning pages took just as much time and effort, that near-instant capture would easily speed you up many times over.

As it turns out, turning the pages is actually faster using the book ripper than it is when using a scanner.

So yeah, if you have a current speed of 100 pages/hour, getting that into the 600+ pages/hour range is very possible.
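Using chorpler's own flatbed figures (100 pages per half hour in black and white, 100 pages per 1.5 hours in greyscale), the sketch below shows how the speedup depends on which baseline you compare against.

```python
def pages_per_hour(pages: float, hours: float) -> float:
    """Throughput from a page count and elapsed time."""
    return pages / hours


# Flatbed figures quoted in the thread.
FLATBED_BW = pages_per_hour(100, 0.5)     # 200 pages/hour
FLATBED_GREY = pages_per_hour(100, 1.5)   # about 66.7 pages/hour

# Capture rates reported for the book ripper.
RIPPER_LOW, RIPPER_HIGH = 600, 1000


def speedup(new_rate: float, old_rate: float) -> float:
    """How many times faster the new rate is than the old one."""
    return new_rate / old_rate


if __name__ == "__main__":
    for label, base in (("B&W flatbed", FLATBED_BW),
                        ("greyscale flatbed", FLATBED_GREY),
                        ("100 pages/hour", 100.0)):
        print(f"vs {label}: {speedup(RIPPER_LOW, base):.1f}x to "
              f"{speedup(RIPPER_HIGH, base):.1f}x")
```

Against the black-and-white rate the gain is roughly 3x to 5x; against greyscale it is roughly 9x to 15x; the sixfold figure corresponds to the 100 pages/hour baseline.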

What kind of things are you doing with the scanned images once you have them? OCR or some image-based ebook conversion?
02-02-2009, 05:46 PM   #4
chorpler
Zealot
chorpler has a complete set of Star Wars action figures.
 
Posts: 128
Karma: 278
Join Date: Jun 2008
Device: Kindle; PRS-500; MobiPocket on Windows Mobile
Amazing. I definitely want to build one of these right now. What kind of megapixel rating do you need for those cameras?

I usually scan my books, then adjust the images and OCR the contents, either in OmniPage or FineReader. Then I correct the OCR, first with the check in OmniPage or FineReader and then by reading through the file in MS Word 2003. Finally, I export the result to an siPDF file (so I still have a copy of the page images in case I ever lose the book) from OmniPage or FineReader, and to HTML (for actual ebook reading and/or conversion to Kindle) from Word 2003.

Man, a sixfold increase in scanning throughput would be incredible! I wonder how it will work for crappy mass market paperbacks though ... even a high-res flatbed scan doesn't do a very good job for those puppies.

Last edited by chorpler; 02-02-2009 at 05:47 PM. Reason: Specified where I export to HTML from
02-02-2009, 06:10 PM   #5
latchkeyed
Member
latchkeyed began at the beginning.
 
Posts: 20
Karma: 10
Join Date: Jun 2008
Device: Cybook Gen3
Mega Pixels

That's great. Let me know if you have any questions about building the book ripper. I think we should have enough documentation on the site but I'm happy to fill in any gaps and would love to hear about someone else's experiences in building and using this.

As for megapixels, I find that 5 is generally plenty unless you want to preserve high-resolution pictures or need to capture things like amazingly small and detailed Chinese characters. I have one book on Chinese poetry whose characters are too small for me to read with the naked eye (and I have 20/20 vision), but a 6-8MP picture has enough resolution that I can magnify it to a readable size without losing quality.

ereszet has some great information in the "do-it yourself repro v-cradle for paper books" thread (https://www.mobileread.com/forums/sho...279#post106279). There is a lot of information there, but one upshot is that the megapixel count isn't the only determining factor: the size of the camera's sensor and the quality of its other components matter a great deal.
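As a rough guide to how many megapixels a page needs, the sketch below counts pixels only; it deliberately ignores the sensor-size and optics effects mentioned above. The 6"x9" page, the 300 dpi target, and the 4:3 frame shape of a typical compact camera are all assumptions for illustration, and the page is assumed to fill the frame with no margin.

```python
def raw_megapixels(page_w_in: float, page_h_in: float, dpi: float) -> float:
    """Pixels covering just the page itself, in megapixels."""
    return (page_w_in * dpi) * (page_h_in * dpi) / 1e6


def megapixels_needed(page_w_in: float, page_h_in: float, dpi: float,
                      aspect: float = 4 / 3) -> float:
    """Megapixels a camera with the given frame aspect ratio (long/short)
    needs so that the whole page is sampled at `dpi`, assuming the page's
    long edge is lined up with the frame's long edge."""
    long_px = max(page_w_in, page_h_in) * dpi
    short_px = min(page_w_in, page_h_in) * dpi
    # Grow whichever side is limiting so the frame keeps its aspect ratio.
    frame_long = max(long_px, short_px * aspect)
    frame_short = frame_long / aspect
    return frame_long * frame_short / 1e6


if __name__ == "__main__":
    print(f"{raw_megapixels(6, 9, 300):.2f} MP of page")
    print(f"{megapixels_needed(6, 9, 300):.2f} MP of 4:3 frame")
```

The difference between the two numbers is the part of the frame wasted because a book page and a 4:3 frame have different shapes.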

We've got a small page up summarizing my experiences with a couple cameras and what features make things faster/slower: http://bkrpr.org/doku.php?id=cameramodels

If you have a camera or two now, start with those! The post-holiday/recession sales are only going to get better, so you can always decide to pick up some new cameras down the line.

You are right about the paperback books. I've scanned a couple and they are indeed the most difficult for the book ripper to photograph, mostly because the book is so much smaller than the device. I'm working on a smaller version, possibly using webcams, that will hopefully make scanning paperbacks even faster.
02-02-2009, 06:21 PM   #6
chorpler
Zealot
chorpler has a complete set of Star Wars action figures.
 
Posts: 128
Karma: 278
Join Date: Jun 2008
Device: Kindle; PRS-500; MobiPocket on Windows Mobile
I can't quite tell from the pictures (I haven't read any of the instructions yet), but could you make a smaller device for paperbacks and just attach the same cameras that the big version uses?
02-02-2009, 11:53 PM   #7
latchkeyed
Member
latchkeyed began at the beginning.
 
Posts: 20
Karma: 10
Join Date: Jun 2008
Device: Cybook Gen3
paperback

Quote:
Originally Posted by chorpler
I can't quite tell from the pictures (I haven't read any of the instructions yet), but could you make a smaller device for paperbacks and just attach the same cameras that the big version uses?
Is that a lot of what you have to digitize?

It should be possible to use the same cameras, though it will depend a bit on how close your camera can focus. One of the advantages of webcams is that they are designed for subjects that are really close, so you can put things very close to them and they still cover a wide area. Cameras, even if they can focus on objects very close up, still tend to expect things to be somewhat farther away, so the area they cover in close-up shots is much smaller. I just did some quick testing with my cameras: you could cut 3" off either end and still have a ~7"x6.25" shooting area. That is plenty for the paperbacks I have, but your collection may vary.
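The shrinking coverage at close range follows from simple geometry: for a fixed angle of view, the width a camera sees grows linearly with its distance from the page. This pinhole-model sketch ignores lens distortion, and the 50° x 38° angles of view are hypothetical values, not specs of any camera mentioned here. The camera must also be able to focus at the chosen distance, which this sketch does not check.

```python
import math


def coverage(distance: float, hfov_deg: float, vfov_deg: float):
    """Width and height of the flat area a camera sees at `distance`
    (pinhole model, same units as `distance`)."""
    w = 2 * distance * math.tan(math.radians(hfov_deg) / 2)
    h = 2 * distance * math.tan(math.radians(vfov_deg) / 2)
    return w, h


def distance_for_width(width: float, hfov_deg: float) -> float:
    """Distance at which the horizontal coverage equals `width`."""
    return width / (2 * math.tan(math.radians(hfov_deg) / 2))


if __name__ == "__main__":
    # Hypothetical angles of view; check your own camera's specs.
    HFOV, VFOV = 50.0, 38.0
    for d in (6, 8, 10):  # inches from lens to page
        w, h = coverage(d, HFOV, VFOV)
        print(f'{d}" away: about {w:.2f}" x {h:.2f}"')
```

Halving the distance halves each dimension of the shooting area, which is why a shorter mount needs either a wider lens or a smaller book.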

I haven't actually built one that small before but it should be possible to just get 2" bolts for the camera mount part and keep everything else pretty much the same.

If you've got some scanning to do, I'd be happy to help you get all the right parts.
02-03-2009, 06:25 PM   #8
chorpler
Zealot
chorpler has a complete set of Star Wars action figures.
 
Posts: 128
Karma: 278
Join Date: Jun 2008
Device: Kindle; PRS-500; MobiPocket on Windows Mobile
Yes, I have quite a few old and out-of-print paperbacks that I'd like to digitize.

So I would definitely appreciate whatever help you can offer. On Friday I should be getting my tax refund, and I think I just found a good use to put it to!
02-06-2009, 04:40 PM   #9
latchkeyed
Member
latchkeyed began at the beginning.
 
Posts: 20
Karma: 10
Join Date: Jun 2008
Device: Cybook Gen3
Starting out

I'd be happy to help. Take a look at the materials on the site and see if there is anything in particular you'd like to know. The first step is to see if there is a local plastics shop around to make the actual plexi part. Generally there is something around for the hobby crowd and the sign making business. If not, I can help dig up some internet custom shops.

Other than that it is just a question of whether you have one or more small point and shoot cameras. I've been documenting my experiences with various cameras here: http://bkrpr.org/doku.php?id=cameramodels if you need to pick up one or more of them.

What operating system are you running? We've mostly finished the port to Windows and are testing how easily the software moves to OS X, but it would be useful to know what people are using.
02-09-2009, 06:00 PM   #10
latchkeyed
Member
latchkeyed began at the beginning.
 
Posts: 20
Karma: 10
Join Date: Jun 2008
Device: Cybook Gen3
paperbacks

I got interested enough to follow along and should have a smaller device to test tomorrow. http://bkrpr.org/blog/?p=25
02-11-2009, 06:28 PM   #11
latchkeyed
Member
latchkeyed began at the beginning.
 
Posts: 20
Karma: 10
Join Date: Jun 2008
Device: Cybook Gen3
Initial testing with the smaller version of the bookripper design (each side 9"x8") has produced some great results. My original calculations turned out to be a little off: the actual shooting area is 7.25"x5.25" rather than 7"x6.25", but that is still plenty of room to shoot my paperbacks.
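Whether a given book fits the 7.25"x5.25" shooting area can be checked in both orientations, as in this small sketch. The page sizes used below (about 4.25"x6.875" for a US mass-market paperback and 5.5"x8.5" for a trade paperback) are assumed typical trims, not measurements from the thread.

```python
from typing import Optional, Tuple


def fits(page_w: float, page_h: float, area_w: float, area_h: float) -> bool:
    """True if the page fits the shooting area in either orientation."""
    return ((page_w <= area_w and page_h <= area_h) or
            (page_h <= area_w and page_w <= area_h))


def slack(page_w: float, page_h: float, area_w: float,
          area_h: float) -> Optional[Tuple[float, float]]:
    """Spare room (width, height) in the orientation with the larger
    minimum margin, or None if the page doesn't fit either way."""
    options = []
    for pw, ph in ((page_w, page_h), (page_h, page_w)):
        if pw <= area_w and ph <= area_h:
            options.append((area_w - pw, area_h - ph))
    return max(options, key=min) if options else None


if __name__ == "__main__":
    AREA = (7.25, 5.25)  # measured shooting area of the 9"x8" version
    for name, size in (("mass-market", (4.25, 6.875)),
                       ("trade", (5.5, 8.5))):
        print(name, fits(*size, *AREA), slack(*size, *AREA))
```

With these assumed sizes, a mass-market page fits with about 3/8" to spare on the long side, while a trade paperback page would not fit the smaller device.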

Also, the plastics place that I've gotten all my pieces from does shipping, so if you don't have a local place, you can just use them. They charged me a little under $40 for the 9"x8" version with all the holes pre-drilled. I'd be happy to send you a copy of the diagram that I sent them to get it made.
02-16-2009, 05:01 PM   #12
chorpler
Zealot
chorpler has a complete set of Star Wars action figures.
 
Posts: 128
Karma: 278
Join Date: Jun 2008
Device: Kindle; PRS-500; MobiPocket on Windows Mobile
That would be awesome. I'm near Las Vegas; I don't know any local plastics places, so I'll probably just use yours. I wonder how much it will be to get it shipped here. Would I need two versions of the plastic frame, one for hardbacks and the smaller one for paperbacks? Sounds like it.

Oh, and I'm using Windows XP, but I also have a couple Linux machines.
02-19-2009, 03:36 PM   #13
latchkeyed
Member
latchkeyed began at the beginning.
 
Posts: 20
Karma: 10
Join Date: Jun 2008
Device: Cybook Gen3
Plastic or plastic?

Quote:
Originally Posted by chorpler
That would be awesome. I'm near Las Vegas; I don't know any local plastics places, so I'll probably just use yours. I wonder how much it will be to get it shipped here. Would I need two versions of the plastic frame, one for hardbacks and the smaller one for paperbacks? Sounds like it.

Oh, and I'm using Windows XP, but I also have a couple Linux machines.
I put up all the information on ordering the plastic piece, including the store I've used and the diagram I've sent them, here: http://bkrpr.org/doku.php?id=hardwareplans

Thanks for the OS feedback, it sounds like we're targeting platforms in the right order for you as well.

As for the multiple pieces of plastic, it's up to you. I've scanned basically everything using the 12" model, including a couple of paperbacks, and had good results. My theory is that if you're doing significant paperback capturing, the smaller one will speed things up, but I haven't yet attacked a stack of paperbacks to test that out. If you want to try it, just use the same diagram but change the inside sides (the ones next to the end) to 9" and the outside edge (with the holes) to 8". That'll give you a copy of the one I picked up last week and we can scan in tandem.
02-21-2009, 05:59 PM   #14
chorpler
Zealot
chorpler has a complete set of Star Wars action figures.
 
Posts: 128
Karma: 278
Join Date: Jun 2008
Device: Kindle; PRS-500; MobiPocket on Windows Mobile
So, one thing I can't figure out from looking at the site and the sample photos -- what would you say the equivalent resolution is for the device? I mean, 300dpi scans seem to be good enough to do excellent OCR for me, which produces a file that's roughly 2550x3300 pixels. (The vertical pixel count is often lower, of course, since very few books are actually the size of a full 8.5x11" scanning area.) On the other hand, TWO pages of a hardback are often a very close 8.5x11" fit, so you end up with one 2550x3300 pixel image covering two pages, or two 2550x1650 images when the scan is split in half (as FineReader does automatically).

2550x3300 corresponds roughly to an 8MP camera, but of course with the bookripper device each page is shot separately, so you only need a 4MP camera to get the equivalent. But as you pointed out, megapixel rating is not the only thing that counts...

So what kind of resolution are you able to achieve with your average digital camera, mounted at the distance the bookripper device mounts it at?
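The question can be framed as a nominal sampling resolution: the camera's pixel count along each axis divided by the size of the area it frames. This sketch assumes a hypothetical 5MP camera with a 2592x1944 frame that exactly covers the 7.25"x5.25" area measured for the smaller 9"x8" version (the 12" model's area isn't given in the thread). Because lens and sensor quality also matter, treat the result as an upper bound rather than the OCR-effective resolution.

```python
def pixels_at_dpi(width_in: float, height_in: float, dpi: float):
    """Pixel dimensions of a scan of the given size at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def effective_dpi(sensor_long_px: int, sensor_short_px: int,
                  area_long_in: float, area_short_in: float) -> float:
    """Nominal dpi when the camera frame exactly covers the shooting area
    (long side to long side); the coarser of the two axes wins."""
    return min(sensor_long_px / area_long_in, sensor_short_px / area_short_in)


if __name__ == "__main__":
    w, h = pixels_at_dpi(8.5, 11, 300)
    print(w, h, w * h / 1e6)   # full 8.5x11" flatbed area
    print(w, h // 2)           # one page when a two-page scan is split
    # Hypothetical 5MP camera (2592x1944) over the 7.25"x5.25" area
    print(round(effective_dpi(2592, 1944, 7.25, 5.25)))
```

Under these assumptions the camera samples each page at roughly 358 dpi, somewhat above the 300 dpi flatbed baseline, before any loss from the optics.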
02-23-2009, 01:22 PM   #15
chorpler
Zealot
chorpler has a complete set of Star Wars action figures.
 
Posts: 128
Karma: 278
Join Date: Jun 2008
Device: Kindle; PRS-500; MobiPocket on Windows Mobile
I found a local plastics company that will make the 12" version for $22, the 14" version for $27.50, and the 9" version for $14. But they're thermally bending the plexiglas to make the 90-degree bend. According to them, this will result in a "point" with an outside diameter of 1.5 times the thickness of the material, which is 1/8" plexiglas that has an actual thickness of 0.110", so it will be 0.165". Is this pointy enough to fit into a book and capture all the text, do you think? Is your plexiglass plate thermally bent, or did they join two pieces somehow?

Tags
bkrpr, digitization, photographing, ripping, scanning




All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.