Old 10-04-2007, 07:49 AM   #1
user
Connoisseur
user began at the beginning.
 
Posts: 78
Karma: 10
Join Date: Oct 2007
Device: Benq P51
how to digitize books

hello

I would like to digitize a book by taking photos of
the book pages and then performing OCR on them

can you tell me please what characteristics must a
camera have to do this? big zoom? many megapixels?
specific features?

OCR needs a 300 dpi scan from a scanner, so can you
tell me please what the equivalent is for a digital
camera photo? I mean how many megapixels, what
distance from the source, how much lighting, etc.

any specific settings for the camera? does the room
need to be well lit? do I need a tripod? any
specific add-ons for the camera? any software?
any suggestions would be much appreciated

also these book scanners use cameras:
kirtas-tech.com
atiz.com
and their scan samples are marvelous

can I reproduce these results in a cheaper way?

thanks
Old 10-04-2007, 10:51 AM   #2
BKeeper
Groupie
BKeeper will become famous soon enough
 
Posts: 174
Karma: 728
Join Date: Jul 2006
Device: Cybook, iPad
It depends on the size and distance of the source. Generally you'll get good results starting with 8 MP (but I'd go for 12).
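As a rough check on that 8 MP figure, here is a minimal sketch of the arithmetic behind "300 dpi equivalent". It assumes the page fills the whole frame and ignores lens blur and perspective loss, so real results will be somewhat worse. The function names and the 3264 x 2448 sensor size (a common layout for 8 MP cameras) are illustrative assumptions, not something from this thread.

```python
def required_megapixels(page_w_in, page_h_in, dpi=300):
    """Pixels needed to cover a page at the given dpi, in megapixels."""
    return (page_w_in * dpi) * (page_h_in * dpi) / 1_000_000


def effective_dpi(sensor_w_px, sensor_h_px, page_w_in, page_h_in):
    """Resolution on the page when the page just fits inside the frame."""
    # Orient the frame the same way as the page (portrait vs. landscape).
    long_px, short_px = max(sensor_w_px, sensor_h_px), min(sensor_w_px, sensor_h_px)
    long_in, short_in = max(page_w_in, page_h_in), min(page_w_in, page_h_in)
    # The tighter of the two dimensions limits the resolution.
    return min(long_px / long_in, short_px / short_in)


letter = required_megapixels(8.5, 11)    # US Letter sheet
a4 = required_megapixels(8.27, 11.69)    # A4 sheet, dimensions in inches
print(f"Letter at 300 dpi: {letter:.1f} MP")
print(f"A4 at 300 dpi: {a4:.1f} MP")

# Assumed 8 MP sensor (3264 x 2448 px) framing a hypothetical 6 x 9 inch page.
print(f"8 MP on a 6 x 9 in page: {effective_dpi(3264, 2448, 6, 9):.0f} dpi")
```

A full Letter or A4 sheet at 300 dpi works out to a bit over 8 MP, which is why 8 MP is a sensible floor and 12 MP leaves some margin for framing slack.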

It will be good if your camera has some kind of auto-shutter function, so that you can program a fixed interval.

Getting the lighting right is kinda tricky.
Also you'll need a capable OCR package. FineReader and OmniPage 16 have options for performing OCR on digital camera pictures (correcting perspective, distortion, and lighting).
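To show why lighting matters for camera shots, here is a toy sketch of one common way to compensate for uneven illumination: compare each pixel with the average of its neighbourhood instead of with one fixed cutoff. This is only an illustration in plain Python on a tiny hand-made grid. It is not how FineReader or OmniPage work internally, and the function names, `radius` and `offset` are assumptions chosen for the example.

```python
def global_threshold(gray, cutoff):
    """Mark every pixel darker than one fixed cutoff as ink (0)."""
    return [[0 if v < cutoff else 255 for v in row] for row in gray]


def local_threshold(gray, radius=1, offset=10):
    """Mark a pixel as ink (0) if it is clearly darker than its neighbourhood."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [gray[j][i] for j in ys for i in xs]
            mean = sum(window) / len(window)
            row.append(0 if gray[y][x] < mean - offset else 255)
        out.append(row)
    return out


# Toy "photo": light falls off from left to right, one ink pixel on each side.
page = [
    [200, 180, 160, 140, 120, 100],
    [200, 100, 160, 140,  40, 100],
    [200, 180, 160, 140, 120, 100],
]
for row in local_threshold(page):
    print(row)
```

In this grid the ink on the bright side (100) is as light as the paper on the dim side (100), so no single global cutoff can separate them, while the local comparison picks out just the two ink pixels.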

Keep in mind that with a digital camera, OCR results won't be as accurate as with a scanner.

If you can afford to do destructive scanning, then I'd advise you to get a sheet-fed scanner (check Kodak and Fujitsu); you'll get much better results.

If you still want to go with your digital camera, then check this thread. It has exactly what you need.

Hope this helps
Old 10-04-2007, 12:01 PM   #3
nekokami
fruminous edugeek
nekokami ought to be getting tired of karma fortunes by now.
 
Posts: 6,745
Karma: 551260
Join Date: Oct 2006
Location: Northeast US
Device: iPad, eBw 1150
You might want to consider using a flat sheet of glass or thick plastic to hold the pages very flat while you photograph them, so you get less distortion from curved pages.
Old 10-05-2007, 01:42 AM   #4
user
Connoisseur
user began at the beginning.
 
Posts: 78
Karma: 10
Join Date: Oct 2007
Device: Benq P51
thanks for your replies

would it be better to shoot with a film camera, then print the photos from film and then feed the prints through a sheet-fed scanner?

much more time and cost, but will it be better?
Old 10-05-2007, 08:38 AM   #5
slayda
Retired & reading more!
slayda ought to be getting tired of karma fortunes by now.
 
Posts: 2,738
Karma: 884247
Join Date: Sep 2006
Location: North Alabama, USA
Device: Kindle 1, iPad 4, iPhone 5
Quote:
Originally Posted by user
hello

I would like to digitize a book by taking photos of
the book pages and then performing OCR on them

can you tell me please what characteristics must a
camera have to do this? big zoom? many megapixels?
specific features?
Check ABBYY FineReader at www.abbyy.com. Version 8.0 includes the ability to OCR camera photos, and they may have recommendations for camera & TECHNIQUE.
Old 10-05-2007, 08:45 AM   #6
user
Connoisseur
user began at the beginning.
 
Posts: 78
Karma: 10
Join Date: Oct 2007
Device: Benq P51
I contacted them for recommendations on camera and technique but got no result. Has anyone had better luck?
Old 10-05-2007, 09:10 AM   #7
Steven Lyle Jordan
Grand Sorcerer
Steven Lyle Jordan ought to be getting tired of karma fortunes by now.
 
Posts: 8,482
Karma: 5171130
Join Date: Jan 2006
Device: none
Quote:
Originally Posted by user
thanks for your replies

would it be better to shoot with a film camera, then print the photos from film and then feed the prints through a sheet-fed scanner?

much more time and cost, but will it be better?
It would be cheaper, faster and more effective than using a camera if you took the book to a good photocopy machine. Photocopiers already output the image on 8.5x11 or A4 paper, which suits sheetfed scanners, and copies cost less than photo output to film (or even paper).

If you go this route, use a photocopier with a zoom control. Increase the zoom until your book page literally fills the photocopy image. Then you'll have the largest-possible text images on paper, which will run perfectly through a sheetfed scanner, be easier for the OCR to recognize, and reduce your recognition errors.

Copying the book page by page will also be faster than doing the same with a camera, then outputting the camera image.
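The zoom step can be sketched as simple arithmetic. This is a minimal illustration, assuming a hypothetical 6 x 9 inch book page, a Letter-size copy, and a 300 dpi scan of the copy; it ignores the copier's margins and the copier's own resolution, which caps how much real detail the enlargement can add. The function names are made up for the example.

```python
def fill_zoom(book_w_in, book_h_in, sheet_w_in=8.5, sheet_h_in=11.0):
    """Largest zoom at which the whole book page still fits on the sheet."""
    return min(sheet_w_in / book_w_in, sheet_h_in / book_h_in)


def effective_dpi_on_original(zoom, scan_dpi=300):
    """Scanning an enlarged copy samples the original page more densely."""
    return zoom * scan_dpi


zoom = fill_zoom(6.0, 9.0)  # hypothetical 6 x 9 inch book page
print(f"zoom: {zoom:.0%}")
print(f"effective resolution: {effective_dpi_on_original(zoom):.0f} dpi")
```

For this page size the zoom lands a little over 120%, so a 300 dpi scan of the copy samples the original text at well over 300 dpi, at least on paper.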
Old 10-05-2007, 11:50 AM   #8
NatCh
Gizmologist
NatCh ought to be getting tired of karma fortunes by now.
 
Posts: 11,605
Karma: 926222
Join Date: Jan 2006
Location: Republic of Texas Embassy at Jackson, TN
Device: Nook STGR
Even in the digital age, sometimes the old ways are best, eh, Steve?
Old 10-05-2007, 01:24 PM   #9
Steven Lyle Jordan
Grand Sorcerer
Steven Lyle Jordan ought to be getting tired of karma fortunes by now.
 
Posts: 8,482
Karma: 5171130
Join Date: Jan 2006
Device: none
'Fraid so! I've never found a faster, easier and more accurate way to scan text than this. The best part is, it breaks up the job into stages... assembly-line, as it were... making the entire process easier to manage.
Old 10-05-2007, 02:14 PM   #10
user
Connoisseur
user began at the beginning.
 
Posts: 78
Karma: 10
Join Date: Oct 2007
Device: Benq P51
ok but photocopying the book is the same as scanning it, isn't it?
Old 10-05-2007, 02:18 PM   #11
ereszet
Zealot
ereszet has a complete set of Star Wars action figures.
 
Posts: 118
Karma: 306
Join Date: Sep 2007
Device: Sony PRS-500 Archos 704 wifi
Quote:
Originally Posted by Steve Jordan
It would be cheaper, faster and more effective than using a camera if you took the book to a good photocopy machine. Photocopiers already output the image on 8.5x11 or A4 paper, which suits sheetfed scanners, and copies cost less than photo output to film (or even paper).

If you go this route, use a photocopier with a zoom control. Increase the zoom until your book page literally fills the photocopy image. Then you'll have the largest-possible text images on paper, which will run perfectly through a sheetfed scanner, be easier for the OCR to recognize, and reduce your recognition errors.

Copying the book page by page will also be faster than doing the same with a camera, then outputting the camera image.
It will be cheaper only if you use an office copier for your private copying (no investment and no running cost for you). It will be faster only if your secretary does the copying. Your workflow will not reproduce color images well enough, even with a color copier. Increasing the zoom beyond a certain limit will spoil the OCR rather than improve it. The advice from FineReader is not to manipulate the images unless you have to. If you flatten a book with the copier cover, you get curved lines of text and you damage the book to some extent.

With my camera I can take photos of documents every 3 seconds or so. No copier can match that. Results are good enough for OCR. High-quality reproduction requires a little more than a camera. See my thread http://www.mobileread.com/forums/showthread.php?t=13848
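For a sense of scale, here is a small sketch of what "every 3 seconds" adds up to over a whole book. The 300-page count is a hypothetical example, and the estimate covers shooting time only (no setup, retakes, or processing). Shooting two-page spreads halves the number of shots but also roughly halves the resolution available per page.

```python
def capture_minutes(pages, seconds_per_shot=3.0, pages_per_shot=1):
    """Minutes of shooting time for a book, ignoring setup and retakes."""
    shots = -(-pages // pages_per_shot)  # ceiling division
    return shots * seconds_per_shot / 60


print(capture_minutes(300))                    # one page per shot
print(capture_minutes(300, pages_per_shot=2))  # two-page spreads
```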
Old 10-05-2007, 03:14 PM   #12
Steven Lyle Jordan
Grand Sorcerer
Steven Lyle Jordan ought to be getting tired of karma fortunes by now.
 
Posts: 8,482
Karma: 5171130
Join Date: Jan 2006
Device: none
Quote:
Originally Posted by user
ok but photocopying the book is the same as scanning it, isn't it?
No: photocopying (aka "Xeroxing") puts the book pages onto standard paper ready for sheetfed scanners.
Old 10-05-2007, 03:23 PM   #13
Steven Lyle Jordan
Grand Sorcerer
Steven Lyle Jordan ought to be getting tired of karma fortunes by now.
 
Posts: 8,482
Karma: 5171130
Join Date: Jan 2006
Device: none
Quote:
Originally Posted by ereszet
It will be faster only if your secretary does the copying.
"Faster for who?" the secretary opined.

Quote:
Originally Posted by ereszet
Your workflow will not reproduce color images well enough, even with a color copier.
Excuse me... I thought we were talking about text.

Quote:
Originally Posted by ereszet
Increasing the zoom beyond a certain limit will spoil the OCR rather than improve it.
Generally, a zoom of only about 130% is enough to fill a letter or A4 page. That doesn't spoil OCR.

Quote:
Originally Posted by ereszet
With my camera I can take photos of documents every 3 seconds or so. No copier can match that.
Check out some modern high-speed copiers. A lot of them can match that, and are only slowed up by the rate at which you can change the page.

(Not trying to bust your chops. Just being fair.)
Old 10-05-2007, 05:07 PM   #14
ereszet
Zealot
ereszet has a complete set of Star Wars action figures.
 
Posts: 118
Karma: 306
Join Date: Sep 2007
Device: Sony PRS-500 Archos 704 wifi
Quote:
Originally Posted by Steve Jordan
Excuse me... I thought we were talking about text.

Books come with images, photos and maps. A disadvantage of Project Gutenberg is that it is limited to text only. I have a collection of thousands of PDF/DjVu books and maps from free digital libraries that look exactly like the originals. That is also what I do with my documents, books, business cards, magazines, newspaper clippings, etc. by photoscanning. Then I have to process them to fix whatever went wrong because I didn't take proper care at the photoscanning stage, and OCR them for indexing.

For your info: just one of my folders contains over 5 thousand documents with a word count of over 5 million. The folder is 30 GB and its index is 500 MB. In total my collection of indexed books is close to 100 GB.

Text alone is too easy to scan or photocopy to worry about it too much. In practice there are no lighting problems, just a steady hand and a good focus.
All times are GMT -4.