10-04-2007, 07:49 AM | #1 |
Connoisseur
Posts: 78
Karma: 10
Join Date: Oct 2007
Device: Benq P51
|
how to digitize books
hello
I would like to digitize a book by taking photos of the pages and then running OCR on them. Can you please tell me what characteristics a camera needs for this? Big zoom? Many megapixels? Specific features?
OCR needs a 300 dpi scan from a scanner, so what is the equivalent for a digital camera photo? I mean: how many megapixels, what distance from the source, how much lighting, and any specific camera settings? Does the room need to be brightly lit? Do I need a tripod? Any specific add-ons for the camera? Any software? Any suggestions would be much appreciated.
Also, these book scanners use cameras: kirtas-tech.com and atiz.com, and their scan samples are marvelous. Can I reproduce these results in a cheaper way? Thanks |
10-04-2007, 10:51 AM | #2 |
Groupie
Posts: 186
Karma: 728
Join Date: Jul 2006
Device: Kindle PW
|
It depends on the size and distance of the source. Generally you'll get good results starting at 8 MP (but I'd go for 12).
It helps if your camera has some kind of auto-shutter function, so that you can program a fixed interval. Getting the lighting right is kind of tricky. You'll also need capable OCR software. FineReader and OmniPage 16 have options for performing OCR on digital camera pictures (correcting perspective, distortion, and lighting). Keep in mind that with a digital camera, OCR results won't be as accurate as with a scanner. If you can afford to do destructive scanning, I'd advise you to get a sheet-fed scanner (check Kodak and Fujitsu); you'll get much better results. If you still want to go with your digital camera, then check this thread. It has exactly what you need. Hope this helps |
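To put rough numbers on the "300 dpi equivalent" question, here is a minimal sketch of the arithmetic. It assumes the page fills about 90% of the frame on each side (the `fill` value is a guess, not a measured figure), ignores lens sharpness and sensor aspect ratio in the first function, and uses typical 4:3 pixel dimensions for an 8 MP camera. The function names are made up for illustration.

```python
def required_megapixels(page_w_in, page_h_in, dpi=300, fill=0.9):
    """Megapixels needed to cover one page at `dpi`.

    `fill` is the assumed fraction of the frame (per side) that the page
    occupies; the remainder is border around the page.
    """
    px_w = page_w_in * dpi / fill
    px_h = page_h_in * dpi / fill
    return px_w * px_h / 1_000_000


def effective_dpi(sensor_long_px, sensor_short_px, page_w_in, page_h_in, fill=0.9):
    """Effective dpi on the page when the page's long side is aligned
    with the sensor's long side."""
    page_long = max(page_w_in, page_h_in)
    page_short = min(page_w_in, page_h_in)
    return min(sensor_long_px * fill / page_long,
               sensor_short_px * fill / page_short)


if __name__ == "__main__":
    # A 6 x 9 inch book page and a US letter page, one page per shot.
    print(f"6 x 9 in page:    {required_megapixels(6, 9):.1f} MP")
    print(f"8.5 x 11 in page: {required_megapixels(8.5, 11):.1f} MP")
    # 3264 x 2448 is a typical 8 MP frame size (an assumption).
    print(f"8 MP on letter:   {effective_dpi(3264, 2448, 8.5, 11):.0f} dpi")
```

Under these assumptions a 6 x 9 inch page needs about 6 MP and a letter-size page about 10.4 MP, which is roughly in line with the 8 to 12 MP range suggested above. Shooting a two-page spread in one frame roughly doubles the requirement.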
10-04-2007, 12:01 PM | #3 |
fruminous edugeek
Posts: 6,745
Karma: 551260
Join Date: Oct 2006
Location: Northeast US
Device: iPad, eBw 1150
|
You might want to consider using a flat sheet of glass or thick plastic to hold the pages very flat while you photograph them, so you get less distortion from curved pages.
|
10-05-2007, 01:42 AM | #4 |
Connoisseur
Posts: 78
Karma: 10
Join Date: Oct 2007
Device: Benq P51
|
thanks for your replies
Would it be better to shoot with a film camera, print the photos from film, and then sheet-feed them into the scanner? It would take much more time and cost more, but would the results be better? |
10-05-2007, 08:38 AM | #5 | |
Retired & reading more!
Posts: 2,764
Karma: 1884247
Join Date: Sep 2006
Location: North Alabama, USA
Device: Kindle 1, iPad Air 2, iPhone 6S+, Kobo Aura One
|
Quote:
|
|
10-05-2007, 08:45 AM | #6 |
Connoisseur
Posts: 78
Karma: 10
Join Date: Oct 2007
Device: Benq P51
|
I contacted them for recommendations on camera and technique, with no result. Has anyone had better luck?
|
10-05-2007, 09:10 AM | #7 | |
Grand Sorcerer
Posts: 8,478
Karma: 5171130
Join Date: Jan 2006
Device: none
|
Quote:
If you go this route, use a photocopier with a zoom control. Increase the zoom until your book page literally fills the photocopy image. Then you'll have the largest-possible text images on paper, which will run perfectly through a sheet-fed scanner, be easier for the OCR to recognize, and reduce your recognition errors. Copying the book page by page will also be faster than doing the same with a camera, then outputting the camera image. |
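For anyone wondering what zoom setting "fills the photocopy image", here is a small sketch of the calculation. The paper size (US letter) and the 0.25 inch unprintable margin are assumptions, and `copier_zoom_percent` is just an illustrative name; check what your own copier can actually print.

```python
import math


def copier_zoom_percent(page_w_in, page_h_in,
                        paper_w_in=8.5, paper_h_in=11.0, margin_in=0.25):
    """Largest whole-percent zoom at which the book page still fits
    inside the printable area of the copy paper."""
    usable_w = paper_w_in - 2 * margin_in
    usable_h = paper_h_in - 2 * margin_in
    # The tighter of the two directions limits the zoom.
    scale = min(usable_w / page_w_in, usable_h / page_h_in)
    # Round down so nothing gets cropped at the edges.
    return math.floor(scale * 100)


if __name__ == "__main__":
    # A 5 x 8 inch paperback page copied onto US letter paper.
    print(copier_zoom_percent(5, 8), "%")
```

With these assumed numbers a 5 x 8 inch page can be enlarged to about 131%, so the text on the copy is roughly 1.3 times larger when it goes through the sheet-fed scanner.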
|
10-05-2007, 11:50 AM | #8 |
Gizmologist
Posts: 11,615
Karma: 929550
Join Date: Jan 2006
Location: Republic of Texas Embassy at Jackson, TN
Device: Pocketbook Touch HD3
|
Even in the digital age, sometimes the old ways are best, eh, Steve?
|
10-05-2007, 01:24 PM | #9 |
Grand Sorcerer
Posts: 8,478
Karma: 5171130
Join Date: Jan 2006
Device: none
|
'Fraid so! I've never found a faster, easier and more accurate way to scan text than this. The best part is, it breaks up the job into stages... assembly-line, as it were... making the entire process easier to manage.
|
10-05-2007, 02:14 PM | #10 |
Connoisseur
Posts: 78
Karma: 10
Join Date: Oct 2007
Device: Benq P51
|
OK, but photocopying the book is the same as scanning it, isn't it?
|
10-05-2007, 02:18 PM | #11 | |
Zealot
Posts: 118
Karma: 306
Join Date: Sep 2007
Device: Sony PRS-500 Archos 704 wifi
|
Quote:
With my camera I can take photos of documents every 3 seconds or so. No copier can match that. Results are good enough for OCR. High quality repro requires a little more than a camera. See my thread https://www.mobileread.com/forums/showthread.php?t=13848 |
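As a back-of-the-envelope check on the speed claim, this sketch estimates capture time for a whole book. It assumes the 3 seconds per shot quoted above already includes turning the page, and `capture_minutes` is a hypothetical helper name.

```python
import math


def capture_minutes(page_count, seconds_per_shot=3.0, pages_per_shot=1):
    """Minutes needed to photograph `page_count` pages at a fixed interval."""
    shots = math.ceil(page_count / pages_per_shot)
    return shots * seconds_per_shot / 60


if __name__ == "__main__":
    pages = 300
    print(f"one page per shot:  {capture_minutes(pages):.1f} min")
    print(f"two pages per shot: {capture_minutes(pages, pages_per_shot=2):.1f} min")
```

At that rate a 300-page book takes about 15 minutes one page at a time, or about 7.5 minutes when shooting two-page spreads. This covers capture only, not cleanup or OCR afterwards.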
|
10-05-2007, 03:14 PM | #12 |
Grand Sorcerer
Posts: 8,478
Karma: 5171130
Join Date: Jan 2006
Device: none
|
|
10-05-2007, 03:23 PM | #13 | |||
Grand Sorcerer
Posts: 8,478
Karma: 5171130
Join Date: Jan 2006
Device: none
|
"Faster for who?" the secretary opined.
Quote:
Quote:
Quote:
(Not trying to bust your chops. Just being fair.) |
|||
10-05-2007, 05:07 PM | #14 |
Zealot
Posts: 118
Karma: 306
Join Date: Sep 2007
Device: Sony PRS-500 Archos 704 wifi
|
[QUOTE=Steve Jordan;103351]Excuse me... I thought we were talking about text.[/QUOTE]
Books come with images, photos and maps. A disadvantage of Project Gutenberg is that it is limited to text only. I have a collection of thousands of PDF/DjVu books and maps from free digital libraries that look exactly like the originals. That is also what I do with my documents, books, business cards, magazines, newspaper clippings, etc. by photoscanning. Then I have to process them to remove whatever went wrong because I didn't take proper care at the photoscanning stage, and OCR them for indexing.
For your info: just one of my folders contains over 5 thousand documents with a word count of over 5 million. The folder is 30 GB and the index is 500 MB. In total, my collection of indexed books is close to 100 GB.
Text alone is too easy to scan or photocopy to worry about too much. In practice there are no lighting problems; you just need a steady hand and good focus. |
|