Simplest scanning methods and equipment


Bob Russell
08-14-2006, 03:21 PM
Let me ask about a topic that I think a lot of ebook fans are curious about, but I'm not sure anyone has a good answer for...

Suppose we want to scan a book that is not under copyright. It could be a hardback, paperback, or a binder of printed 8.5x11" pages. Assume the following:

* We want the whole process to be easy and fast and cheap, but mostly easy and not so slow that it's not worth bothering.
* We want to spend a minimal amount of money on scanners or software.
* We don't care much about quality. Assume it's for personal reading only.
* A reasonable amount of OCR errors is tolerable as long as one can still read the text without too much trouble. So it's okay if "cl" sometimes becomes a "d", or vice versa, and so forth.
* The book has a fairly standard layout and a fairly normal font style.
* We are willing to destroy the book if necessary (unbind, cut, etc)
* No pictures or formatting are required, just the text for reading on a small screen device.

I realize there may be multiple answers depending on whether or not the book is destroyed, or based on the size of the book.

Edited to make the question more evident...
Has anyone already got a good answer for [how to do] this, or do we all need to start from scratch if we decide to jump into such a project?

BTW, the copyright issue may limit some of us to old books that we don't want to destroy. Any answers for the case we don't want to damage the book? Or suppose we are willing to accept images instead of characters, such as a pdf scan of a math book to keep diagrams and symbols, etc. Any different solutions now?

ath
08-14-2006, 03:59 PM
I'm not sure I found the question in here -- it mostly seemed to be a set of conditions.

Let me ask about a topic that I think a lot of ebook fans are curious about, but I'm not sure anyone has a good answer for...

You can't have both easy and fast and cheap, I don't think.

If you want it very easy, go to a scanning bureau (fast, but not necessarily cheap). Or buy a good sheet-fed scanner ... but the good and fast ones are rarely cheap.

If you want it cheap, it's going to be a lot of work (cut up the book, scan page by page in the cheapest scanner you can find -- if the book is small enough to fit on the scanning platen, cutting may not be necessary). A page feeder is an option, but I don't trust the cheap ones: they tend to either jam the paper or misfeed. And cheap scanning software tends to be a serious pain.

I've done hand scanning (HP IIP & 3C), cheap sheet feed (HP 5550C -- not recommended for large jobs, particularly not ones needing sheet feeding), and somewhat more expensive sheet feed (Fujitsu fi-4120). I still do hand scanning from time to time.

If you don't want to cut up the book or otherwise mistreat it, and still use a flatbed scanner, forget it. Hand-scanning it will stress the binding so much that you will probably damage the book in one way or another. The only way to do this is by using an overhead scanner, but they tend to be quite expensive.

There have been some tests with using digital cameras as scanning devices. That is a kind of improvised overhead scanner -- see http://runeberg.org/admin/camera.html . However, for best results you probably need to do single page images, rather than page spreads.

Oh, wait ... I forgot. There's actually one more way. Try http://www.archive.org/ Someone may have done the job already. Just grab the page images, and feed them to whatever program will create the final document. Some works have already been OCR'd.

If you don't particularly want page images, but only want the text in reasonably good shape, also try looking for it at http://digital.library.upenn.edu/books/ . They also have pages with lots of copyright-related information.

Bob Russell
08-14-2006, 04:14 PM
You're absolutely right, ath, that's probably a bit open-ended. And there are definitely tradeoffs. I'll try to zoom in a bit more to a specific likely scenario. (And I also emphasized my question in the original post, so it doesn't get lost in the verbiage!)

Let's say we want to be able to do it with hardware/software combos in the $100-$600 range (not including the PC, of course). And let's say that we don't want to spend more than 3-4 "attended" man hours on the whole operation... prepping, scanning, OCR and creating the text file for a 500 page book.

Any unattended time, say overnight unattended processing, would not count in the time requirement.

Is this possible, or still a bit out of the reach of current technology?

ath
08-14-2006, 05:33 PM
Let's say we want to be able to do it with hardware/software combos in the $100-$600 range (not including the PC, of course). And let's say that we don't want to spend more than 3-4 "attended" man hours on the whole operation... prepping, scanning, OCR and creating the text file for a 500 page book.

Doing 500 pages by hand means either 500 pages or 250 page spreads at flatbed scanner speed. (With an old and sluggish scanner, I do 3 spreads per minute, 300 dpi, b/w, so perhaps 2 hours, spread over four sessions: after about 30 minutes, fumbling tends to increase. That means that some pages either get scanned twice or not at all.) Cutting up may not be required, if the scanner is large enough.
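As a rough check on that arithmetic, here is a minimal Python sketch of the hands-on time estimate. The function name and the round-up choice are my own assumptions. The 3 spreads per minute rate is the one quoted for the old flatbed, and the result ignores the fumbling and rescans mentioned above.

```python
import math


def scan_minutes(pages, pages_per_scan, scans_per_minute):
    """Estimate minutes of pure scanning time, with no breaks or rescans."""
    # A final odd page still needs its own pass, so round up.
    scans = math.ceil(pages / pages_per_scan)
    return scans / scans_per_minute


# 500 pages as two-page spreads at 3 spreads per minute
spread_minutes = scan_minutes(500, 2, 3)
print(f"{spread_minutes:.0f} minutes of scanning")
```

By this estimate 250 spreads take a little under an hour and a half of pure scanning, which is consistent with "perhaps 2 hours" once breaks and rescans are added.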

Cutting up would mean about 10-15 minutes cutting up the book (Fiskars roller knife & an iron straightedge), and then it's up to the scanner speed for the rest. (Putting the pages in a press for a day or two makes feeding easier. My fi-4120 - sheetfed - would do it with about 5 misfeeds in 30-40 minutes, but it's out of range for the proposed budget.)

For creating a document, I'd use ABBYY FineReader Pro. It can either do OCR or just repackage image files as an image PDF. (I'm told Omnipage can do the job, but I've never had much success with that program myself. YMMV, as always.) As long as there's not going to be any serious proofreading, FineReader will do around one page / second on a modern system, and as long as image resolution is 600 dpi it won't do uncomfortably many misreads. (This would count as 95% unattended, I think.)

I think it's possible ... perhaps not at $100, but around $200, 250 or so.

Price and market for scanners is difficult, and I have very little idea of the situation in the US (as you mentioned $, I assume US). For the money, I believe I would have a better chance getting a good flatbed than an equally good sheetfed scanner (more exposed mechanics). I'd check the compatibility list for FineReader to ensure I can scan straight into the program, preferably through WIA drivers (TWAIN often seems to be dumbed-down, I find), and start reading reviews.

Liviu_5
08-14-2006, 11:31 PM
Hi,

I use the basic version of the OpticBook scanner (~$250, comes with ABBYY). At 300 dpi, b&w (which works much better for OCR than greyscale), jpg/tif, I do roughly 10 pages (5 double-page spreads) per minute for a hardcover, 14 pages for a paperback, watching a movie while scanning.

The results are pretty good, with very few black spots or shadows as long as you press the book properly (it varies from book to book and with where you are in the book). The book is not damaged in any way by scanning.

I always count 10 minutes per 100 pages for scanning a hardcover or trade paperback. After that, the OCR is done by the PC, so no time is wasted.

Personally I complement the OCR by converting the scans into Nokia 770-size (480x800) jpgs, one page per screen. Using two free Windows programs (XnView and Rename Master, for cropping, renaming, and resizing) and an HTML template that embeds the jpgs so they are readable by FBReader, I do a scanned book in about 20-30 minutes regardless of size, since everything is mostly a matter of organization. As jpg/html it takes about 35 KB/page at appropriate image quality, and it reads nicely and quickly on the Nokia. I complement reading the image book with the OCR text, but overall I prefer the image version on the Nokia and the OCR text on the Ebks1150.
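The resize step boils down to a small calculation, sketched here in Python. This is only an illustration, not what XnView does internally. The function name `fit_within` and the example scan size are assumptions. Only the 480x800 target comes from the post.

```python
def fit_within(width, height, max_w=480, max_h=800):
    """Scale (width, height) to fit inside max_w x max_h, keeping aspect ratio."""
    # The tighter of the two limits decides the scale factor.
    scale = min(max_w / width, max_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical cropped page scan of 1500 x 2400 pixels
print(fit_within(1500, 2400))  # (480, 768)
```

A page that is narrower than the screen's proportions will hit the height limit first instead, so the same function covers both cases.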

Liviu

ElaHuguet
08-16-2006, 04:47 AM
How much does the Lego-block scanner (http://www.geocities.jp/takascience/lego/fabs_en.html) cost? That takes a lot of man-hours from the scanning part of the project, I know (which you could then spend part of on OCRing more carefully). :)

Bob Russell
08-17-2006, 09:59 AM
I'm really impressed with the OpticBook 3600 Scanner (http://www.plustek.com/products/book.htm) solution with the ABBYY OCR software included. If it's really about $250, I have to wonder if there's any other solution that can compare for capability and price.

Anyone else have a comparable alternative?

Thanks to those who have posted some great info in this and other threads like...
http://www.mobileread.com/forums/showthread.php?t=6478, and
http://www.mobileread.com/forums/showthread.php?t=7329

Bob Russell
09-07-2006, 01:14 PM
I wonder about the possibility of simply taking pictures with some of the new low-priced but high-resolution digital cameras. I know it's expensive if you have a top of the line digital camera and an automatic page turner, but maybe it's practical with a decent camera and some OCR software (who knows, maybe even the free software released recently by Google?).

Even camera phones are supposed to have enough resolution for the ScanR free OCR service, which sends back emails after you send them a picture of text.

Anyone doing this currently?

bowerbird
09-25-2006, 01:07 PM
there's no better combination than the optic3600 and finereader.
if you want to do the scanning and the o.c.r. yourself, that is...

but since you specified an "out of copyright" book, you should
look around cyberspace to see if it has already been scanned.
google now has 100,000+ done, and is doing more every day.

if you can get the scans from somewhere, then the easiest
thing to do is to wrap them into a .pdf and just start reading.

of course, the text from such a "book" cannot be _searched_,
or _copied_, or _resized_ for greater readability, nor can it be
_reflowed_ so as to better fit varying screensizes. but if all that
doesn't bother you, then there's no reason to do any more work.

and remember that if you got the scans from google, you can
always return to google whenever you want to search the book.
plus, if you want to copy the text from a page or two, you can
o.c.r. just those page-images; you don't need the whole book.
so you might be able to live comfortably with those limitations.

but if you do want to do the o.c.r., you should know that it is
_not_ that difficult to clean the results and format the e-book.

i'll be posting some messages to the "bookpeople" listserve
this week that walk people through the process with an actual
scan-set that i downloaded from google. not only that, but
the university of michigan is now posting the o.c.r. _results_
on their site, so you can scrape their actual o.c.r. output too,
which means that you don't even have to do the o.c.r yourself.

when i post my messages, i'll come here and give you the url's.

-bowerbird

Bob Russell
09-25-2006, 01:41 PM
Thanks bowerbird!

Someone mentioned that the inner page margins in many books are too small to get a good scan, didn't they? Have you had that problem also? Which books? How much margin do you need?

ath
09-26-2006, 12:23 PM
of course, the text from such a "book" cannot be _searched_,
or _copied_, or _resized_ for greater readability, nor can it be
_reflowed_ so as to better fit varying screensizes. but if all that
doesn't bother you, then there's no reason to do any more work.


ABBYY (http://www.abbyy.com/press/press_releases.asp?param=59894) has something called PDF Transformer 2 Pro: sounds as if it
is just the thing for people who don't want to do anything else
but make scanned image PDF's searchable.

Don't know how well it works myself -- just saw the press release.

bowerbird
10-03-2006, 04:00 AM
abbyy's transformer is pc-only, so i've not been able to try it,
but it certainly struck me as a worthwhile piece of software,
since (if my memory serves correctly), it was just 50 bucks.

-bowerbird

p.s. oops. new version, with a new price -- now $100,
which takes it out of the league of "hey what can it hurt?"
and into the league of "this had better work". however...
since they offer a free demo version, why not try it out?

bowerbird
10-03-2006, 04:04 AM
oh yeah, i've been posting messages to the bookpeople listserve
about my experiment to scrape o.c.r. text-files from umichigan
and transform 'em into an electronic-book. go to the 2006 index:
> http://onlinebooks.library.upenn.edu/webbin/bparchive
and search for "feedback to umichigan" for my series of posts,
which will conclude with "part 7" in the next day or two...

-bowerbird

Bob Hoswell
10-06-2006, 07:59 PM
Hi everybody,

I am a new member and an old hand at OCR; I have been doing it since 1994. I started with OmniPage Direct, then TextBridge Pro, then OmniPage 11, and recently discovered ABBYY software, which I have found to be the best OCR program yet. I have just bought my fourth scanner. As for the comments about damaging books while scanning them on a flatbed scanner: I have encountered this many times and have found no solution. If you press too hard on the book, you risk damaging it. But if you do not press down hard, you end up with shadows on the image. In the postings there is mention of this new scanner that can scan books without damaging them and with no shadows. Has anybody been using the scanner, and what are the results? You mentioned hand scanners in the post; are they still around?

Regards,

Bob

ath
10-07-2006, 03:51 AM
There is mention of this new scanner that can scan books without damaging them and with no shadows. Has anybody been using the scanner, and what are the results?

The idea is fairly old -- Xerox used to have a scanner like that, where the scanning area went all the way out to the edge of the scanner, and allowed books to 'hang' off the edge, while the inside page was scanned.

The bonus is that gutter effects disappear, and pages scan flat, even when the book is very stiffly bound. That is a considerable advantage.

The disadvantage is that you will be doing twice as many scanning passes, and as each of them carries a risk of folding a page, that risk increases. Scanning time increases also, but that may perhaps be offset by better OCR results, and less correction work later.

And of course there are always books where the binding is slipping or deteriorating: they won't stand even this much handling.

Studio717
10-07-2006, 05:29 PM
I bought an Opticbook scanner a few months ago and love it. The biggest drawback for me is that the software is Win only and I'm mostly a Mac person. I was able to drag out an old Dell laptop and I use that right now, though a MacTel system is definitely in my future.

The combo of the design of the scanner and the software works very well, imo. There are buttons on the scanner itself that set up the software and preview, and separate buttons for the kind of scan you want: color, greyscale, or b & w. This makes the scanning go much faster because I'm not having to click on the computer, just press the buttons on the scanner. (I do set up folder, file prefix, etc., on the computer first.)

The downside is that one has to scan only a page at a time to get that nearly-spine-shadow-free scan. The upside of that is that the software will auto rotate every other page so all pages are right side up. Very nice.

(For books that aren't too fragile, I still use my Epson Perfection scanner because the software does a great job of scanning two pages into two separate files with one scan. The Opticbook software may also do this, but I haven't investigated it.)

I scan as tiff files, then pull them in Adobe Acrobat 7 (on my Mac) and do an OCR from there. (Usually - I have some scanned books that are from the 18th century and consequently use a long 's' which totally screws up the OCR. There's also the issue of thin paper (19th century books tend to have thicker paper which is a dream to scan, but 18th century books often have nearly see-through paper) and occasionally I have to use black paper inserts, which does add to the scanning time. :( )

If anyone has any questions, feel free to ask. :)

Bob Russell
10-07-2006, 06:25 PM
I do have some questions. Actually lots of questions!...

1) Has anyone got a good inexpensive method of scanning paperbacks quickly? It seems that if you are willing to tear the pages out (I assume that isn't too hard with an X-Acto knife if you don't care about destroying the book), you should be able to scan quickly with an auto sheetfeeder. That would be useful if there's a good OCR program that can handle all the odd page scans followed by all the even page scans, and deal gracefully with crooked pages or rescans.

Has anyone found a way to do this?

2) Which Epson Perfection scanner do you have, Studio717? Is it big enough for hardbacks, 2 pp per scan? How fast does it do a pair of pages, and what resolution do you use? Does it have any OCR sw, or do you use ABBYY from the OpticBook for that?

ath
10-08-2006, 06:33 AM
1) Has anyone got a good inexpensive method of scanning paperbacks quickly? It seems that if you are willing to tear the pages out (I assume that isn't too hard with an X-Acto knife if you don't care about destroying the book), you should be able to scan quickly with an auto sheetfeeder. That would be useful if there's a good OCR program that can handle all the odd page scans followed by all the even page scans, and deal gracefully with crooked pages or rescans.

If FineReader is part of your workflow (i.e. for creating OCR'ed text or PDF from image files), tell it to scan odd pages forwards, and even pages backwards. (It's the 'Ask for page number before adding page to batch' option, then select 'odd and even separately'.)

As long as there's no double feed, everything works, but if it does scan two pages as one, it's usually rather messy to fix things up again as a number of odd or even pages have to be renumbered to make the missing page fit the scan sequence.
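For anyone doing the same merge outside FineReader, the odd-forwards, even-backwards logic can be sketched in a few lines of Python. This is my own illustration, not FineReader's implementation. The function name and the file names in the example are hypothetical. The length check is a crude stand-in for spotting a double feed.

```python
def interleave(odd_scans, even_scans_reversed):
    """Merge odd pages (in order) with even pages (scanned last-to-first)."""
    evens = list(reversed(even_scans_reversed))
    if len(odd_scans) != len(evens):
        # A double feed or skipped sheet leaves the two passes unequal.
        raise ValueError("odd and even passes differ in length; check for a misfeed")
    merged = []
    for odd_page, even_page in zip(odd_scans, evens):
        merged.extend([odd_page, even_page])
    return merged


# First pass gives pages 1, 3, 5; flipping the stack gives pages 6, 4, 2.
print(interleave(["p1", "p3", "p5"], ["p6", "p4", "p2"]))
```

Note that a double feed in one pass combined with a double feed in the other would slip past the length check, so a quick look through the merged pages is still worthwhile.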

Moonraker
10-08-2006, 11:54 AM
I have two flatbed scanners -- An Opticbook 3600 and an Epson Perfection 1670. Both are fast as flatbed scanners go. Obviously you need to have USB 2 on your PC -- USB 1.1 is too damn slow.

Scanning on a flatbed doesn't take me much time - it's the proofreading afterwards that is time consuming. This has to be done even if you have an expensive and fast sheet feeder scanner. I scan paperbacks two pages at a time using ABBYY FineReader 8. (I find the ABBYY FR Lite that often comes free with a new scanner totally useless.) Whether you scan in portrait or landscape mode, ABBYY is clever enough to sort out the orientation. Spelling mistakes and scanning errors are very easy to correct using the tools that come with ABBYY FR.

You can, of course, scan straight to PDF. This is a very quick way of producing ebooks but with one problem -- the resulting PDF file will display very nicely on a PC but I find the result too small to read on any ebook reading device. I prefer to OCR, proofread, and set a pagesize and fontsize to suit my reader. If anyone knows how to scan PDFs to a smaller page size / larger font size please let me know because this really is the fastest way.

I scan at 300 dpi, greyscale and set the scanner so that it scans the area of the book only. It's a waste of time if the scanner beam moves over the whole A4 bed.

The OpticBook 3600 is great for hardbacks - but limited for paperbacks because it needs a minimum 6mm gutter on the book, which paperbacks do not usually have. It is possible, of course, to use the OpticBook in the same way as any other flatbed scanner - two pages at a time - which necessitates pressing down on the spine of the book to ensure it is as flat to the glass as possible.

Bob Russell
10-08-2006, 12:47 PM
It is possible of course, to use the OpticBook in the same way as any other flatbed scanner - two pages at a time - which necessitates pressing down on the spine of the book to ensure it is as flat to the glass as possible.

So for paperbacks without sufficient margin, is OpticBook just as good as other flatbed scanners when used two pages at a time (with the included software)? And big enough for most paperbacks 2pp/scan?

Moonraker
10-08-2006, 04:10 PM
You can certainly use an Opticbook 3600 scanner for scanning two pages at a time using Abbyy Fine Reader software. The result will be equally good as using an ordinary flatbed scanner.

I am doubtful whether this is possible using Plustek's Book Pilot software that comes with the Opticbook. I believe it can only manage one page at a time.
This is where you need the minimum 6mm gutter on your book because half the book hangs over the side. The scanner beam cannot read right up to the edge of the glass but can read 6mm in from the edge.

Also, I don't believe that the Book Pilot software is an OCR programme. I think it will scan your book into images which then have to be sent to an OCR programme such as AFR. I think Plustek bundle AFR 5 with their scanner - which, quite frankly, is not up to the job. You need AFR 7 or 8 for professional results.

The Opticbook scanner surface is about the same size as a regular A4 scanner so it is large enough for the majority of books.

I use AFR with both my scanners. I used Plustek's own software very briefly when I first got the Opticbook. I found it tiresome to use compared with Abbyy FR so I gave up on it.

I am busy using AFR at the moment but later I will try Plustek's software again and report back here.

Bob Russell
10-08-2006, 04:27 PM
Fantastic info, Moonraker. Thanks much! If you have the opportunity to get back with that next report, I look forward to the additional info on Plustek's software also.

Moonraker
10-09-2006, 12:23 PM
I am attaching my experience using Plustek's software in an rtf file — it's two pages — too long I feel to paste here.

I must add that this reflects only my own experience — others may report differently.

Bob Russell
10-09-2006, 02:41 PM
Fantastic work, Moonraker! Very interesting to hear more about the process.

One thing I don't understand, though, is whether you can send all the tif page images to the included AFR, or if you need AFR 8 Pro? Wouldn't it be sufficient to do the scanning and then use the included version of AFR for the rest?

And a thought for launching Action Express... if it won't launch because of thinking a window is already open, maybe you can kill a rogue task in the task manager?

Moonraker
10-09-2006, 03:24 PM
You don't need AFR 8 Pro. You could just install AFR 5 (Sprint?) that comes with the OpticBook scanner. It would be installed as a separate application - it does not automatically get installed with the Plustek Software. It would work in the same way as I have described with AFR 8 but not so well. I think users would be very disappointed with its capabilities.

Re your Task Manager comment -- I did not think of that -- if I ever use the Plustek software again I will bear it in mind. Thank you.

gdxf
10-09-2006, 04:40 PM
The OpticBook 3600 scans a full A4/Letter-sized page in 7-8 seconds, whether in black-and-white monochrome at 300 dpi, grey, or color mode, which makes it the fastest flatbed scanner under $250 that I've used so far. I've also used the Epson 1660 and 1670, which are about the same speed as each other: 11-12 seconds a page for 300 dpi A4 b/w. In my experience, the Epson 1660 and 1670 are second only to the OpticBook 3600 among fast low-cost scanners, with an average price below $100, and they are easy to carry in a backpack. But the OpticBook 3600 certainly has the advantage of zero-edge scanning and auto-rotating pages while scanning. I wish Plustek would further improve the speed of their OpticBook models; a speed of 2-3 seconds per A4 page would be perfect. At 2 seconds per page, that theoretically means a 300-page book could be scanned within 10 minutes. For splitting the A4-sized tiff page into two A5 pages automatically, I use a Photoshop macro. I have tried the ABBYY FineReader automatic dual-page separator, but the problem with it is that it often fails to split the page evenly and it doesn't split blank pages, which can result in an undesirable PDF file.
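An even split down the middle is simple to compute, which is presumably what a fixed macro does. Here is a minimal Python sketch of the crop boxes; it is an assumption-laden illustration, not the Photoshop macro itself. The function name and the optional `gutter` trim are my own. The boxes use the (left, upper, right, lower) order that image libraries such as Pillow accept for cropping.

```python
def split_boxes(width, height, gutter=0):
    """Return crop boxes for the left and right halves of a two-page scan.

    gutter trims that many pixels on each side of the centre line,
    which can help cut away spine shadow.
    """
    mid = width // 2
    left = (0, 0, mid - gutter, height)
    right = (mid + gutter, 0, width, height)
    return left, right


# A landscape A4 scan at 300 dpi is about 3508 x 2480 pixels.
print(split_boxes(3508, 2480))
```

Because the split depends only on the image size, blank pages are cut in exactly the same way as printed ones, so the page count stays consistent.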

Studio717
10-10-2006, 01:57 PM
My Epson scanner is the Perfection 3200 Photo (it's a couple of years old, at least) and does a beautiful job. It's fast - though I've never timed it - with both USB 2.0 and Firewire. The software works well on my Mac, so that's the scanner I usually favor when scanning anything other than books.

I scan mostly as part of a research project so I can have searchable text. My process is a little different: I usually scan in B & W at 300dpi, not greyscale. (I find the smaller files easier to use.) The text comes in clear and crisp and the OCR in Acrobat has no trouble with it. I do occasionally scan greyscale, but that's for photos, illustrations, etc. Very rarely old books will have tipped-in colored images and those, naturally, I scan in color.

I can't speak to scanning paperbacks (entire books) because I haven't done it. I have heard about duplex scanners (Fujitsu?) but have no personal experience with one. In that case, I'd take a book to Kinko's (or equivalent) and have them cut off the spine for a cleaner, more even cut. (I have scanned pages from paperbacks just fine with the Opticbook, both mass market and trade sized with no problems. The margins seem wide enough.)

OCR has limits with dpi (it doesn't like higher resolutions), so I tend to keep it in a 150-300 dpi range, with 300 being my 'default'. (Images I'll do at a higher res so I can study them close up.)

I tend to scan to TIFF because I prefer having a file I can manipulate if I need to, then transfer the files to my Mac using Bluetooth (just because it's easy; wifi works just as well and is probably faster), then pull into Acrobat 7 on my Mac. (I had used Readiris 9 before and while the OCR was quite good, that version at least had filename length issues on the Mac.)

For the odd bits I can't OCR, I'll enter keywords, etc., so the file will have some searchability. (I use a combo of Devonthink and Spotlight to find information.)

(My most frustrating experience was with an old journal that had been rebound more than once and, consequently, had NO gutter whatsoever. I couldn't even open the book more than a third, so my solution in that case was to read it in via "Naturally Speaking." Yeah, crazy, but I got the needed info.)

Studio717
10-10-2006, 02:05 PM
I should clarify re: the Epson software scanning two pages. The software does one scan, but you can specify 'zones' which are then scanned, in order, to separate files. No second step is needed. It's a very nice feature, imo.

Both the Epson 3200 Perfection Photo scanner and the Opticbook scanner are letter sized. The Epson also came with software to match up smaller scans into larger images (primarily aimed at creating panoramas, etc., for images) but I haven't used it.

One other plus, for me, anyway, of the Opticbook is that it opens from the side (the hinge is along one long side, not the 'top') which makes it MUCH easier to scan books, ime.

I hope I've answered all the questions. If I didn't, it's because I missed them, so please ask again.

Steven Lyle Jordan
10-15-2006, 06:06 PM
In past jobs, I've worked with Xerox Docutech and Kodak Lionheart digital printers (basically computer-controlled copier-printers), and smaller digital copiers that create a digital image from original content, and accept digital documents from a disk or from a network. Some of those devices interface with a computer, and can be designed to scan the original into the PC for OCR, reorganization, and saving the digital file for future printing needs.

Occasionally we scanned books for printing, and we had to take them apart to do so. Usually, the only catch was feeding it through the scanner... we had scanners that would automatically feed pages, even duplex on-the-fly, but they didn't always handle odd page sizes (a paperback-sized page would never feed well, too small).

Although I'm sure the equipment could be designed to feed odd-sized papers, I'm also sure that the printer companies were sensitive enough to the desire of publishers to avoid making it easy to copy their books--and their own desire not to be sued--that they essentially locked their scanners into handling 8.5x11-and-up sizes only. (This is the same reason they take steps to avoid your making counterfeit money on your color printer...)

The only thing I've ever found to make the process "easy" was to make it a 2-step process: First, use a standard copier to enlarge the smaller book pages into 8.5x11 (or A4, if you're on the other side of the pond) single-sided pages; Then, feed those letter-sized pages through an auto-feeding scanner for digital files and OCR. The benefits here are that the copier work will be faster and easier than hand-scanning individual pages, and the larger type size of the enlarged copies will OCR easier.

Nag
02-21-2007, 10:28 PM
Hi guys:

Searching for experiences with Plustek's OpticBook and came across this thread.

Couple of questions and suggestions.

1. What is the scanning speed of OpticBook, using 8-bit greyscale, at 300dpi, 400dpi and 600dpi?

2. Does the manual mention the duty-cycle in terms of scans per day, month or life of machine?

Some suggestions.

1. From my experience with xeroxing books, it saves time and is easy on the eyes and the cover (no need to close it for every scan) if one uses white paper to cover the area of glass not needed. Use scotch tape and make sure it does not touch the glass - it leaves some residue.

2. From reading, seems a lossless format is best for storing images. Apparently with jpeg one loses quality with every save. tiff is widely used, png is also good.

3. For shuffling odd and even pages, one can use rotate and Quite Imposing Plus inside Adobe Acrobat. Scan all the even pages and then the odd pages - makes it faster since the book need not be rotated.

Best
Nag

RWood
02-21-2007, 11:01 PM
Bob, I'll take "Making an eBook from a Binder of 8.5" x 11" for $100"
(Be sure to phrase your response ....) :D

Cheapo-cheapo production methods would be to use the scanning part of the multifunction printer such as the HP 6210 All in One that I picked up at CompUSA for just under $100 that includes bundled software for both the Mac and PC. Among that software is a fairly good OCR program.

The scanner part has an automatic feed (no duplex) that can be set to as small as a mass market ("it fits in your hip pocket") paperback. I have scanned, rearranged pages, and OCR on the software with very good results. Sure some ls were 1s; but, nobody's perfect all of the time.

Bob Russell
02-22-2007, 06:44 AM
Bob, I'll take "Making an eBook from a Binder of 8.5" x 11" for $100"
(Be sure to phrase your response ....) :D

Cheapo-cheapo production methods would be to use the scanning part of the multifunction printer such as the HP 6210 All in One that I picked up at CompUSA for just under $100 that includes bundled software for both the Mac and PC. Among that software is a fairly good OCR program.

The scanner part has an automatic feed (no duplex) that can be set to as small as a mass market ("it fits in your hip pocket") paperback. I have scanned, rearranged pages, and OCR on the software with very good results. Sure some ls were 1s; but, nobody's perfect all of the time.

That's very interesting... I've been perusing those all-in-ones for such a purpose myself. I notice you can even get duplex scanning for about $400. But if you have auto sheet feeders without duplex scanning, isn't it tough to get the book pages in the right order? And don't you get a lot of misfeeds with inexpensive all-in-ones?

I'm hoping that those are not issues -- a decent $100 book scanning solution would be very impressive. Tell us more!

RWood
02-22-2007, 10:01 AM
The first book I converted was an old late 1950s paperback with yellowing pages and glue falling into dust for the binding. All bound edges were clean cut to remove traces of the glue and the rough edges. The pages were not smooth surfaced and quite rough as many paperbacks were in those days. There were only about 2 misfeeds for the whole book. As you noted I then had to rearrange the pages manually. This only took about 20 minutes.

Since then I have scanned other books (I wait until I need a rubber band around the book to hold it together) and the higher quality paper does scan and OCR better. Magazines (and any thin clay-coated stock) will not feed and must be done on a page-by-page basis.

From examination in the stores it seems to me that there is little difference between the feeders other than the duplexing as you increase in price from one model from a company to another. I suspect that it is cheaper for the companies to offer a standard unit rather than a unique unit for each model.

Bob Russell
02-22-2007, 10:39 AM
That's very encouraging.

Now that I think about it, I suppose a clever person could write a DOS batch file or other kind of script to take the default order you get and rename the images in order. Then a simple review of the pages with windows slideshow in file explorer (or equivalent) would pretty quickly catch and fix any anomalies.
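A minimal sketch of such a renaming script, written in Python rather than as a DOS batch file. It assumes a workflow the thread does not spell out: the loose pages go through the non-duplex feeder once for the front sides (pages 1, 3, 5, ...), then the stack is flipped and fed again, so the back sides arrive in reverse order. The function names and the `page_0001` naming pattern are hypothetical; set `backs_reversed=False` if your second pass comes out in forward order.

```python
import shutil
from pathlib import Path


def page_order(fronts, backs, backs_reversed=True):
    """Interleave front-side and back-side scans into reading order.

    fronts: files from the first pass (pages 1, 3, 5, ...), in scan order.
    backs:  files from the second pass, in scan order.
    backs_reversed: assumption that flipping the stack reverses the backs.
    """
    if len(fronts) != len(backs):
        # A count mismatch usually means a misfeed in one of the passes.
        raise ValueError("front and back passes differ in length")
    if backs_reversed:
        backs = list(reversed(backs))
    ordered = []
    for front, back in zip(fronts, backs):
        ordered.extend([front, back])
    return ordered


def copy_in_order(fronts, backs, out_dir, backs_reversed=True):
    """Copy the scans to out_dir as page_0001, page_0002, ... (originals kept)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    ordered = page_order(fronts, backs, backs_reversed)
    for number, src in enumerate(ordered, start=1):
        src = Path(src)
        shutil.copy2(src, out_dir / f"page_{number:04d}{src.suffix}")
    return len(ordered)
```

Copying instead of renaming in place keeps the raw scans intact, so a misfeed found during the slideshow review can be fixed by rescanning a page and running the script again.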

Even one misfeed per 100 pages is probably tolerable, although maybe a little bit of a headache. But we're talking a low-budget solution that's accessible to almost anyone.

One more question... what software were you using for scanning and OCR? Was it what comes with the printer, or did you use your own software also?

RWood
02-22-2007, 05:22 PM
As noted in my first post this is a cheapo-cheapo method. Therefore all of the software used was provided by HP on the install CD.

Although I have other (they say more advanced) OCR programs like TextBridge, the results do not change that much when using good originals. Higher resolutions and better OCR programs would be critical if the base language were, for example, Russian. Since English has a fairly simple character set, it is easier than many to OCR. If exotic designer fonts are used, then the better quality programs are a must. For common everyday typeset books, the supplied software is quite fine.

Bob Russell
02-22-2007, 05:45 PM
Okay, I was wrong... I have yet another question!

If you wanted to create a pdf from the images without doing OCR (e.g. a document with lots of formulas and diagrams), can you also do that with the included software?

I think you may have sold a lot of all-in-one HP printers with this information, by the way!

RWood
02-23-2007, 08:31 PM
I don't own stock in HP so if sales increase I will not enjoy any additional advantage.

As for output going directly to PDF I am not sure. I had already installed a full copy of Adobe Acrobat 6 Professional on the machine before I hooked up the printer so I am not sure if the capability is native to the HP provided software or available simply because I had already installed Acrobat.

The software uses TIF as the native graphic format for the saved scans, so it should be easy for most other programs to pick up these files and create a PDF from them.
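As one illustration of picking up the saved TIF files, here is a rough Python sketch that combines a folder of scans into a single image-only PDF. It is not part of the HP software: it assumes the third-party Pillow library is installed, that each `.tif` file holds one page, and that the file names contain the page number. The function names are hypothetical.

```python
import re
from pathlib import Path


def natural_key(path):
    """Sort key that puts scan2.tif before scan10.tif."""
    parts = re.split(r"(\d+)", Path(path).name)
    return [int(p) if p.isdigit() else p.lower() for p in parts]


def tiffs_to_pdf(scan_dir, pdf_path):
    """Combine the single-page .tif files in scan_dir into one PDF."""
    from PIL import Image  # third-party Pillow; imported here so sorting works without it

    files = sorted(Path(scan_dir).glob("*.tif"), key=natural_key)
    if not files:
        raise FileNotFoundError(f"no .tif files found in {scan_dir}")
    # Convert every page to RGB so all pages share a mode the PDF writer accepts.
    pages = [Image.open(f).convert("RGB") for f in files]
    pages[0].save(pdf_path, save_all=True, append_images=pages[1:])
    return len(pages)
```

The result contains page images only, with no OCR text, which suits documents full of formulas and diagrams.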

RWood
02-23-2007, 11:42 PM
I went back and reviewed the install CD that came with the HP, there were no PDF creation programs on it that I could find. All of the documentation on the disk was in HTML.

When I set up the scan from the All-In-One using the "Scan To" button, one of the options is "Adobe Acrobat". Since I had a full copy installed before I installed the HP software, I believe that is where this option came from. When used, it transfers the scans directly to PDF files and leaves the Acrobat application open on the desktop with the scanned file ready for me. Likewise, if I pick "Scan to Word" it opens Word and leaves an image in Word -- not OCR text, an image. You must go to an HP application and select OCR with a destination of Word to get text. Any word processor will work; I just happen to use Word.

BTW: The All-In-One also comes with a set of Mac drivers and application programs that are said to do the same as the PC drivers and applications.

Studio717
02-27-2007, 11:48 AM
Nag, one of the benefits of the Opticbook scanner is that #1 and #3 of your list aren't necessary. After the first Preview scan and boundary setup, the scanner continually scans only the page area you've set up. And the included software (PC only, much to my dismay) has auto odd or even page rotation.

Bob and RWood, the Mac has built in PDF generation under the Print dialog box, so the HP solution could generate PDFs with no additional cost. (I don't know if there's an upper page limit to the PDFs. I haven't run across one, but I use Acrobat for large projects.)

RWood
02-27-2007, 12:39 PM
On the PC side I have generated 3,000+ page documents with no problem (just a lot of time), so I doubt that there would be a problem on the Mac.

Nag
03-27-2007, 10:26 AM
Studio717:

I was suggesting alternatives that ought to speed up the scanning process, not claiming that the Opticbook cannot do it.

For example, if the scanner head doesn't illuminate beyond the page area, then there is no need to use white pages. OTOH, if it scans the entire area and then trims the margins, then using white paper is less stressful on the eye, and there is also no need to close the cover of the scanner for every scan.

One can scan odd and even pages continuously and the Opticbook will merge them. However, rotating the book for every scan takes time and effort. Scanning all odd pages first and then all even pages makes the scanning faster.

Try this for a book or two.

Best
Nag

jackbrown
05-27-2007, 08:36 PM
RE: Nag's question--

Actually, when I scan books, I don't use 8-bit grayscale; I do black and white. The files are smaller, and there's no need for anti-aliasing the text; OCR seems to work better without it anyway.
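To put a rough number on "the files are smaller", the arithmetic below compares uncompressed sizes for one page. The page size (4.25 x 7 inches, roughly a mass market paperback) and the 300 dpi resolution are assumptions chosen for illustration, and the function name is hypothetical. Real saved files are usually compressed, so actual sizes will differ.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate uncompressed size of one scanned page, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Ignores row padding and file headers, so treat the result as an estimate.
    return width_px * height_px * bits_per_pixel // 8


gray = raw_scan_bytes(4.25, 7, 300, 8)  # 8-bit grayscale
bw = raw_scan_bytes(4.25, 7, 300, 1)    # 1-bit black and white

print(f"grayscale: {gray:,} bytes")
print(f"black and white: {bw:,} bytes")
```

Before compression, the 1-bit scan is one eighth the size of the 8-bit scan at the same resolution, since each pixel takes one bit instead of eight.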