Old 08-31-2010, 09:12 AM   #54
Lady Fitzgerald
Wizard
Posts: 2,013
Karma: 251649
Join Date: Apr 2010
Location: Tempe, AZ, USA, Earth
Device: JetBook Lite (away from home) + 1 spare, 32" TV (at home)
It is interesting for me since I'm in the process of digitizing my book collection.

No matter how good an OCR program may be, it will still take a fair amount of time to run. I have the version of ABBYY FineReader that came with my Fujitsu ScanSnap S1500. I've only used it to give me searchable PDFs of tech magazines I have (obviously, no editing is required since there is no visible text generated other than the image of each page taken by the scanner). It takes around 30 minutes to an hour (I don't remember exactly) for the OCR to run on a 100 page magazine, in addition to cutting and scanning the magazine (fortunately, I do not have very many magazines). Without OCR, I can scan, save, and catalogue 3-4 books per hour if I'm paying attention (usually I'm not; having ADD doesn't help). Since I have over 1500 books to do and want to finish before the end of the year, OCR just isn't an option, even without editing. I could always run my PDFs through OCR later but I don't plan on it. I'm able to easily read all but the largest books with the smaller print on a JetBook Lite. Even the large page, small print books can be read without eyestrain on the JBL, but it's a bit more awkward to scroll and good lighting becomes more critical. Using the JBL instead of a larger reader is a tradeoff to gain portability (it fits in my purse).
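For anyone who wants to check that arithmetic, here is a rough sketch. It assumes a book takes about as long to OCR as the 100 page magazine did (30 to 60 minutes) and that the OCR time is simply added on top of the scanning time; the function and variable names are made up for illustration.

```python
def total_hours(books, scan_rate_per_hour, ocr_minutes_per_book=0.0):
    """Estimate total hours to process a collection.

    books: number of books to digitize
    scan_rate_per_hour: books scanned, saved, and catalogued per hour
    ocr_minutes_per_book: assumed extra OCR time per book (0 = no OCR)
    """
    scan_hours = books / scan_rate_per_hour
    ocr_hours = books * ocr_minutes_per_book / 60.0
    return scan_hours + ocr_hours


if __name__ == "__main__":
    BOOKS = 1500  # "over 1500 books", so treat this as a lower bound
    for rate in (3, 4):            # books per hour without OCR
        for ocr in (0, 30, 60):    # assumed OCR minutes per book
            hours = total_hours(BOOKS, rate, ocr)
            print(f"{rate} books/hr, {ocr} min OCR per book: {hours:.0f} hours")
```

Under those assumptions, scanning alone comes to somewhere between 375 and 500 hours, and OCR would add another 750 to 1500 hours on top of that.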

You said that your OCR process has few errors. How well does it deal with page headers and footers and page numbers? How about drop caps at the beginning of a sentence? Some of those use pretty intricate, decorative fonts. How about when fonts change within a book, such as bold text or italics? Is your OCR process able to replicate or accurately read those? Often, certain passages in a book have increased margins to denote a quoted passage, such as a paragraph from a letter. How does that get handled? Many fonts used in books have characters that are similar or identical to others, such as the upper and lower case j being identical or the letters l and I being similar to each other and the number 1 (sometimes even identical). How well is that handled? How do images get handled? You said you can tolerate some mistakes. How many is some? Unfortunately, I would find any mistakes very distracting and annoying. For me, editing would take about as long as it would take to read the book. I can't spare even 30-60 minutes just to run the OCR because of the large number of books I have and limited time available, even considering I'm retired now.

I wish getting an occasional cover wrong way around was my only operator error. I have been known to insert a set of pages in the ADF the wrong way. If the pages were merely upside down, it would be easy to correct in Adobe Acrobat 9 but if I get the order reversed, it's much faster to rescan those pages, then replace the incorrect pages with the newly scanned ones, again using Acrobat.

How many cuts have you made with your guillotine? Mine broke after only 250 books. Although I'm currently doing battle with Amazon over it, since the guillotine they sold me apparently is an inferior knockoff, I would consider spending the extra money to get a more reliable one.

My guillotine has a different clamping mechanism than yours but the fence is the same as yours. I also had problems trying to figure out where to set it because there is no easy way to see where the cut will occur. I found the easiest way to align the fence (which also kept my fingers away from that vicious cutter blade) was to leave the blade dropped after the previous cut (I also store it that way), slip the book into place with the spine against the blade, lower the clamp until it lightly touches the book (but still allows free movement), push the fence tightly against the book until the pages are flush with the fence face, then tighten the clamp on the fence. I then raise the blade and lock it, push the book away from the fence slightly, slip a shim or two (thin pieces of cardboard; the number and thickness based on previous experience) between the fence and the book, then pull the book back against the fence. I then tighten the clamp a bit more, use a thin tool to gently bump the spine until the book is snug against the fence (the idea of the tool is to avoid getting my fingers near the blade; I almost lost the tip of a thumb to it when I first got it), then finish tightening the clamp and make the cut. I found this procedure goes quickly, is safe, and is more accurate than trying to eyeball where the cut is going to take place.

If a book has a very curved spine and the gutter margin is too small to comfortably accommodate the curvature when cutting the spine off, on hardbacks (I strip the cover off hardbacks before cutting to avoid excessively stressing the guillotine), I try "breaking" the spine by folding it sharply back in several places to try to make it easier to flatten the spine. If that doesn't work (and on paperbacks), I cut the book apart into several smaller pieces, which minimizes the curvature of each section of book, then cut each piece one at a time.

Last edited by Lady Fitzgerald; 08-31-2010 at 10:56 AM.