08-30-2010, 09:17 AM   #51
Lady Fitzgerald
Wizard
Posts: 2,013
Karma: 251649
Join Date: Apr 2010
Location: Tempe, AZ, USA, Earth
Device: JetBook Lite (away from home) + 1 spare, 32" TV (at home)
Quote:
Originally Posted by Iain
Firstly, thanks for the comments I've read on this forum, and to the people who've answered my questions.


I've finally finished setting up my digitising task! This whole thing has turned from a task into a fairly complex project, with a good deal of custom-written software. And that's before I've digitised more than a few books!

I've blogged about this (horrid word, and this is one of my first attempts at blogging) in some detail here (Iain's blog), but the short form goes like this.

I start off by cutting the spines off with a guillotine and counting the pages.

I've written a scanning program which talks to my Fujitsu fi-6130. It captures the ISBN (by barcode scanner or human entry) and looks up the publication details (isbndb.com). I enter the subject and the number of pages and start the scan.
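Typed-in ISBNs are easy to get wrong, so it may be worth rejecting a mistyped number before querying isbndb.com. A minimal sketch, assuming the check runs in Python; the function names are hypothetical, but the check-digit rules are the standard ISBN-10 (weights 10 down to 1, mod 11, 'X' = 10) and ISBN-13 (alternating weights 1 and 3, mod 10) ones. The barcode on a book is the ISBN-13 in EAN-13 form.

```python
DIGITS = set("0123456789")


def normalize_isbn(raw: str) -> str:
    """Drop hyphens and spaces, and upper-case a trailing 'x' check digit."""
    return "".join(ch for ch in raw if ch not in "- ").upper()


def is_valid_isbn10(isbn: str) -> bool:
    # Weights 10..1; the check digit may be 'X' (value 10). Sum must divide by 11.
    if len(isbn) != 10 or not set(isbn[:9]) <= DIGITS:
        return False
    last = isbn[9]
    if last != "X" and last not in DIGITS:
        return False
    total = sum((10 - i) * int(d) for i, d in enumerate(isbn[:9]))
    total += 10 if last == "X" else int(last)
    return total % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    # Alternating weights 1, 3, 1, 3, ...; sum must divide by 10.
    if len(isbn) != 13 or not set(isbn) <= DIGITS:
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(isbn))
    return total % 10 == 0


def is_valid_isbn(raw: str) -> bool:
    """Accept either form, e.g. from a barcode scanner or typed by hand."""
    isbn = normalize_isbn(raw)
    return is_valid_isbn13(isbn) if len(isbn) == 13 else is_valid_isbn10(isbn)
```

For example, `is_valid_isbn("978-0-306-40615-7")` passes, while the same number with its last digit mistyped as 8 is rejected.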

The program scans the first pages (the cover pages) in colour and the rest in monochrome. I do, of course, have to reload the hopper every minute or so, but that's quick and not too distracting. On completion, the TIFF file (500 MB to 2 GB!) is queued for OCR and so on. If there are problems, you can edit the TIFF and delete pages or add new scans.

The OCR processing side uses FineReader 10. I control FineReader through AutoHotkey so I don't have to interact with it. FineReader processes the document and saves it in Word, HTML and text formats.

The Word document is processed (again by a program of my own devising), which generates an ePub file that actually looks pretty good (though I say so myself).
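Iain doesn't say how his program writes the ePub. Whatever produces the content, the container has a fixed shape: a ZIP whose first entry is an uncompressed `mimetype` file, a `META-INF/container.xml` pointing at the package (OPF) document, and the OPF listing metadata, manifest and reading order. A minimal EPUB 3 sketch using only the Python standard library; `build_epub` and its chapter format are hypothetical, and it assumes the Word output has already been split into chapters of plain-text paragraphs.

```python
import zipfile
from datetime import datetime, timezone
from html import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def xhtml_page(title, body_html):
    """Wrap already-escaped body markup in a minimal XHTML5 document."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE html>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">\n'
        f"<head><title>{escape(title)}</title></head>\n<body>{body_html}</body>\n</html>\n"
    )


def build_epub(path, title, identifier, chapters, language="en"):
    """Write an EPUB 3 file. chapters: list of (chapter_title, [paragraph, ...])."""
    if not chapters:
        raise ValueError("an ePub needs at least one chapter")
    pages, manifest, spine, toc = {}, [], [], []
    for n, (ch_title, paras) in enumerate(chapters, start=1):
        name = f"chap{n}.xhtml"
        body = f"<h1>{escape(ch_title)}</h1>" + "".join(f"<p>{escape(p)}</p>" for p in paras)
        pages[name] = xhtml_page(ch_title, body)
        manifest.append(f'<item id="c{n}" href="{name}" media-type="application/xhtml+xml"/>')
        spine.append(f'<itemref idref="c{n}"/>')
        toc.append(f'<li><a href="{name}">{escape(ch_title)}</a></li>')
    # EPUB 3 requires a navigation document flagged with properties="nav".
    pages["nav.xhtml"] = xhtml_page(
        "Contents", '<nav epub:type="toc"><h1>Contents</h1><ol>' + "".join(toc) + "</ol></nav>"
    )
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    manifest_xml = "\n    ".join(manifest)
    spine_xml = "\n    ".join(spine)
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">{escape(identifier)}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>{escape(language)}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    {manifest_xml}
  </manifest>
  <spine>
    {spine_xml}
  </spine>
</package>
"""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry must be first in the archive and stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", opf)
        for name, xhtml in pages.items():
            zf.writestr(f"OEBPS/{name}", xhtml)
```

Everything else (styling, images, the cover scans) would be further manifest items; the sketch deliberately keeps only what a reader needs to open the book.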

Finally, all the book details and the text are put in a database so that I can find books in a variety of ways.
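Iain doesn't say which database he uses. To illustrate finding books "in a variety of ways", here is a minimal sketch assuming SQLite through Python's standard `sqlite3` module; the table layout and function names are hypothetical. It uses plain `LIKE` matching for portability. SQLite's FTS5 extension would be the more natural fit for searching the full OCR text of a large collection.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS books (
    isbn    TEXT PRIMARY KEY,
    title   TEXT NOT NULL,
    author  TEXT,
    subject TEXT,
    pages   INTEGER,
    body    TEXT  -- full OCR text
)
"""

# Columns a user may search on; anything else is rejected.
SEARCHABLE = {"title", "author", "subject", "body"}


def open_catalogue(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_book(conn, isbn, title, author, subject, pages, body):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO books VALUES (?, ?, ?, ?, ?, ?)",
            (isbn, title, author, subject, pages, body),
        )


def find_books(conn, field, term):
    """Case-insensitive (for ASCII) substring search on one column."""
    if field not in SEARCHABLE:
        raise ValueError(f"cannot search on {field!r}")
    # Escape LIKE wildcards so a literal % or _ in the term matches itself.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    # The column name comes from the whitelist above; the term is a bound parameter.
    sql = f"SELECT isbn, title FROM books WHERE {field} LIKE ? ESCAPE '\\' ORDER BY title"
    return conn.execute(sql, (f"%{escaped}%",)).fetchall()
```

Searching by subject, author or a phrase from the text is then a single call, e.g. `find_books(conn, "subject", "physics")`.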

That's the short form! The blog has a good deal more detail and I would welcome comments!

In particular, having spent a good deal of time writing code for this, I'm wondering if there is an opportunity to commercialise this.

Do you think people would be interested in a book digitisation service? (I think I would have to charge about $2 a book, and the book would be destroyed.)

Do you think people would be interested in a more or less off-the-shelf system which could efficiently turn their mouldering paperbacks into pristine eBooks?

Let me know here or privately at iain AT idcl DOT co DOT uk
You do not edit after OCR?

On average, how much time did you spend on each book?