10-01-2009, 02:27 AM #21
Mr. Dalliard
Zealot
Posts: 143
Karma: 35
Join Date: Jan 2009
Location: Osaka, Japan
Device: Kindle 3
I am currently engaged in a 10,000+ page bilingual OCR project.
I'm about a fifth of the way in, and the process is becoming more streamlined as I progress.

For a while I used the company copier, which produced nice monochrome 600 dpi PDFs. However, some of the volumes are so thick and heavy that I eventually decided to do the rest by hand rather than risk damaging the books (and my wrists).

I now use a makeshift frame to hold the book open; a 1 cm thick clear acrylic sheet to flatten the page; two lamps for illumination; and a 10 MP digital camera, positioned about 50 cm away to avoid barrel distortion, to take the shots.
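For a rough sense of how a 10 MP camera compares with a 600 dpi copier, here is a back-of-the-envelope estimate in Python. It assumes (these figures are not from the post) that the sensor is 3872 × 2592 px, one common layout for 10 MP, and that each shot is framed so the long edge of the image spans 30 cm of the page area; `effective_dpi` is a hypothetical helper.

```python
# Rough effective-resolution estimate for photographing a page.
# ASSUMPTIONS (not from the post): the 10 MP sensor is 3872 x 2592 px, and
# the shot is framed so the image's long edge covers 30 cm of the page area.

CM_PER_INCH = 2.54


def effective_dpi(pixels_along_edge: int, field_span_cm: float) -> float:
    """Pixels per inch on the page along one edge of the frame."""
    return pixels_along_edge / (field_span_cm / CM_PER_INCH)


if __name__ == "__main__":
    dpi = effective_dpi(3872, 30.0)
    print(f"approx. {dpi:.0f} dpi")       # approx. 328 dpi
    # Linear resolution relative to the copier's 600 dpi.
    print(f"{dpi / 600:.0%} of 600 dpi")  # 55% of 600 dpi
```

Under these assumptions the camera resolves roughly 330 dpi, a little over half the copier's linear resolution; framing the page more tightly raises the figure.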

Unlike the PDFs from the copier, the photos need a little extra post-processing (gamma adjustment, then conversion to monochrome) for pain-free OCR, but I have got that down to a fine art too. Obviously the resulting images can't compare with the copier's 600 dpi, but fortunately the original text is quite large, so it still works well.
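The two post-processing steps can be sketched in plain Python as follows. This is a minimal illustration, not the author's actual workflow: it assumes the photo has already been converted to 8-bit grayscale and is held as rows of 0..255 values, and the gamma of 1.5 and threshold of 128 are illustrative guesses rather than the settings used.

```python
# Sketch of the two post-processing steps: gamma adjustment, then monochrome.
# ASSUMPTIONS: the photo is already 8-bit grayscale, held as rows of 0..255
# ints; gamma=1.5 and threshold=128 are illustrative, not the author's values.


def gamma_adjust(pixels, gamma):
    """Return out = 255 * (in / 255) ** (1 / gamma); gamma > 1 lightens midtones."""
    # Precompute a 256-entry lookup table so each pixel is a single index.
    lut = [round(255 * (v / 255) ** (1 / gamma)) for v in range(256)]
    return [[lut[v] for v in row] for row in pixels]


def to_monochrome(pixels, threshold):
    """Map each pixel to 0 (black) below the threshold, else 255 (white)."""
    return [[255 if v >= threshold else 0 for v in row] for row in pixels]


def preprocess_for_ocr(pixels, gamma=1.5, threshold=128):
    """Gamma-adjust first, then binarize, in the order described in the post."""
    return to_monochrome(gamma_adjust(pixels, gamma), threshold)


if __name__ == "__main__":
    # One row: dark ink (30), shadowed paper (100), and two lighter paper tones.
    row = [[30, 100, 140, 200]]
    print(to_monochrome(row, 128))   # [[0, 0, 255, 255]]
    print(preprocess_for_ocr(row))   # [[0, 255, 255, 255]]
```

With the gamma step, the shadowed paper value of 100 comes out white instead of black while the ink stays black, which is the point of adjusting gamma before thresholding. For real photos an image-processing library would do the same work far faster than nested lists.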

Next comes proofreading the output...

Last edited by Mr. Dalliard; 10-01-2009 at 02:30 AM.