Old 04-15-2009, 01:55 AM   #3
=X=
Wizard
The only free OCR programs I know of are Tesseract and GOCR. Tesseract is an open source OCR engine from Google. Google used a more optimized OCR engine for its books, but I tend to see the same errors in their scans as in mine.

Tesseract only reads uncompressed TIFF files, but there are some free GUIs, such as Softi FreeOCR, that support more image formats.


I do have a PDF->Text solution, but it's not for the faint of heart.
It requires Cygwin (for perl, pdftoppm, ppm2tiff, and ImageMagick's convert).

The Perl script finds every PDF in a directory, then extracts each page of each PDF into a text file. It's great for batch jobs.
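
To give a rough idea of the flow, here is a minimal Python sketch of the same kind of batch job. It is not the Perl script itself. It assumes pdftoppm, ImageMagick's convert, and tesseract are all on the PATH; the function names, the per-PDF work folder layout, and the 300 DPI setting are made up for illustration. It also skips ppm2tiff and lets convert write the uncompressed TIFF directly.

```python
import subprocess
from pathlib import Path


def find_pdfs(directory):
    """Return every PDF directly inside `directory`, sorted by name."""
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")


def render_command(pdf, prefix, dpi=300):
    """pdftoppm writes one image per page, named <prefix>-<page>.ppm."""
    return ["pdftoppm", "-r", str(dpi), str(pdf), str(prefix)]


def tiff_command(ppm):
    """ImageMagick convert: PPM page -> uncompressed TIFF for Tesseract."""
    return ["convert", str(ppm), "-compress", "none",
            str(ppm.with_suffix(".tif"))]


def ocr_command(tiff):
    """tesseract <image> <base> writes the recognized text to <base>.txt."""
    return ["tesseract", str(tiff), str(tiff.with_suffix(""))]


def ocr_directory(directory, work_dir, run=subprocess.check_call):
    """OCR every page of every PDF in `directory` into text files.

    Each PDF gets its own folder under `work_dir` (an assumed layout),
    so pages from different PDFs never collide.
    """
    for pdf in find_pdfs(directory):
        page_dir = Path(work_dir) / pdf.stem
        page_dir.mkdir(parents=True, exist_ok=True)
        run(render_command(pdf, page_dir / "page"))
        for ppm in sorted(page_dir.glob("page-*.ppm")):
            run(tiff_command(ppm))
            run(ocr_command(ppm.with_suffix(".tif")))
```

Calling ocr_directory("scans", "work") would leave one text file per page, such as work/book/page-1.txt. The run parameter is only there so the commands can be inspected or dry-run without the tools installed.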

=X=