MobileRead Forums - View Single Post

ldolse · 01-28-2011, 10:42 AM

There is no OCR phase in Calibre, but some of the source documents people use are RTF/TXT/HTML files generated directly by OCR conversion software. Depending on the quality of the OCR software, there can be a variety of issues.

I've actually been scanning some favorite paper books lately that aren't available electronically, and I think I'm going to add a special Heuristics function just for cleaning up ABBYY-generated HTML. It's not fun going through it by hand, that's for sure.
