Old 07-06-2021, 03:53 AM   #7
roger64
Wizard
Posts: 2,625
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Quote:
Originally Posted by Sarmat89 View Post
It does diacritics?
It does italics?
It strips headers/footers?
It recognizes custom words?
I have used Tesseract to OCR about a hundred books and still use it. With a good-quality scan, OCR quality is now so good that most pages come out with zero mistakes.

In French, it recognizes circumflex accents and all the other French accents, as well as characters like ü. Apart from French, I have only tried it with English. I know that you can use it for many other languages if you install the corresponding data set.
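For anyone who wants to try this on a scanned page, here is a minimal sketch in Python that calls the Tesseract command-line program with a language code. It assumes the `tesseract` executable is on the PATH and that the French data set (`fra`) is installed; the helper names `build_tesseract_command` and `ocr_page` and the file name in the usage note are only illustrative.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, lang="fra"):
    """Return the argument list for one Tesseract CLI run.

    Tesseract's CLI takes: tesseract <image> <output base> -l <language code>.
    """
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_page(image, lang="fra"):
    """OCR one scanned page and return the recognized text.

    Assumes the matching language data is installed; running
    `tesseract --list-langs` shows what is available.
    """
    image = Path(image)
    # Tesseract appends ".txt" to the output base by itself.
    output_base = image.parent / image.stem
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image, output_base, lang), check=True)
    text_file = image.parent / (image.stem + ".txt")
    return text_file.read_text(encoding="utf-8")
```

Calling `ocr_page("page_001.png", lang="fra")` would write `page_001.txt` next to the image and return its contents; for an English book, pass `lang="eng"` instead.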

It does not do italics (it used to, and maybe it will again).
It does not strip headers and footers.

In my experience, these two points are only a minor drawback for most fiction books. During the checking phase that follows the OCR process, you need a kind of "breathing time" before moving on to the next page anyway.
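If the running headers and footers do get in the way, they can be removed after the OCR step. The following is only a rough sketch of one possible heuristic, not something Tesseract offers: it assumes each page's text is available as a separate string and drops first or last lines that repeat on many pages. The function name `strip_running_lines` and the `min_share` threshold are made up for this example, and page numbers that change from page to page are not caught.

```python
from collections import Counter


def strip_running_lines(pages, min_share=0.5):
    """Remove first/last lines that recur on at least min_share of the pages."""

    def edge_lines(text):
        # First and last non-empty line of one page.
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        return {lines[0], lines[-1]} if lines else set()

    counts = Counter(line for page in pages for line in edge_lines(page))
    # Require at least two occurrences so unique lines are never removed.
    threshold = max(2, int(len(pages) * min_share))
    repeated = {line for line, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        lines = page.splitlines()
        # Trim from the top, then from the bottom, skipping blank lines.
        while lines and (not lines[0].strip() or lines[0].strip() in repeated):
            lines.pop(0)
        while lines and (not lines[-1].strip() or lines[-1].strip() in repeated):
            lines.pop()
        cleaned.append("\n".join(lines))
    return cleaned
```

A line is only treated as a header or footer when it sits at the edge of a page and recurs often enough, so ordinary text in the middle of a page is left alone.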

Tesseract is not perfect, but it is now perfectly usable for most fiction books.

Last edited by roger64; 07-06-2021 at 04:03 AM. Reason: set