12-04-2021, 05:55 AM   #7
RbnJrg
Wizard
 
Posts: 1,834
Karma: 8700631
Join Date: Mar 2013
Location: Rosario - Santa Fe - Argentina
Device: Kindle 4 NT
Quote:
Originally Posted by j.p.s
It looks like gImageReader has a history of slowing down after Tesseract major version changes. This seems to be related to CPU vectorization support and Tesseract compile options.

https://github.com/manisandro/gImageReader/issues/285

There is a pending pull request that supposedly fixes the above, but it looks like it won't be merged.

https://github.com/manisandro/gImageReader/pull/286

The issue below links to the gImageReader author's instructions for building Tesseract, but those links are dead.

https://github.com/manisandro/gImageReader/issues/357

This is all very strange, since people who have the problem say Tesseract is not slow from the command line, and the gImageReader author says it's not a gImageReader problem. All of this concerns the move from Tesseract v3 to v4.

Many thanks for the info. I ran some experiments that confirm what you wrote:

1. I downloaded and installed this GUI:

https://github.com/Parathantl/tesseract_gui/releases

(It installs Tesseract 4, but it's easy to replace v4 with v5.)

2. That GUI is for OCRing PDF files.

3. I OCRed a 25-page PDF and noted how long the task took.

4. I repeated the job in console mode. The results were practically the same.

5. After my tests, I can say that ABBYY is at least twice as fast as Tesseract, while the accuracy is almost the same.
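For anyone who wants to repeat the console timing from step 4, it can be scripted so the GUI and command-line runs are measured the same way. This is a minimal Python sketch, not what I actually used. The file name page-01.png and the language code eng are hypothetical, and it assumes the PDF has already been converted to page images, because the tesseract command line (tesseract <image> <outputbase> -l <lang>) reads images, not PDFs.

```python
import shutil
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) once; return (exit code, wall-clock seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, capture_output=True)
    return completed.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical file names. The tesseract CLI reads images (or a text file
    # listing image paths), not PDFs, so the PDF pages must be rasterized first.
    cmd = ["tesseract", "page-01.png", "page-01", "-l", "eng"]
    if shutil.which("tesseract") is None:
        print("tesseract not found on PATH", file=sys.stderr)
    else:
        code, seconds = time_command(cmd)
        print(f"exit code {code}, {seconds:.2f} s")
```

Watching the CPU graph while a timed run is going is a simple way to see whether only one core is busy.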

Finally, I think I've found the cause of the speed difference: Tesseract is using ONLY ONE CPU. I don't know how the .exe (64-bit) was compiled, but it isn't multithreaded, or the user doesn't have the option to enable it (maybe things are different under Linux). A real pity, because it's a nice, free program with very good OCR precision.
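If Tesseract really keeps only one core busy, one common workaround for multi-page jobs is to run several single-page tesseract processes at the same time, one per page. The Python sketch below illustrates that idea. It is not a feature of the GUI above or of Tesseract itself, and the pages and out folders and the helper names are hypothetical; it assumes the pages were already saved as PNG images. Setting OMP_THREAD_LIMIT=1 caps each process at one OpenMP thread, which only matters if the Tesseract build uses OpenMP; the aim is to avoid oversubscribing the CPU when several processes run together.

```python
import os
import shutil
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(image, outdir, lang="eng", exe="tesseract"):
    """Tesseract CLI form: tesseract <image> <outputbase> -l <lang>."""
    outbase = str(Path(outdir) / Path(image).stem)
    return [exe, str(image), outbase, "-l", lang]


def single_thread_env():
    """Copy of the environment limiting OpenMP to one thread per process."""
    env = dict(os.environ)
    env["OMP_THREAD_LIMIT"] = "1"
    return env


def ocr_pages(images, outdir, lang="eng", workers=None, exe="tesseract"):
    """OCR each page image in its own process, several at a time.
    Returns the exit codes in the same order as images."""
    workers = workers or os.cpu_count() or 1
    env = single_thread_env()

    def run_one(image):
        cmd = build_command(image, outdir, lang, exe)
        return subprocess.run(cmd, env=env, capture_output=True).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, images))


if __name__ == "__main__":
    pages = sorted(Path("pages").glob("*.png"))  # hypothetical page images
    if shutil.which("tesseract") is None:
        print("tesseract not found on PATH", file=sys.stderr)
    else:
        os.makedirs("out", exist_ok=True)
        print(ocr_pages(pages, "out"))
```

Threads are used here only to wait on the child processes; the OCR work itself runs in separate tesseract processes, so several cores can be busy at once.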