MobileRead Forums - View Single Post

RbnJrg · 12-04-2021, 05:55 AM

Quote:

Originally Posted by j.p.s

It looks like gImageReader has a history of slowing down after major Tesseract version changes. It seems to be related to CPU vectorization support and Tesseract compile options.

https://github.com/manisandro/gImageReader/issues/285

There is a pending pull request that supposedly fixes the above, but it looks like it won't be merged.

https://github.com/manisandro/gImageReader/pull/286

The issue below links to the gImageReader author's instructions for building Tesseract, but the links are dead.

https://github.com/manisandro/gImageReader/issues/357

This is all very strange, since people who have the problem say Tesseract is not slow from the command line, and the gImageReader author says it's not a gImageReader problem. This is all about Tesseract v3 -> Tesseract v4.

Many thanks for your info. I ran some experiments that confirm what you wrote:

1. I downloaded and installed this GUI:

https://github.com/Parathantl/tesseract_gui/releases

(It installs Tesseract 4, but it is easy to replace v4 with v5.)

2. That GUI is for OCRing PDF files.

3. I OCRed a 25-page PDF and noted the time it took to finish the task.

4. I repeated the job from the command line. The results were practically the same.

5. After my tests, I can say that ABBYY is at least twice as fast as Tesseract, while the accuracy is almost the same.
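For anyone who wants to repeat the timing comparison from steps 3 and 4, here is a minimal sketch of how a run could be timed from a script. It is not the method used in the tests above; `time_command` is a hypothetical helper name, and it simply measures the wall-clock time of any command-line program.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # check=True raises an error if the program exits with a failure code
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start
```

Assuming Tesseract is on the PATH and a page has already been exported as an image (the file name here is made up), a call could look like `time_command(["tesseract", "page-001.png", "page-001"])`, which uses the basic `tesseract <image> <outputbase>` form of the command line.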

Finally, I think I have discovered the cause of the speed difference: Tesseract is using ONLY ONE CPU. I don't know how the .exe (64-bit) was compiled, but either it is not multithreaded or the user has no option to enable multithreading (maybe things are different under Linux). A real pity, because it is a nice, free program with very good OCR accuracy.
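If a single Tesseract process really does use only one core, one possible workaround is to start several processes at once, one per page. The sketch below is untested against the Windows build discussed here and rests on several assumptions: the pages have already been exported as separate image files, `tesseract` is on the PATH, and the basic `tesseract <image> <outputbase>` command form is used. The function names are hypothetical. Setting `OMP_THREAD_LIMIT=1` only matters if the build uses OpenMP; it is there so that parallel processes do not fight over the same cores.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_tesseract_cmd(image_path):
    """Basic CLI form: tesseract <image> <outputbase> (writes <outputbase>.txt)."""
    output_base = os.path.splitext(image_path)[0]
    return ["tesseract", image_path, output_base]


def ocr_pages_in_parallel(image_paths, workers=None, build_cmd=build_tesseract_cmd):
    """Run one OCR process per page image, several at a time.

    Returns the process exit codes in the same order as image_paths.
    """
    # Assumption: keep each process single-threaded (only relevant for OpenMP builds)
    env = dict(os.environ, OMP_THREAD_LIMIT="1")
    workers = workers or os.cpu_count() or 1

    def run_one(path):
        return subprocess.run(build_cmd(path), env=env).returncode

    # Threads are enough here: each one only waits for an external process
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, image_paths))
```

With page images named, say, `page-001.png` to `page-025.png`, calling `ocr_pages_in_parallel(paths)` would write one `.txt` file per page next to each image. Whether this closes the gap with ABBYY would have to be measured.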