Old 12-02-2021, 08:06 AM   #1
Quoth
the rook, bossing Never.
Posts: 11,166
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
OCR after scan update

https://devclass.com/2021/12/01/tesseract-ocr-5/
Old 12-02-2021, 12:27 PM   #2
RbnJrg
Wizard
Posts: 1,548
Karma: 6613969
Join Date: Mar 2013
Location: Rosario - Santa Fe - Argentina
Device: Kindle 4 NT
Many thanks for the info! By chance, do you know about a good Windows GUI for this new release of Tesseract?
Old 12-02-2021, 03:35 PM   #3
Quoth
Quote:
Originally Posted by RbnJrg View Post
…do you know about a good Windows GUI for this new release of Tesseract?
Sorry, I don't remember. I ditched Windows 100% for Linux Mint + MATE Desktop in Jan 2017 after nearly 25 years, though I'd been using Linux on servers and dual boot since 1998 (Red Hat, SUSE, CentOS, Debian, Ubuntu, DSL, etc.).
Old 12-02-2021, 06:31 PM   #4
j.p.s
Grand Sorcerer
Posts: 5,285
Karma: 98804578
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by RbnJrg View Post
Many thanks for the info! By chance, do you know about a good Windows GUI for this new release of Tesseract?
https://tesseract-ocr.github.io/tess...-3rdParty.html

This was the first hit for a web search on "tesseract ocr windows gui". I have no idea whether any of them are any good. The above list is not limited to Microsoft platforms.
Old 12-03-2021, 06:34 AM   #5
RbnJrg
Quote:
Originally Posted by j.p.s View Post
https://tesseract-ocr.github.io/tess...-3rdParty.html

This was the first hit for a web search on "tesseract ocr windows gui". I have no idea whether any of them are any good. The above list is not limited to Microsoft platforms.
Many thanks. Yes, I was aware of that list, but it mainly covers GUIs for Tesseract 4.0. For example, gImageReader works with Tesseract 5.0, but it takes a long time to OCR even a few pages (when the main advantage of Tesseract 5 over Tesseract 4 is, or should be, speed).
Old 12-03-2021, 12:20 PM   #6
j.p.s
Quote:
Originally Posted by RbnJrg View Post
Many thanks. Yes, I was aware of that list, but it mainly covers GUIs for Tesseract 4.0. For example, gImageReader works with Tesseract 5.0, but it takes a long time to OCR even a few pages (when the main advantage of Tesseract 5 over Tesseract 4 is, or should be, speed).
It looks like gImageReader has a history of slowing down after major Tesseract version changes. This seems to be related to CPU vectorization support and Tesseract compile options.

https://github.com/manisandro/gImageReader/issues/285

There is a pending pull request that supposedly fixes the above, but it looks like it won't be merged.

https://github.com/manisandro/gImageReader/pull/286

The issue below contains links from the gImageReader author on how to build Tesseract, but those links are dead.

https://github.com/manisandro/gImageReader/issues/357

This is all very strange, since people who have the problem say Tesseract is not slow from the command line, and the gImageReader author says it's not a gImageReader problem. All of this concerns the Tesseract V3 -> V4 transition.
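
One way to compare builds on the vectorization point is to look at what the command-line binary reports about itself. The sketch below is only an illustration: it assumes that `tesseract --version` lists detected capabilities on lines starting with "Found " (the exact wording may differ between builds), and the helper names `found_features` and `tesseract_version_text` are made up for this example.

```python
import shutil
import subprocess


def found_features(version_text):
    """Return the names from lines of the form 'Found <name>' (assumed format)."""
    features = []
    for line in version_text.splitlines():
        line = line.strip()
        if line.startswith("Found "):
            features.append(line[len("Found "):])
    return features


def tesseract_version_text(exe="tesseract"):
    """Run `tesseract --version` and return stdout and stderr combined."""
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return result.stdout + result.stderr


# Only query the binary if one is actually on the PATH.
if __name__ == "__main__" and shutil.which("tesseract"):
    for feature in found_features(tesseract_version_text()):
        print(feature)
```

If the binary used by a GUI and the one used from the console report different lists, that would at least be consistent with the compile-option explanation; it would not prove it.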
Old 12-04-2021, 05:55 AM   #7
RbnJrg
Many thanks for your info. I ran some experiments that confirm what you wrote:

1. I downloaded and installed this GUI:

https://github.com/Parathantl/tesseract_gui/releases

(It installs Tesseract 4, but it is easy to replace V4 with V5.)

2. That GUI is for OCRing PDF files.

3. I OCRed a 25-page PDF and noted how long the task took.

4. I repeated the job in console mode. The results were practically the same.

5. After my tests, I can say that ABBYY is at least twice as fast as Tesseract, while the accuracy is almost the same.
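
The console timing in steps 3 and 4 could be reproduced with something like the sketch below. It is not the procedure actually used above. It assumes the PDF pages have already been exported to image files (Tesseract's command line takes images as input), that the `tesseract` executable is on the PATH, and that the `eng` language data is installed; the function names are hypothetical.

```python
import subprocess
import time
from pathlib import Path


def build_ocr_command(image_path, output_base, lang="eng", exe="tesseract"):
    """Build `tesseract <image> <output_base> -l <lang>` as an argument list."""
    return [exe, str(image_path), str(output_base), "-l", lang]


def time_command(cmd):
    """Run one command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def time_pages(image_paths, lang="eng"):
    """OCR the pages one after another; return (per-page seconds, total seconds)."""
    per_page = []
    for path in image_paths:
        # Tesseract appends ".txt" to the output base, e.g. page01.png -> page01.txt
        output_base = Path(path).with_suffix("")
        per_page.append(time_command(build_ocr_command(path, output_base, lang)))
    return per_page, sum(per_page)
```

Calling `time_pages` on the same page images with Tesseract 4 and then Tesseract 5 installed would give a like-for-like comparison that does not depend on any GUI.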

Finally, I think I discovered the cause of the speed difference: Tesseract is using ONLY ONE CPU. I don't know how the .exe (64-bit) was compiled, but either it is not multithreaded or the user doesn't have the option to enable it (maybe things are different under Linux). A real pity, because it is a nice, free program with very good OCR accuracy.
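
A possible workaround for the single-CPU behaviour, sketched under assumptions rather than tested against this particular .exe: since each page is independent, several `tesseract` processes can be run side by side, one per page image. The code assumes the pages already exist as image files and that `tesseract` is on the PATH. Setting `OMP_THREAD_LIMIT=1` is a standard OpenMP setting that only has an effect if the build uses OpenMP, which is not known here. The function names are hypothetical.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def single_thread_env(base_env=None):
    """Copy the environment and cap OpenMP at one thread per Tesseract process."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_THREAD_LIMIT"] = "1"
    return env


def ocr_page(image_path, lang="eng", exe="tesseract"):
    """OCR one page image; Tesseract writes <image name without suffix>.txt."""
    output_base = os.path.splitext(image_path)[0]
    subprocess.run(
        [exe, image_path, output_base, "-l", lang],
        check=True,
        env=single_thread_env(),
    )
    return output_base + ".txt"


def ocr_pages_parallel(image_paths, workers=None, ocr=ocr_page):
    """Run one OCR job per page, `workers` at a time; results keep page order."""
    workers = workers or os.cpu_count() or 1
    # Threads are enough here: each one just waits on its own tesseract process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ocr, image_paths))
```

For example, `ocr_pages_parallel(["page01.png", "page02.png"])` would return the text file paths in page order. Whether this closes the gap with commercial OCR on a given machine would have to be measured.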