07-06-2021, 03:23 PM | #16 |
Grand Sorcerer
Posts: 27,903
Karma: 198500000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
|
07-06-2021, 08:21 PM | #17 |
Evangelist
Posts: 495
Karma: 2267928
Join Date: Nov 2015
Device: none
|
|
07-06-2021, 08:32 PM | #18 | |
Wizard
Posts: 2,835
Karma: 10700629
Join Date: May 2016
Location: Canada
Device: Onyx Nova
|
Quote:
|
|
07-07-2021, 07:08 AM | #19 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Hi
Tesseract, gImageReader, LO. All images are in the attached zip file. The sources are the two attached images, Pasteur 01.jpg and Pasteur 02.jpg. It's a scientific (admittedly old) text with italics, superscripts, and some special characters, so nothing especially easy.
I took the following screenshots:
- écran gimagereader is what you get. You can correct some of the mistakes marked in red or carry on. I did not correct anything.
- écran gimagereader2 is what you get when you click to suppress line ends.
- Pasteur.txt is the output from gImageReader.
- Pasteur.odt is what you get in LO when you import the file Pasteur.txt into your working model.
- checking.png is how I proceed for the checking phase. I put the image on the left and the working model on the right.
I hope these images and screenshots give you an honest picture of what Tesseract 4.1.1 can do now. The text of most fiction books is easier than this example.
|
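For anyone wondering what "suppress line ends" amounts to, here is a minimal Python sketch of the idea. It is not gImageReader's actual code; the function name `join_line_ends` and the rule that a hyphen right before a line break marks a split word are assumptions made for illustration (a real hyphenated compound at a line end would be merged by mistake).

```python
import re

def join_line_ends(text):
    """Merge hard-wrapped OCR lines into one line per paragraph."""
    # Paragraphs are assumed to be separated by at least one blank line.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        # Assumption: a hyphen directly before a line break is a split word.
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Every remaining line break inside the paragraph becomes a space.
        para = re.sub(r"\s*\n\s*", " ", para)
        joined.append(para)
    return "\n\n".join(joined)

sample = "Les fer-\nments sont\nvivants.\n\nNouveau\nparagraphe."
print(join_line_ends(sample))
```

The result still needs proofreading against the page image, as in the checking step described above.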
07-07-2021, 07:42 AM | #20 |
Grand Sorcerer
Posts: 27,903
Karma: 198500000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
|
07-07-2021, 08:40 AM | #21 |
Diligent dilettante
Posts: 3,440
Karma: 49052774
Join Date: Sep 2019
Location: in my mind
Device: Kobo Sage; Kobo Libra H2O
|
I can empathize with the OP when it comes to the availability of high-quality Linux alternatives to specialized commercial Windows software. I recently upgraded the RAM on my PC and decided to test out a few Linux distros in VMs. It's been nearly 10 years since I was active in Linux, and an hour or two was all it took to remind me why. Ten years ago I was beginning to need high-quality speech recognition software more and more often, and there was nothing in the Linux world that came within a parsec of Dragon NaturallySpeaking. Ten years on, Dragon has got better and better while my need for it has grown greater and greater, and there still isn't any viable Linux alternative. So I can definitely understand how the OP feels when one would like to try Linux but it simply does not have the software one needs. FWIW this entire post is courtesy of Dragon.
|
07-07-2021, 09:21 AM | #22 |
Wizard
Posts: 3,454
Karma: 10484861
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
|
Disclaimer: I use Tesseract myself [on a Linux Mint computer] for the occasional OCR of a book that I have as a PDF and want to read on my e-ink reader.
Yes, it does. You need to tell it what the language is.
It recognizes the text, but does not preserve italics (or bold). This is the biggest shortcoming, IMHO.
No. I use pdfscissors to pre-format [cut] the PDF for OCR. Then I use regular expressions on the finished text to do some cleanup, including getting rid of page breaks, headers, or footers (if pdfscissors couldn't be used successfully to remove them).
Haven't tried that yet.
I wrote (stole most of the code from Stack Overflow and similar sites) a bash script that uses an ImageMagick command to create a bitmap from each PDF page and then runs the bitmap through Tesseract. The image is saved to a ramdisk, so I do not cause unnecessary wear to my SSD.
It is not as nice, neat, or interactive a solution as FineReader and similar software such as Recognita or Readiris (I used all of them on Windows at work), but it is good enough for my needs at home. I would not be willing to fork over money for FineReader for my very limited use, and this way I do not need to use pirated software.
Last edited by kacir; 07-07-2021 at 09:26 AM.
|
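A rough Python sketch of the kind of pipeline described in the post above (rasterize each PDF page with ImageMagick, OCR it with Tesseract, then clean up with regular expressions). This is not kacir's bash script. The helper names, the 300 dpi setting, `/dev/shm` as the ramdisk, and the "a line holding only a number is a page footer" rule are all assumptions for illustration. It also assumes the ImageMagick 6 command name `convert` (ImageMagick 7 installs it as `magick`) and that the caller already knows the page count.

```python
import re
import subprocess
from pathlib import Path

def rasterize_cmd(pdf, page, image, dpi=300):
    # ImageMagick picks a single PDF page with the [index] suffix (0-based).
    return ["convert", "-density", str(dpi), f"{pdf}[{page}]", str(image)]

def ocr_cmd(image, out_base, lang="eng"):
    # tesseract writes <out_base>.txt; -l tells it the language of the text.
    return ["tesseract", str(image), str(out_base), "-l", lang]

def clean_text(text):
    # Form feeds can appear as page separators in the OCR output.
    text = text.replace("\f", "\n")
    # Assumed footer pattern: a line containing nothing but a page number.
    text = re.sub(r"(?m)^[ \t]*\d+[ \t]*\n", "", text)
    # Collapse runs of three or more newlines into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"

def ocr_pdf(pdf, page_count, lang="eng", workdir=Path("/dev/shm")):
    # Scratch files live on the (assumed) ramdisk to spare the SSD.
    image = workdir / "ocr_page.png"
    out_base = workdir / "ocr_page"
    chunks = []
    for page in range(page_count):
        subprocess.run(rasterize_cmd(pdf, page, image), check=True)
        subprocess.run(ocr_cmd(image, out_base, lang), check=True)
        chunks.append(out_base.with_suffix(".txt").read_text(encoding="utf-8"))
    return clean_text("".join(chunks))
```

A call such as `ocr_pdf(Path("book.pdf"), page_count=2, lang="fra")` would need ImageMagick, Tesseract, and the French language data installed. Real headers and footers usually need patterns tailored to the particular book.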
07-07-2021, 09:52 AM | #23 |
Evangelist
Posts: 495
Karma: 2267928
Join Date: Nov 2015
Device: none
|
Commercial software is developed for (and by) people involved in the processes the software is intended to assist with. Free software is made by people who like cr*p like vi or TeX, and who do not understand how proper software should work, and why.
|
07-07-2021, 09:53 AM | #24 | |
Wizard
Posts: 2,835
Karma: 10700629
Join Date: May 2016
Location: Canada
Device: Onyx Nova
|
Quote:
|
|
07-07-2021, 09:55 AM | #25 |
Evangelist
Posts: 495
Karma: 2267928
Join Date: Nov 2015
Device: none
|
|
07-07-2021, 09:57 AM | #26 | |
Wizard
Posts: 2,835
Karma: 10700629
Join Date: May 2016
Location: Canada
Device: Onyx Nova
|
Quote:
|
|
07-07-2021, 09:58 AM | #27 |
Wizard
Posts: 2,835
Karma: 10700629
Join Date: May 2016
Location: Canada
Device: Onyx Nova
|
Hey! Leave vi out of it! Blasphemer. vi is actually an example of excellent software; it just has a learning curve.
|
07-07-2021, 10:07 AM | #28 |
Grand Sorcerer
Posts: 7,450
Karma: 67000001
Join Date: Feb 2009
Device: Kobo Glo HD
|
I use quite a bit of free software, and I would say it works as proper software should. I'm a user of the software, not a developer.
|
07-07-2021, 10:07 AM | #29 |
Diligent dilettante
Posts: 3,440
Karma: 49052774
Join Date: Sep 2019
Location: in my mind
Device: Kobo Sage; Kobo Libra H2O
|
|
07-07-2021, 10:08 AM | #30 | |
Grand Sorcerer
Posts: 27,903
Karma: 198500000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
A brief listing of people who make free (and open-source) "crap" that runs on Linux:
Microsoft
Mozilla
Apache
Adobe
Npm
Oracle
LibreOffice (The Document Foundation)
GIMP (equally as powerful and as impossible to master as Photoshop)
Python
You want to say none of the products that the above produce for Linux works for you personally... fine. You'll get no argument from me. But if you want to continue to insist that free software == crap, then you're quite obviously full of it yourself. Crap software is crap software--whether it's free or paid for. The inverse is also true.
Last edited by DiapDealer; 07-07-2021 at 10:12 AM.
|
|
|