MobileRead Forums > E-Book General > General Discussions

07-06-2021, 03:23 PM   #16
DiapDealer
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by Pajamaman
This sums up why I haven't moved to Linux yet: lack of software choice.
Odd. Most people would say quite the opposite about Linux: too much choice in software.

07-06-2021, 08:21 PM   #17
Sarmat89
Evangelist
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
Quote:
Originally Posted by DiapDealer
too much choice in software.
Too much choice in toy, experimental and proof-of-concept (PoC) software.

Or, as in this case, a rejection of commercial software. They've added a new engine, but lost the ability to detect typefaces.

07-06-2021, 08:32 PM   #18
Pajamaman
Wizard
Posts: 2,827
Karma: 10700629
Join Date: May 2016
Location: Canada
Device: Onyx Nova
Quote:
Originally Posted by rcentros
I don't like Ubuntu's desktop either. (I wasn't a fan of Unity and I'm not a fan of the GNOME 3 GUI. I could probably get used to it, but I don't want to.) Linux Mint (while based on Ubuntu) uses a "traditional" Windows-like desktop (and this is consistent across all three "flavors": Cinnamon, MATE and Xfce). I don't know what you mean by "no blank image-free desktop background" in Ubuntu, but in Linux Mint you can use whatever image (including a blank image) you want.
Yeah, I couldn't find the right words. I mean I can't just set the desktop background to a solid color; I have to use an image.

07-07-2021, 07:08 AM   #19
roger64
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Hi

Tesseract, gimageReader, LO (LibreOffice).

All images are in the attached zip file.

The sources are the two attached images, Pasteur 01.jpg and Pasteur 02.jpg. It's a scientific (admittedly old) text, with italics, superscripts and some special characters: nothing especially easy.

I took the following screenshots:
- écran gimagereader (écran = screen) shows the raw result. You can correct the mistakes flagged in red or carry on; I did not correct anything.
- écran gimagereader2 shows what you get after clicking the option to strip line ends.

- Pasteur.txt is the output from gimageReader.

- Pasteur.odt is what you get in LO when you import Pasteur.txt into your working template.

- checking.png shows how I handle the checking phase: the image on the left, the working document on the right.
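
Stripping line ends, as gimageReader offers to do, amounts to rejoining hard-wrapped OCR lines into paragraphs. A minimal Python sketch of the idea (not gimageReader's actual code), assuming paragraphs are separated by blank lines and that a letter followed by a hyphen at the end of a line marks a word split across two lines; the function name `unwrap` is hypothetical:

```python
import re

def unwrap(text):
    """Join hard-wrapped OCR lines into paragraphs (hypothetical helper).

    Assumes blank lines separate paragraphs and that a line ending in a
    letter followed by '-' holds the first half of a split word.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    out = []
    for para in paragraphs:
        lines = [ln.strip() for ln in para.splitlines() if ln.strip()]
        joined = ""
        for ln in lines:
            if not joined:
                joined = ln
            elif re.search(r"[^\W\d_]-$", joined):
                joined = joined[:-1] + ln  # drop the hyphen and rejoin the word
            else:
                joined += " " + ln         # ordinary line end becomes a space
        out.append(joined)
    return "\n\n".join(out)
```

The hyphen rule also merges genuine hyphenated compounds that happen to break at a line end, so the result still needs the side-by-side check.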

I hope these images and screenshots will give you an honest understanding of what Tesseract 4.1.1 can do now. The text of most fiction books is easier than this example.
Attached Files
tesseract.zip (3.18 MB)

07-07-2021, 07:42 AM   #20
DiapDealer
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by Sarmat89
Too much choice in toy, experimental and proof-of-concept (PoC) software.
If it don't cost a bajillion dollars, it must suck, right? You enjoy your delusion.

07-07-2021, 08:40 AM   #21
Uncle Robin
Diligent dilettante
Posts: 3,417
Karma: 48736498
Join Date: Sep 2019
Location: in my mind
Device: Kobo Sage; Kobo Libra H2O
I can empathize with the OP when it comes to the availability of high-quality alternatives to specialized commercial Windows software. I recently upgraded the RAM on my PC and decided to test out a few Linux distros in VMs. It's been nearly 10 years since I was active in Linux, and an hour or two was all it took to remind me why. Ten years ago I was beginning to need high-quality speech recognition software more and more often, and there was nothing in the Linux world that came within a parsec of Dragon NaturallySpeaking. Ten years on, Dragon has got better and better while my need for it has grown greater and greater, and there still isn't any viable Linux alternative. So I can definitely understand how the OP feels when one would like to try Linux but it simply does not have the software one needs. FWIW, this entire post is courtesy of Dragon.

07-07-2021, 09:21 AM   #22
kacir
Wizard
Posts: 3,450
Karma: 10484861
Join Date: May 2006
Device: PocketBook 360; before that, Sony Reader and Cassiopeia A-20
Disclaimer: I use Tesseract myself [on a Linux Mint computer] to occasionally OCR a book that I have as a PDF and want to read on my e-ink reader.
Quote:
Originally Posted by Sarmat89
It does diacritics?
Yes, it does. You need to tell it what the language is.
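
On the command line the language is passed with Tesseract's `-l` option, and several languages can be combined with `+` (for example `-l ces+eng`). A small Python sketch of building that option, assuming the set of installed language codes has already been read from `tesseract --list-langs`; the helper name is hypothetical:

```python
def tesseract_lang_args(requested, available):
    """Build Tesseract's -l option for one or more languages (hypothetical helper).

    `requested` is a list of traineddata codes such as ["ces", "eng"];
    `available` is the set of installed codes, e.g. from `tesseract --list-langs`.
    """
    if not requested:
        raise ValueError("at least one language is required")
    missing = [code for code in requested if code not in available]
    if missing:
        raise ValueError("traineddata not installed for: " + ", ".join(missing))
    # Tesseract combines several languages with '+'.
    return ["-l", "+".join(requested)]
```

Failing early on a missing language avoids a whole batch run that silently falls back to the wrong alphabet's diacritics.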
Quote:
Originally Posted by Sarmat89
It does italics?
It recognizes the text, but does not format it as italics (or bold). This is the biggest shortcoming, IMHO.
Quote:
Originally Posted by Sarmat89
It strips headers/footers?
No. I use pdfscissors to pre-format (crop) the PDF for OCR.
Then I use regular expressions on the finished text to do some cleanup, including removing page breaks and any headers or footers that pdfscissors couldn't remove.
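
A minimal sketch of that kind of regex cleanup in Python, assuming Tesseract's plain-text output with a form feed (`\f`) at the end of each page; the function name and the header/footer patterns are hypothetical and would have to be adapted to each book:

```python
import re

def clean_ocr_text(text, junk_patterns):
    """Remove page breaks and header/footer lines (hypothetical helper).

    Pages are assumed to be separated by form feeds; a line is dropped when
    its stripped text fully matches any of the regexes in junk_patterns.
    """
    junk = [re.compile(p) for p in junk_patterns]
    pages = []
    for page in text.split("\f"):
        lines = [ln for ln in page.splitlines()
                 if not any(rx.fullmatch(ln.strip()) for rx in junk)]
        body = "\n".join(lines).strip("\n")  # trim blank lines left at page edges
        if body:
            pages.append(body)
    # A plain newline between pages keeps a sentence that crosses a page
    # break in the same paragraph instead of starting a new one.
    return "\n".join(pages)
```

For example, `clean_ocr_text(raw, [r"\d+", r"CHAPTER ONE"])` would drop bare page numbers and a running header reading CHAPTER ONE.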
Quote:
Originally Posted by Sarmat89
It recognizes custom words?
Haven't tried that yet.

I wrote (well, stole most of the code from Stack Overflow and similar sites) a bash script that uses an ImageMagick command to create a bitmap from each PDF page and then runs the bitmap through Tesseract. The images are saved to a ramdisk, so I don't cause unnecessary wear on my SSD.
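
The same per-page pipeline can be sketched in Python instead of bash, as an illustration only (this is not kacir's script). It assumes ImageMagick 6's `convert` with Ghostscript available for PDF input, `tesseract` on the PATH, and `/dev/shm` as the RAM-backed tmpfs that many Linux distributions mount by default; the helper names are hypothetical:

```python
import os
import subprocess
import tempfile

def page_commands(pdf, page, workdir, lang="eng", density=300):
    """Build the ImageMagick and Tesseract command lines for one 0-based page."""
    stem = os.path.join(workdir, f"page-{page:04d}")
    png = stem + ".png"
    # -density must come before the input so the PDF is rendered at that DPI;
    # flattening onto white removes any transparent background.
    rasterize = ["convert", "-density", str(density), f"{pdf}[{page}]",
                 "-background", "white", "-alpha", "remove", png]
    # Given an output base without extension, Tesseract writes <stem>.txt.
    recognize = ["tesseract", png, stem, "-l", lang]
    return rasterize, recognize

def ocr_pdf(pdf, page_count, lang="eng", ramdisk="/dev/shm"):
    """OCR each page in turn, keeping the page images on a RAM-backed tmpfs."""
    texts = []
    # The temporary directory and every image in it are deleted on exit.
    with tempfile.TemporaryDirectory(dir=ramdisk) as workdir:
        for page in range(page_count):
            rasterize, recognize = page_commands(pdf, page, workdir, lang)
            subprocess.run(rasterize, check=True)
            subprocess.run(recognize, check=True)
            txt = os.path.join(workdir, f"page-{page:04d}.txt")
            with open(txt, encoding="utf-8") as f:
                texts.append(f.read().rstrip("\f"))
    # Form feeds between pages let later cleanup still find page boundaries.
    return "\f".join(texts)
```

Page numbers are 0-based, matching ImageMagick's `file.pdf[N]` syntax. On some distributions ImageMagick's security policy (policy.xml) disables PDF input, in which case the `convert` step fails until that policy is relaxed.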

It's not as nice, neat, or interactive a solution as FineReader and similar software such as Recognita or Readiris (I used all of them on Windows at work), but it's good enough for my needs at home. I would not be willing to fork over money for FineReader for my very limited use, and this way I don't need to use pirated software.

Last edited by kacir; 07-07-2021 at 09:26 AM.

07-07-2021, 09:52 AM   #23
Sarmat89
Evangelist
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
Quote:
Originally Posted by DiapDealer
If it don't cost a bajillion dollars, it must suck, right?
Commercial software is developed for (and by) people involved in the processes the software is intended to assist with. Free software is made by people who like cr*p like vi or TeX, and who do not understand how the proper software should work, and why.

07-07-2021, 09:53 AM   #24
Pajamaman
Wizard
Posts: 2,827
Karma: 10700629
Join Date: May 2016
Location: Canada
Device: Onyx Nova
Quote:
Originally Posted by Uncle Robin
I can empathize with the OP when it comes to the availability of high-quality alternatives to specialized commercial Windows software. I recently upgraded the RAM on my PC and decided to test out a few Linux distros in VMs. It's been nearly 10 years since I was active in Linux, and an hour or two was all it took to remind me why. Ten years ago I was beginning to need high-quality speech recognition software more and more often, and there was nothing in the Linux world that came within a parsec of Dragon NaturallySpeaking. Ten years on, Dragon has got better and better while my need for it has grown greater and greater, and there still isn't any viable Linux alternative. So I can definitely understand how the OP feels when one would like to try Linux but it simply does not have the software one needs. FWIW, this entire post is courtesy of Dragon.
Yeah, exactly. Again, it's not a criticism of Linux per se. I am glad Linux exists as an alternative to the commercial giants, and I do not expect Linux software to keep up with Windows and Mac software; the money simply isn't there. But I am glad Linux is there. One day we may truly need it. And there is always Wine.

07-07-2021, 09:55 AM   #25
Sarmat89
Evangelist
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
Quote:
Originally Posted by roger64
an honest understanding of what Tesseract 4.1.1 can do now
I see "$", ". 1l ", and "/errmentation". That's at least 3 gross errors on a single page.
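
One way to put a number on such errors is the character error rate (CER): the Levenshtein edit distance between a proofread reference text and the OCR output, divided by the length of the reference. A minimal Python sketch; the reference would have to be typed or corrected by hand, and the function names are illustrative:

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def character_error_rate(reference, ocr_output):
    """Edits needed to turn the reference into the OCR output, per reference character."""
    return levenshtein(reference, ocr_output) / max(len(reference), 1)
```

Running two engines on the same page against the same reference makes the comparison less subjective than counting visibly wrong words.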

07-07-2021, 09:57 AM   #26
Pajamaman
Wizard
Posts: 2,827
Karma: 10700629
Join Date: May 2016
Location: Canada
Device: Onyx Nova
Quote:
Originally Posted by kacir
does not format it as italics (or bold)... No. I use pdfscissors to pre-format (crop) the PDF for OCR.
Then I use regular expressions on the finished text to do some cleanup, including removing page breaks and any headers or footers that pdfscissors couldn't remove.
Haven't tried that yet.

I wrote (well, stole most of the code from Stack Overflow and similar sites) a bash script that uses an ImageMagick command to create a bitmap from each PDF page and then runs the bitmap through Tesseract. The images are saved to a ramdisk, so I don't cause unnecessary wear on my SSD.

It's not as nice, neat, or interactive a solution as FineReader and similar software such as Recognita or Readiris
Again, exactly. If I had to do all that, I just wouldn't OCR. It's too much work for my personal, non-professional needs. It would take me too long to build the tools needed to get the job done, so I wouldn't bother doing the job. I am a tool user, not a toolmaker.

07-07-2021, 09:58 AM   #27
Pajamaman
Wizard
Posts: 2,827
Karma: 10700629
Join Date: May 2016
Location: Canada
Device: Onyx Nova
Quote:
Originally Posted by Sarmat89
Commercial software is developed for (and by) people involved in the processes the software is intended to assist with. Free software is made by people who like cr*p like vi or TeX, and who do not understand how the proper software should work, and why.
Hey! Leave vi out of it! Blasphemer. vi is actually an example of excellent software; it just has a learning curve.

07-07-2021, 10:07 AM   #28
John F
Grand Sorcerer
Posts: 7,172
Karma: 63764653
Join Date: Feb 2009
Device: Kobo Glo HD
Quote:
Originally Posted by Sarmat89
Commercial software is developed for (and by) people involved in the processes the software is intended to assist with. Free software is made by people who like cr*p like vi or TeX, and who do not understand how the proper software should work, and why.
I use quite a bit of free software, and I would say it works as proper software should. I'm a user of the software, not a developer.

07-07-2021, 10:07 AM   #29
Uncle Robin
Diligent dilettante
Posts: 3,417
Karma: 48736498
Join Date: Sep 2019
Location: in my mind
Device: Kobo Sage; Kobo Libra H2O
Quote:
Originally Posted by Pajamaman
Hey! Leave vi out of it! Blasphemer. vi is actually an example of excellent software; it just has a learning curve.

"Them's fighting words!" says this longtime GNU Emacs fan.

07-07-2021, 10:08 AM   #30
DiapDealer
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by Sarmat89
Free software is made by people who like cr*p like vi or TeX, and who do not understand how the proper software should work, and why.
Others are at least respectful in their criticisms. What is "crap"--in its entirety--is your above statement. But I expected nothing less.

A brief listing of people who make free (and open-source) "crap" that runs on Linux:

Microsoft
Google
Mozilla
Apache
Adobe
npm
Oracle
LibreOffice (The Document Foundation)
GIMP (equally as powerful and as impossible to master as Photoshop)
Python

If you want to say that none of the products the above produce for Linux work for you personally, fine. You'll get no argument from me. But if you want to keep insisting that free software == crap, then you're quite obviously full of it yourself.

Crap software is crap software--whether it's free or paid for. The inverse is also true.

Last edited by DiapDealer; 07-07-2021 at 10:12 AM.

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.