| 
			
			 | 
		#16 | 
| 
			
			
			
			 Grand Sorcerer 
			
			Posts: 28,891 
				Karma: 207182180 
				Join Date: Jan 2010 
				
				
				
				Device: Nexus 7, Kindle Fire HD 
				
				
				 | 
	
	|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#17 | 
| 
			
			
			
			 Fanatic 
			
			Posts: 531 
				Karma: 2268308 
				Join Date: Nov 2015 
				
				
				
				Device: none 
				
				
				 | 
	
	|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#18 | |
| 
			
			
			
			 Wizard 
			
			Posts: 2,874 
				Karma: 10700629 
				Join Date: May 2016 
				Location: Canada 
				
				
				Device: Onyx Nova 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#19 | 
| 
			
			
			
			 Wizard 
			
			Posts: 2,625 
				Karma: 3120635 
				Join Date: Jan 2009 
				
				
				
				Device: Kindle PW3 (wifi) 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Hi 
		
	
		
		
			Tesseract, gImageReader, LO. All images are in the attached zip file. The sources are the two attached images, Pasteur 01.jpg and Pasteur 02.jpg. It's a scientific (admittedly old) text with italics, superscripts, and some special characters; nothing especially easy. I took the following screenshots:
			- écran gimagereader is what you get. You can correct some of the mistakes marked in red or carry on. I did not correct anything.
			- écran gimagereader2 is what you get when you click to suppress line ends.
			- Pasteur.txt is the output from gImageReader.
			- Pasteur.odt is what you get in LO when you import the file Pasteur.txt into your working model.
			- checking.png is how I proceed for the checking phase: I put the image on the left and the working model on the right.

			I hope these images and screenshots will give you an honest understanding of what Tesseract 4.1.1 can do now. The text of most fiction books is easier than this example.  | 
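For readers without the attachments, here is a rough idea of what "suppressing line ends" amounts to. This is only a sketch, not gImageReader's actual implementation: the function name `unwrap_lines` and the rule "a blank line separates paragraphs" are assumptions made for illustration.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped OCR lines into paragraphs (hypothetical helper)."""
    # Assumption: paragraphs are separated by at least one blank line.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        # Rejoin a word split by a hyphen at the end of a line.
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Turn every remaining line break into a single space.
        para = re.sub(r"\s*\n\s*", " ", para)
        joined.append(para)
    return "\n\n".join(joined)
```

Note that the hyphen rule also merges genuine compounds that happen to break at a line end, so the result still needs the side-by-side check described above.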
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#20 | 
| 
			
			
			
			 Grand Sorcerer 
			
			Posts: 28,891 
				Karma: 207182180 
				Join Date: Jan 2010 
				
				
				
				Device: Nexus 7, Kindle Fire HD 
				
				
				 | 
	
	|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#21 | 
| 
			
			
			
			 Diligent dilettante 
			
			Posts: 3,662 
				Karma: 52758936 
				Join Date: Sep 2019 
				Location: in my mind 
				
				
				Device: Kobo Sage; Kobo Libra Colour 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			I can empathize with the OP when it comes to the availability of high-quality alternatives to the specialized commercial software available on Windows. I recently upgraded the RAM on my PC and decided to test out a few Linux distros in VMs. It's been nearly ten years since I was active in Linux, and an hour or two was all it took to remind me why. Ten years ago I was beginning to need high-quality speech recognition software more and more often, and there was nothing in the Linux world that came within a parsec of Dragon NaturallySpeaking. Ten years on, Dragon has got better and better while my need for it has grown greater and greater, and there still isn't any viable Linux alternative. So I can definitely understand how the OP feels when one would like to try Linux but it simply does not have the software one needs. FWIW, this entire post is courtesy of Dragon.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#22 | 
| 
			
			
			
			 Wizard 
			
			Posts: 3,465 
				Karma: 10684861 
				Join Date: May 2006 
				
				
				
				Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Disclaimer: I use Tesseract myself [on a Linux Mint computer] for occasional OCR of a book that I have as a PDF and want to read on my e-ink reader.

			Yes, it does. You need to tell it what the language is. 
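As a small illustration of telling Tesseract the language: the command-line tool takes a `-l` option with a language code such as `eng` or `fra`, and the matching language data has to be installed. The Python wrapper below is only a sketch; the function names and file names are made up for the example.

```python
import subprocess
from pathlib import Path


def tesseract_cmd(image: str, output_base: str, lang: str = "eng") -> list[str]:
    """Build the command line: tesseract IMAGE OUTPUTBASE -l LANG."""
    # Several languages can be combined with "+", for example "eng+fra".
    return ["tesseract", image, output_base, "-l", lang]


def ocr_image(image: str, output_base: str, lang: str = "eng") -> str:
    """Run Tesseract (assumed to be on PATH) and return the recognized text."""
    subprocess.run(tesseract_cmd(image, output_base, lang), check=True)
    # Tesseract writes plain-text output to OUTPUTBASE.txt.
    return Path(output_base + ".txt").read_text(encoding="utf-8")
```

For a French page, something like `ocr_image("page.png", "page", lang="fra")` would be the call.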
		
	
		
		
		
		
		
		
		
		
		
		
		
			It recognizes the text, but does not format it as italics (or bold). This is the biggest shortcoming, IMHO.

			No. I use pdfscissors to pre-format [cut] the PDF for OCR. Then I use regular expressions on the finished text to do some cleanup, including getting rid of page breaks, headers, or footers (if pdfscissors couldn't be used successfully to remove them).

			Haven't tried that yet.

			I wrote (stole most of the code from Stack Overflow and similar sites) a bash script that uses an ImageMagick command to create a bitmap from each PDF page and then runs the bitmap through Tesseract. The image is saved to a ramdisk, so I do not cause unnecessary wear to my SSD.

			Not as nice, neat, or interactive a solution as FineReader and similar software such as Recognita or Readiris (I used all of them on Windows at work), but good enough for my needs at home. I would not be willing to fork over money for FineReader for my very limited use, and this way I do not need to use pirated software.

			Last edited by kacir; 07-07-2021 at 10:26 AM.  | 
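The script itself is not posted, so the following is not that bash script. It is a minimal Python sketch of the same idea under stated assumptions: ImageMagick's `convert` and `tesseract` are on the PATH, `/dev/shm` is a RAM-backed directory (common on Linux), and the caller knows the page count. All function names and the cleanup patterns are hypothetical.

```python
import re
import subprocess
import tempfile
from pathlib import Path

RAMDISK = "/dev/shm"  # assumption: a tmpfs mount, so page images never touch the SSD


def rasterize_cmd(pdf: str, page: int, image: str, dpi: int = 300) -> list[str]:
    """ImageMagick command that renders one zero-based PDF page to an image."""
    # "book.pdf[3]" selects a single page; newer ImageMagick versions use "magick".
    return ["convert", "-density", str(dpi), f"{pdf}[{page}]", image]


def ocr_cmd(image: str, output_base: str, lang: str) -> list[str]:
    """Tesseract command that writes its text to OUTPUTBASE.txt."""
    return ["tesseract", image, output_base, "-l", lang]


def clean_text(text: str) -> str:
    """Example regex cleanup: remove page breaks and bare page-number lines."""
    text = text.replace("\f", "\n")  # form feeds may mark page breaks
    return re.sub(r"(?m)^[ \t]*\d+[ \t]*\n", "", text)


def ocr_pdf(pdf: str, pages: int, lang: str = "eng", workdir: str = RAMDISK) -> str:
    """Rasterize and OCR each page in a temporary directory, then clean up."""
    chunks = []
    with tempfile.TemporaryDirectory(dir=workdir) as tmp:
        for page in range(pages):
            base = str(Path(tmp) / f"page-{page:04d}")
            image = base + ".png"
            subprocess.run(rasterize_cmd(pdf, page, image), check=True)
            subprocess.run(ocr_cmd(image, base, lang), check=True)
            chunks.append(Path(base + ".txt").read_text(encoding="utf-8"))
    return clean_text("\n".join(chunks))
```

The cleanup patterns are deliberately crude. Real headers and footers differ from book to book, so the regular expressions have to be adapted for each one.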
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#23 | 
| 
			
			
			
			 Fanatic 
			
			Posts: 531 
				Karma: 2268308 
				Join Date: Nov 2015 
				
				
				
				Device: none 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Commercial software is developed for (and by) people involved in the processes the software is intended to assist with. Free software is made by people who like cr*p like vi or TeX, and who do not understand how proper software should work, or why.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#24 | |
| 
			
			
			
			 Wizard 
			
			Posts: 2,874 
				Karma: 10700629 
				Join Date: May 2016 
				Location: Canada 
				
				
				Device: Onyx Nova 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#25 | 
| 
			
			
			
			 Fanatic 
			
			Posts: 531 
				Karma: 2268308 
				Join Date: Nov 2015 
				
				
				
				Device: none 
				
				
				 | 
	
	|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#26 | |
| 
			
			
			
			 Wizard 
			
			Posts: 2,874 
				Karma: 10700629 
				Join Date: May 2016 
				Location: Canada 
				
				
				Device: Onyx Nova 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#27 | 
| 
			
			
			
			 Wizard 
			
			Posts: 2,874 
				Karma: 10700629 
				Join Date: May 2016 
				Location: Canada 
				
				
				Device: Onyx Nova 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Hey! Leave vi out of it! Blasphemer. vi is actually an example of excellent software; it just has a learning curve.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#28 | 
| 
			
			
			
			 Grand Sorcerer 
			
			Posts: 8,006 
				Karma: 71261339 
				Join Date: Feb 2009 
				
				
				
				Device: Kobo Clara 2E 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			I use quite a bit of free software, and I would say it works as proper software should.  I'm a user of the software, not a developer.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#29 | 
| 
			
			
			
			 Diligent dilettante 
			
			Posts: 3,662 
				Karma: 52758936 
				Join Date: Sep 2019 
				Location: in my mind 
				
				
				Device: Kobo Sage; Kobo Libra Colour 
				
				
				 | 
	
	|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#30 | |
| 
			
			
			
			 Grand Sorcerer 
			
			Posts: 28,891 
				Karma: 207182180 
				Join Date: Jan 2010 
				
				
				
				Device: Nexus 7, Kindle Fire HD 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 A brief listing of people who make free (and open-source) "crap" that runs on Linux:
 - Microsoft
 - Mozilla
 - Apache
 - Adobe
 - npm
 - Oracle
 - LibreOffice (The Document Foundation)
 - GIMP (as powerful and as impossible to master as Photoshop)
 - Python

 You want to say none of the products that the above produce for Linux works for you personally... fine. You'll get no argument from me. But if you want to continue to insist that free software == crap, then you're quite obviously full of it yourself. Crap software is crap software--whether it's free or paid for. The inverse is also true.

 Last edited by DiapDealer; 07-07-2021 at 11:12 AM.  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 