Hello,

I noticed some typos in the text layer that an OCR tool added to a "bitmap" PDF, i.e. a PDF whose pages are actually scanned images.

I first tried opening the EPUB generated by Abbyy Finereader. LibreOffice couldn't open it at all. Sigil could, after showing an error message, but as far as I can tell it lacks a French dictionary to do the job.

As an alternative, pdftotext or mutool (convert) can extract the text layer from such a PDF, but can they put it back once I have fixed the typos?

Thank you.

Edit: An easy solution is to convert the PDF to EPUB using Abbyy Finereader, and then run the HTML files inside it through a spellchecker.

Last edited by Shohreh; 08-30-2024 at 04:28 AM.
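For anyone wanting to script the EPUB route mentioned in the edit, here is a minimal sketch in Python using only the standard library. It relies on the fact that an EPUB is a ZIP archive: it reads the HTML/XHTML files inside it and reports every word that is not in a word list you supply. The function names (`html_to_words`, `unknown_words`) are made up for this example, the word list is assumed to be something you provide yourself (for instance a French word list, one word per line), and the tokenising is deliberately crude. It only finds suspect words; it does not fix them, and it does not write anything back into the PDF.

```python
import re
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the visible text of an HTML document, skipping script/style."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_words(markup):
    """Return the words found in the text of an HTML string, in order."""
    parser = _TextCollector()
    parser.feed(markup)
    parser.close()
    text = " ".join(parser.chunks)
    # Runs of letters only (accented letters included); digits, underscores,
    # apostrophes and hyphens act as separators, so "l'arbre" -> "l", "arbre".
    return re.findall(r"[^\W\d_]+", text)


def unknown_words(epub_path, known_words):
    """Map each word missing from known_words to the EPUB files containing it.

    epub_path may be a file name or a binary file-like object.
    known_words is any iterable of correctly spelled words (assumed input).
    """
    known = {w.lower() for w in known_words}
    suspects = {}
    with zipfile.ZipFile(epub_path) as book:
        for name in book.namelist():
            if not name.lower().endswith((".html", ".xhtml", ".htm")):
                continue
            markup = book.read(name).decode("utf-8", errors="replace")
            for word in html_to_words(markup):
                if word.lower() not in known:
                    suspects.setdefault(word, set()).add(name)
    return suspects
```

To use it, load your word list into a set, call `unknown_words` with the path to the EPUB exported by Finereader, and review the reported words by hand. OCR errors that happen to produce another valid word will not be caught this way.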