#1

null operator (he/him)
Posts: 22,018
Karma: 30277294
Join Date: Mar 2012
Location: Sydney Australia
Device: none

Spellchecker - enhancements

Would it be possible to:

Last edited by BetterRed; 02-27-2016 at 08:10 PM.

#2

Sigil Developer
Posts: 9,072
Karma: 6361556
Join Date: Nov 2009
Device: many

BetterRed,

Quote:

That said, wouldn't it be better to handle those words first by searching for all hyphenated words with a grep in Sigil's Find and Replace, seeing each word in context, and deciding whether you want to keep the hyphen? Things like that are often better viewed in the context of the surrounding text, to see what the author actually meant. Then, once that is done, you do the spellchecking.

Again, the best place for improvement requests is as an issue on the Sigil GitHub site. Many more potential developers will see it there than here, and some might be convinced to try to create a pull request that implements some of them. It also means it won't get lost or forgotten.

Take care,
KevinH
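For illustration, here is a rough Python equivalent of that kind of search: it lists each hyphenated word together with some surrounding text so it can be judged in context. The pattern `\b\w+-\w+\b` and the 20-character context window are assumptions, not anything Sigil ships. A similar pattern should work in Sigil's regex Find and Replace mode, but that has not been verified here. It only catches an ASCII hyphen-minus between word characters, not soft hyphens or other dash characters.

```python
import re
from collections.abc import Iterator

# Assumed pattern: two runs of word characters joined by an ASCII hyphen-minus.
# Soft hyphens (U+00AD) and other dash characters are not matched.
HYPHENATED = re.compile(r"\b\w+-\w+\b")

def hyphenated_in_context(text: str, width: int = 20) -> Iterator[tuple[str, str]]:
    """Yield (word, snippet) pairs, with up to `width` characters of context on each side."""
    for m in HYPHENATED.finditer(text):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        yield m.group(0), text[start:end]

sample = "The well-known ex- ample shows a dis-tant shore."
for word, snippet in hyphenated_in_context(sample):
    print(f"{word!r}: ...{snippet}...")
```

Note that "ex- ample" is skipped because a space follows the hyphen; a real pass over an EPUB would also have to cope with words split across tags or lines, which this sketch ignores.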

#3

Imperfect Perfectionist
Posts: 720
Karma: 863576
Join Date: Dec 2011
Location: Ølstykke, Denmark
Device: none

Quote:

There's an extension for LibreOffice that does this, Linguist. It's not maintained, but it still works in LO 5 (AFAIK its latest incarnation is written in Python, and someone might be able to tweak it into a Sigil plugin), and several VBA macros for Word that do this can be found around the net (the ones I've tried are very slow, though).

Regards,
Kim

#4

Sigil Developer
Posts: 9,072
Karma: 6361556
Join Date: Nov 2009
Device: many

Hi,

For what purpose are those incorrect words used?

KevinH

#5

Imperfect Perfectionist
Posts: 720
Karma: 863576
Join Date: Dec 2011
Location: Ølstykke, Denmark
Device: none

For my part, I use them to generate search-and-replace lists.

In Denmark, we used to print books and papers in blackletter ("gothic") fonts up to 1915-1920, and my little niche of the Danish book market is reissuing some of those old texts in a form more readable to a modern reader. You can teach FineReader a lot, but not everything; some of the letters are just too much like each other. However, there's usually some kind of system in the FineReader madness, and I can do mass replacements using such lists, prior to actually proofreading, with tools such as wReplace and TransTools' Multiple Replace. It can sometimes improve the readability of the text immensely …

As far as I remember, the original author of the Linguist extension for LibreOffice made it to generate lists of words not recognised by the Danish spellchecker (of which he was one of the original creators).

Regards,
Kim
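A minimal Python sketch of applying such a list in one pass, assuming a plain mapping from each OCR misreading to its correction. The two entries are invented for illustration (long s read as f, k read as f), not taken from real FineReader output, and this is not a description of how wReplace or TransTools' Multiple Replace work internally. Matching is whole-word and case-sensitive, and all keys are combined into one pattern so a replaced word is never rescanned by a later entry.

```python
import re

# Hypothetical replacement list (OCR misreading -> correction); entries are invented examples.
REPLACEMENTS = {
    "ogfaa": "ogsaa",  # long s (ſ) misread as f
    "ikfe": "ikke",    # k misread as f
}

def apply_replacements(text: str, replacements: dict[str, str]) -> str:
    """Replace whole words from `replacements` in a single pass (case-sensitive)."""
    if not replacements:
        return text
    # Longest keys first, so a longer entry wins when alternatives overlap (e.g. multi-word keys).
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    # Each match is exactly one of the literal keys, so it can be looked up directly.
    return pattern.sub(lambda m: replacements[m.group(0)], text)

print(apply_replacements("Han vilde ikfe komme ogfaa.", REPLACEMENTS))
```

Whole-word matching keeps an entry from firing inside longer words, at the cost of missing misreadings that only occur as part of a word.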

#6

Sigil Developer
Posts: 9,072
Karma: 6361556
Join Date: Nov 2009
Device: many

I looked at what calibre does. For every single word it spellchecks, it uses a regular expression to replace the regular and short hyphens with nothing. If that shortens the word, it spellchecks the shortened version first as a new word, and if that passes, it adds it first to the suggestions. It then spellchecks the original word and tests each new suggestion to avoid duplicating the no-hyphen suggestion.
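As a rough Python sketch of the ordering described above (not calibre's actual code): `is_correct` and `suggest` are hypothetical stand-ins for a real spellchecker such as Hunspell, and "short hyphen" is assumed here to mean U+2010 HYPHEN, so adjust the character class if something else is meant. The toy dictionary and suggestion table exist only to make the example runnable.

```python
import re
from collections.abc import Callable, Iterable

# Assumption: "regular and short hyphens" = U+002D HYPHEN-MINUS and U+2010 HYPHEN.
HYPHENS = re.compile("[-\u2010]")

def suggestions(word: str,
                is_correct: Callable[[str], bool],
                suggest: Callable[[str], Iterable[str]]) -> list[str]:
    ans: list[str] = []
    joined = HYPHENS.sub("", word)
    # If stripping hyphens shortened the word and the result is valid, offer it first.
    if len(joined) < len(word) and is_correct(joined):
        ans.append(joined)
    # Then append the spellchecker's suggestions for the original word, skipping duplicates.
    for s in suggest(word):
        if s not in ans:
            ans.append(s)
    return ans

# Toy stand-ins, for illustration only.
WORDS = {"distant", "shore"}
TOY_SUGGESTIONS = {"dis-tant": ["dis tant", "distant"]}

def toy_suggest(word: str) -> list[str]:
    return TOY_SUGGESTIONS.get(word, [])

print(suggestions("dis-tant", WORDS.__contains__, toy_suggest))
```

The extra cost is one additional dictionary lookup per hyphenated word, on top of the normal suggestion call.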

Sorry, Sigil is not going to go through all of that for a special case that only comes up with OCR text. Either a plugin or just a normal find and replace can be run before the spellcheck to easily distinguish real hyphenation from OCR-induced hyphenation, and, even better, this would present the word in context. Sorry.

KevinH

#7

null operator (he/him)
Posts: 22,018
Karma: 30277294
Join Date: Mar 2012
Location: Sydney Australia
Device: none

Quote:

@Kim - FWIW, the calibre editor will copy (Ctrl+C) multiple words from its spellchecker word list. Today I would paste them into the upcoming release of The Sage, especially into the Word List tool; from there I can copy words to the Concordancer, Rhymer, etc. Why? Character and place-name tracking, consistent misspellings (especially in dialogue) across multiple books. All sorts of things.

Curious - the Sigil spell checker seems to ignore 'words' that start with, or maybe contain, digits. Is that by design? If yes, good; if not, don't fix it on my account.

Maybe OCR is not the only source of misplaced hyphens. I've seen them in a number of purchased books that I'm pretty sure were not scanned; apart from the misplaced hyphens, there are none of the other OCR tell-tales. My guess is that they started life as SHYs (soft hyphens) and somewhere in the conversion hurdy-gurdy the SHYs were changed to regular hyphens.

BR

#8

Wizard
Posts: 1,166
Karma: 1410083
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650

Quote:

The discussion around the implementation can be found here: https://www.mobileread.com/forums/sho...d.php?t=237869

I use this functionality very often in different situations.

Last edited by Divingduck; 02-29-2016 at 07:21 AM.

#9

Sigil Developer
Posts: 9,072
Karma: 6361556
Join Date: Nov 2009
Device: many

Hi DivingDuck,

I am still not sure I want to do an extra regular-expression removal of possible hyphenation and, if so, spellcheck that word effectively twice (just to get that suggestion first) when the underlying code will find and suggest the non-hyphenated version just fine. This really is a feature request for Hunspell's suggestion mechanism, not for Sigil.

KevinH

#10

null operator (he/him)
Posts: 22,018
Karma: 30277294
Join Date: Mar 2012
Location: Sydney Australia
Device: none

Quote:

Oh no - that's far too easy - doh. Maybe theducks will lend me his cat to hide under.

BR

Last edited by BetterRed; 03-01-2016 at 04:08 AM.

#11

Well trained by Cats
Posts: 31,267
Karma: 61916422
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A

#12

null operator (he/him)
Posts: 22,018
Karma: 30277294
Join Date: Mar 2012
Location: Sydney Australia
Device: none

Quote:

Quote:

BR

Last edited by BetterRed; 03-01-2016 at 08:47 PM.

Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Sigil 0.5.3 - spellchecker missing? | Naloomi | Sigil | 3 | 08-08-2012 10:03 PM |
| Book jacket enhancements in 0.7.19 | GRiker | Calibre | 7 | 09-20-2010 10:19 PM |
| How to apply the enhancements/patches ? | nubbol | Calibre | 2 | 09-05-2010 12:42 AM |
| Enhancements in progress??? | crutledge | Sigil | 5 | 06-15-2010 03:14 PM |
| Am I Missing Something? (spellchecker) | Guns4Hire | Sigil | 11 | 01-10-2010 07:57 AM |