|  02-27-2016, 06:57 PM | #1 | 
| null operator (he/him)            Posts: 22,012 Karma: 30277294 Join Date: Mar 2012 Location: Sydney Australia Device: none | 
				
				Spellchecker - enhancements
			 
			
			Would it be possible to: 
 Last edited by BetterRed; 02-27-2016 at 07:10 PM. | 
|   |   | 
|  02-27-2016, 09:12 PM | #2 | |||||
| Sigil Developer            Posts: 9,072 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			BetterRed, Quote: 
 That said, wouldn't it be better to handle those words first: search for all hyphenated words with a regular expression in Sigil's Find and Replace, see each word in context, and decide whether you want to keep the hyphen? Things like that are often better viewed in the context of the surrounding text to see what the author actually meant. Once that is done, you do the spellchecking. Again, the best place for improvement requests is an issue on the Sigil GitHub site. Many more potential developers will see it there than here, and some might be convinced to create a pull request that implements some of them. It also means it won't get lost or forgotten. Take care, KevinH | |||||
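The "search for hyphenated words and look at them in context" step can be sketched outside Sigil as well. This is only a rough illustration in Python, not anything Sigil ships: the function name `hyphenated_in_context` and the `width` parameter are made up here, and the pattern is one guess at what "all hyphenated words" should mean. A similar expression should work in the Find and Replace regex mode, but check it against your own text first.

```python
import re

# One or more word characters, then at least one "-word" group.
# Assumption: "hyphenated word" means letters/digits joined by plain hyphens.
HYPHENATED = re.compile(r"\b\w+(?:-\w+)+\b")


def hyphenated_in_context(text, width=30):
    """Return (word, surrounding_text) pairs for every hyphenated word."""
    results = []
    for match in HYPHENATED.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        results.append((match.group(0), text[start:end]))
    return results


if __name__ == "__main__":
    sample = "The well-known author wrote a docu-ment about it."
    for word, context in hyphenated_in_context(sample):
        print(f"{word!r} in: ...{context}...")
```

Seeing each hit with its surrounding text is the point: "well-known" should keep its hyphen, "docu-ment" should not, and only the context tells you which is which.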
|   |   | 
|  02-28-2016, 04:54 AM | #3 | |
| Imperfect Perfectionist            Posts: 715 Karma: 863576 Join Date: Dec 2011 Location: Ølstykke, Denmark Device: none | Quote: 
  ). There's a LibreOffice extension that does this, Linguist. It's not maintained, but it still works in LO 5 (AFAIK its latest incarnation is written in Python, and someone might be able to tweak it into a Sigil plugin). Several VBA macros for Word that do this can also be found around the net (the ones I've tried are very slow, though). Regards, Kim | |
|   |   | 
|  02-28-2016, 09:40 AM | #4 | |
| Sigil Developer            Posts: 9,072 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			Hi, For what purpose are those incorrect words used? KevinH Quote: 
 | |
|   |   | 
|  02-28-2016, 10:09 AM | #5 | 
| Imperfect Perfectionist            Posts: 715 Karma: 863576 Join Date: Dec 2011 Location: Ølstykke, Denmark Device: none | 
			
			For my part, I use them to generate search-and-replace lists. In Denmark, we used to print books and papers in blackletter ("gothic") fonts up to 1915-1920. My little niche of the Danish book market is reissuing some of those old texts in a form more readable to a modern reader. You can teach FineReader a lot, but not everything - some of the letters are just too much like each other. However, there's usually some kind of system in the FineReader madness, and I can do mass replacements using such lists, prior to actually proofreading, with tools such as wReplace and TransTools' Multiple Replace. It can sometimes improve the readability of the text immensely … As far as I remember, the original author of the Linguist extension for LibreOffice made it to generate lists of words not recognised by the Danish spellchecker (of which he was one of the original creators). Regards, Kim | 
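For anyone wondering what such a list-driven mass replacement looks like, here is a minimal Python sketch. It is not how wReplace or TransTools work internally; `apply_replacements` is a hypothetical helper, and the "fom" to "som" pair is only an illustrative long-s/f style confusion, not taken from a real list.

```python
import re


def apply_replacements(text, pairs):
    """Replace whole words in text using (wrong, right) pairs in one pass."""
    mapping = dict(pairs)
    if not mapping:
        return text
    # Longest entries first so a longer misreading wins over a shorter prefix.
    alternatives = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(word) for word in alternatives) + r")\b"
    )
    return pattern.sub(lambda match: mapping[match.group(0)], text)


if __name__ == "__main__":
    # Hypothetical replacement list built from the "unknown words" output.
    replacements = [("fom", "som")]
    print(apply_replacements("Manden fom kom", replacements))
```

Matching on whole words only keeps the replacement from touching correct words that happen to contain the misread string.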
|   |   | 
|  02-28-2016, 11:47 AM | #6 | 
| Sigil Developer            Posts: 9,072 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			I looked at what calibre does. For every single word it spellchecks, it uses a regular expression to replace the regular and short hyphens with nothing. If that shortens the word, it spellchecks the shortened version first as a new word and, if it passes, adds it first to the suggestions. It then spellchecks the original word and tests each new suggestion to prevent duplication with the no-hyphen suggestion. Sorry, Sigil is not going to go through all of that for a special case that only comes up with OCR text. Either a plugin or a normal find and replace can be run before the spellcheck to easily tell real hyphenation from OCR-induced hyphenation and, even better, this would present the word in context. Sorry. KevinH | 
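The sequence described in this post can be sketched roughly as follows. This is not calibre's actual code: `is_correct` and `suggest` are hypothetical stand-ins for whatever the underlying spellchecker provides, and reading "short hyphens" as the soft hyphen (U+00AD) is an assumption.

```python
import re

# Assumption: "regular and short hyphens" = ASCII hyphen and soft hyphen.
HYPHENS = re.compile("[-\u00ad]")


def suggestions_with_dehyphenation(word, is_correct, suggest):
    """Build a suggestion list that offers the de-hyphenated word first.

    is_correct(word) -> bool and suggest(word) -> list of str are
    placeholders for the real spellchecker calls.
    """
    result = []
    stripped = HYPHENS.sub("", word)
    # Only bother if removing hyphens actually shortened the word.
    if len(stripped) < len(word) and is_correct(stripped):
        result.append(stripped)
    # Then the normal suggestions, skipping anything already listed.
    for candidate in suggest(word):
        if candidate not in result:
            result.append(candidate)
    return result


if __name__ == "__main__":
    dictionary = {"document"}
    print(suggestions_with_dehyphenation(
        "docu-ment",
        is_correct=lambda w: w in dictionary,
        suggest=lambda w: ["document", "docu-mint"],
    ))
```

The sketch makes the cost visible: every hyphenated word gets an extra regex pass and an extra dictionary lookup, plus a duplicate check on each suggestion.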
|   |   | 
|  02-28-2016, 08:59 PM | #7 | |
| null operator (he/him)            Posts: 22,012 Karma: 30277294 Join Date: Mar 2012 Location: Sydney Australia Device: none | Quote: 
  @Kim - FWIW, the calibre editor lets you copy multiple words from its spellchecker word list with Ctrl+C. Today I would paste them into the upcoming release of The Sage, especially into the Word List tool; from there I can copy words to the Concordancer, Rhymer, etc. Why? Character and place name tracking, consistent misspellings (especially in dialogue) across multiple books. All sorts of things. Curious - the Sigil spell checker seems to ignore 'words' that start with, or maybe contain, digits. Is that by design? If yes, good; if not, don't fix it on my account. Maybe OCR is not the only source of misplaced hyphens. I've seen them in a number of purchased books that I'm pretty sure were not scanned. Apart from the misplaced hyphens, there are none of the other OCR tell-tales. My guess is that they started life as SHYs and somewhere in the conversion hurdy-gurdy the SHYs were changed to regular hyphens. BR | |
|   |   | 
|  02-29-2016, 06:11 AM | #8 | |
| Wizard            Posts: 1,166 Karma: 1410083 Join Date: Nov 2010 Location: Germany Device: Sony PRS-650 | Quote: 
  ) You can find the discussion around the implementation here: https://www.mobileread.com/forums/sho...d.php?t=237869 I use this functionality very often in different situations. Last edited by Divingduck; 02-29-2016 at 06:21 AM. | |
|   |   | 
|  02-29-2016, 09:51 AM | #9 | 
| Sigil Developer            Posts: 9,072 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			Hi DivingDuck, I am still not sure I want to do an extra regular expression removal of possible hyphenation and if so spellcheck that word effectively twice (just to get that suggestion first) when the underlying code will find and suggest the non-hyphenated version just fine. This really is a feature request for hunspell's suggestion mechanism and not Sigil. KevinH | 
|   |   | 
|  03-01-2016, 02:59 AM | #10 | |
| null operator (he/him)            Posts: 22,012 Karma: 30277294 Join Date: Mar 2012 Location: Sydney Australia Device: none | Quote: 
  Oh no - that's far too easy - doh  Maybe theducks will lend me his cat to hide under. BR Last edited by BetterRed; 03-01-2016 at 03:08 AM. | |
|   |   | 
|  03-01-2016, 12:38 PM | #11 | 
| Well trained by Cats            Posts: 31,251 Karma: 61360164 Join Date: Aug 2009 Location: The Central Coast of California Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A | |
|   |   | 
|  03-01-2016, 05:53 PM | #12 | ||
| null operator (he/him)            Posts: 22,012 Karma: 30277294 Join Date: Mar 2012 Location: Sydney Australia Device: none | Quote: 
    Quote: 
 BR Last edited by BetterRed; 03-01-2016 at 07:47 PM. | ||
|   |   | 