#1
null operator (he/him)
Posts: 21,634
Karma: 29710510
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Spellchecker - enhancements
Would it be possible to:
Last edited by BetterRed; 02-27-2016 at 07:10 PM.
#2
Sigil Developer
Posts: 8,490
Karma: 5703586
Join Date: Nov 2009
Device: many
BetterRed,
Quote:
Quote:
Quote:
Quote:
Quote:
That said, wouldn't it be better to handle those words first by searching for all hyphenated words with a grep (regex) search in Sigil's Find and Replace, seeing each word in context and deciding whether to keep the hyphen? Things like that are often better judged in the context of the surrounding text, to see what the author actually meant. Once that is done, you do the spellchecking.
Again, the best place for improvement requests is an issue on the Sigil GitHub site. Many more potential developers will see it there than here, and some might be persuaded to create a pull request that implements some of them. It also means the request won't get lost or forgotten.
Take care,
KevinH
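A minimal sketch of that review pass, written in Python outside Sigil purely for illustration: it lists every hyphenated word together with some surrounding text, so each hyphen can be judged in context. The pattern and the `hyphenated_in_context` helper are assumptions, not Sigil code; in Sigil itself a roughly equivalent pattern could presumably be entered in Find and Replace with Regex mode selected.

```python
import re

# Assumption: a "word" is a run of letters ([^\W\d_] = word characters minus
# digits and underscore, so accented letters such as æ, ø, å still count),
# and a hyphenated word is two or more such runs joined by ASCII hyphen-minus.
HYPHENATED = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)+")


def hyphenated_in_context(text, width=20):
    """Yield (word, context) pairs so each hyphen can be judged in context."""
    for m in HYPHENATED.finditer(text):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        yield m.group(0), text[start:end]


sample = "The well-known author wrote it. She was pre- pared, a sen-tence later."
for word, context in hyphenated_in_context(sample):
    print(f"{word!r}: ...{context}...")
```

Here "well-known" is a legitimate compound and "sen-tence" is a stray hyphen; only reading them in context tells the two apart, which is the point made above. "pre- pared" is not matched because the hyphen is followed by a space.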
#3
Imperfect Perfectionist
Posts: 630
Karma: 863576
Join Date: Dec 2011
Location: Ølstykke, Denmark
Device: none
Quote:
Regards, Kim
#4
Sigil Developer
Posts: 8,490
Karma: 5703586
Join Date: Nov 2009
Device: many
Hi,
For what purpose are those incorrect words used?
KevinH
Quote:
#5
Imperfect Perfectionist
Posts: 630
Karma: 863576
Join Date: Dec 2011
Location: Ølstykke, Denmark
Device: none
For my part, I use them to generate search-and-replace lists.
In Denmark, books and papers were printed in blackletter ("gothic") fonts up to 1915-1920, and my little niche of the Danish book market is reissuing some of those old texts in a form more readable to a modern reader. You can teach FineReader a lot, but not everything: some of the letters are just too similar to each other. However, there's usually some kind of system in the FineReader madness, and I can use such lists to do mass replacements, before the actual proofreading, with tools such as wReplace and TransTools' Multiple Replace. This can sometimes improve the readability of the text immensely.
As far as I remember, the original author of the Linguist extension for LibreOffice built it to generate lists of words not recognised by the Danish spellchecker (of which he was one of the original creators).
Regards, Kim
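To make the mass-replacement step concrete, here is a minimal Python sketch of applying such a list. The `REPLACEMENTS` pairs and the `apply_replacements` helper are hypothetical and illustrative only, not taken from wReplace, TransTools or any real list; the sketch assumes whole-word, case-sensitive replacements applied in list order.

```python
import re

# Hypothetical (wrong, right) pairs of the kind an OCR engine might produce
# when it confuses similar blackletter glyphs (e.g. c/e, u/n).
# Illustrative only: not from any real replacement list.
REPLACEMENTS = [
    ("dct", "det"),
    ("hau", "han"),
]


def apply_replacements(text, pairs):
    """Apply whole-word replacements in order; return (new_text, hit_count)."""
    total = 0
    for wrong, right in pairs:
        # \b anchors stop "hau" from matching inside a longer word like "haugen".
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b")
        text, n = pattern.subn(right, text)
        total += n
    return text, total


fixed, hits = apply_replacements("dct var hau og haugen", REPLACEMENTS)
print(fixed, hits)
```

Order matters with lists like this: a later pair can act on the output of an earlier one, so it is worth keeping each pair's result out of the patterns of the pairs that follow.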
#6
Sigil Developer
Posts: 8,490
Karma: 5703586
Join Date: Nov 2009
Device: many
I looked at what calibre does. For every single word it spellchecks, it uses a regular-expression replacement to remove regular and short hyphens. If that shortens the word, it first spellchecks the shortened version as a new word and, if that passes, adds it as the first suggestion. It then spellchecks the original word and tests each new suggestion against the no-hyphen suggestion to prevent duplicates.
Sorry, Sigil is not going to go through all of that for a special case that only comes up with OCR text. Either a plugin or just a normal Find and Replace run before the spellcheck can easily distinguish real hyphenation from OCR-induced hyphenation, and, even better, it presents the word in context.
Sorry.
KevinH
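The calibre approach described above can be sketched roughly as follows. This is not calibre's code: `check` and `suggest` are hypothetical stand-ins for a real spellchecker (for example a Hunspell binding), and treating "regular and short hyphens" as hyphen-minus, U+2010 HYPHEN and the soft hyphen U+00AD is an assumption. The sketch only orders suggestions for a word that has already failed the check.

```python
import re
from typing import Callable, List

# Assumption: "regular and short hyphens" = hyphen-minus, U+2010, U+00AD.
HYPHENS = re.compile("[-\u2010\u00ad]")


def suggestions_dehyphenated_first(
    word: str,
    check: Callable[[str], bool],
    suggest: Callable[[str], List[str]],
) -> List[str]:
    """Return suggestions for a misspelled word, putting the hyphen-free
    form first when that form is itself a valid word."""
    stripped = HYPHENS.sub("", word)
    ordered: List[str] = []
    # Only bother when removing hyphens actually shortened the word.
    if len(stripped) < len(word) and check(stripped):
        ordered.append(stripped)
    # Append the spellchecker's own suggestions, skipping duplicates
    # of the hyphen-free form.
    for s in suggest(word):
        if s not in ordered:
            ordered.append(s)
    return ordered


# Toy stand-ins for a real spellchecker (hypothetical).
DICTIONARY = {"sentence", "sentences"}


def toy_check(w: str) -> bool:
    return w in DICTIONARY


def toy_suggest(w: str) -> List[str]:
    return ["sentences", "sentence"]


print(suggestions_dehyphenated_first("sen-tence", toy_check, toy_suggest))
```

With the toy dictionary, "sen-tence" yields `['sentence', 'sentences']`: the hyphen-free form is promoted to the front and its duplicate from the normal suggestion list is dropped. This also shows the cost KevinH objects to, since each hyphenated word is checked twice.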
#7
null operator (he/him)
Posts: 21,634
Karma: 29710510
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
@Kim - FWIW, the calibre editor will copy multiple words from its spellchecker word list with Ctrl+C. Today I would paste them into the upcoming release of The Sage, especially into its Word List tool; from there I can copy words to the Concordancer, Rhymer, etc. Why? Character and place-name tracking, consistent misspellings (especially in dialogue) across multiple books. All sorts of things.
Curious: the Sigil spellchecker seems to ignore 'words' that start with (or maybe contain) digits. Is that by design? If yes, good; if not, don't fix it on my account.
Maybe OCR is not the only source of misplaced hyphens. I've seen them in a number of purchased books that I'm pretty sure were not scanned: apart from the misplaced hyphens, there are none of the other OCR tell-tales. My guess is that they started life as SHYs (soft hyphens) and somewhere in the conversion hurdy-gurdy the SHYs were changed to regular hyphens.
BR
#8
Wizard
Posts: 1,166
Karma: 1410083
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
Quote:
You can find the discussion around its implementation here: https://www.mobileread.com/forums/sho...d.php?t=237869
I use this functionality very often, in many different situations.
Last edited by Divingduck; 02-29-2016 at 06:21 AM.
#9
Sigil Developer
Posts: 8,490
Karma: 5703586
Join Date: Nov 2009
Device: many
Hi DivingDuck,
I am still not sure I want to add an extra regular-expression pass that removes possible hyphenation and then spellcheck that word effectively twice (just to get that suggestion first), when the underlying code will find and suggest the non-hyphenated version just fine. This really is a feature request for Hunspell's suggestion mechanism, not for Sigil.
KevinH
#10
null operator (he/him)
Posts: 21,634
Karma: 29710510
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Maybe theducks will lend me his cat to hide under.
BR
Last edited by BetterRed; 03-01-2016 at 03:08 AM.
#11
Well trained by Cats
Posts: 30,913
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
#12
null operator (he/him)
Posts: 21,634
Karma: 29710510
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Quote:
BR
Last edited by BetterRed; 03-01-2016 at 07:47 PM.
Thread | Thread Starter | Forum | Replies | Last Post |
Sigil 0.5.3 - spellchecker missing? | Naloomi | Sigil | 3 | 08-08-2012 09:03 PM |
Book jacket enhancements in 0.7.19 | GRiker | Calibre | 7 | 09-20-2010 09:19 PM |
How to apply the enhancements/patches ? | nubbol | Calibre | 2 | 09-04-2010 11:42 PM |
Enhancements in progress??? | crutledge | Sigil | 5 | 06-15-2010 02:14 PM |
Am I Missing Something? (spellchecker) | Guns4Hire | Sigil | 11 | 01-10-2010 06:57 AM |