Suggestion: Spell Check Tool Enhancement
This topic spurred a thought in my brain for an enhancement to the Spell Check Tool:
https://www.mobileread.com/forums/sho...d.php?t=241283
I made a post (#6): https://www.mobileread.com/forums/sho...51&postcount=6
Long story short, it would be a nice enhancement to have a checkbox that allows a case-sensitive SEARCH. For example, searching for a capital 'O' would show "Octopus" in the list, but not "octopus". Searching for a capital 'I' would show "McIver", but not "Mciver". With the checkbox off (the default), the search would behave as it does now (not case-sensitive). As I mentioned in that topic, I believe a case-sensitive search would be extremely helpful for catching many of these hard-to-find OCR errors (a capital 'I' instead of a lowercase 'l', etc.).
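To illustrate the requested behaviour, here is a minimal Python sketch of a word-list filter with a case-sensitivity switch. It is not Calibre's code; `filter_words` and the word-to-count mapping it takes are hypothetical stand-ins for the Spell Check list.

```python
def filter_words(word_counts, query, case_sensitive=False):
    """Return (word, count) pairs whose word contains `query`.

    word_counts: hypothetical mapping of word -> number of occurrences.
    case_sensitive=False mirrors the current (default) behaviour.
    """
    if case_sensitive:
        return [(w, n) for w, n in word_counts.items() if query in w]
    q = query.casefold()
    return [(w, n) for w, n in word_counts.items() if q in w.casefold()]


words = {"Octopus": 2, "octopus": 5, "McIver": 3, "Mciver": 1}
print(filter_words(words, "O", case_sensitive=True))  # [('Octopus', 2)]
print(filter_words(words, "I", case_sensitive=True))  # [('McIver', 3)]
```

`casefold()` is used for the default mode because it handles more Unicode case pairs than `lower()`.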
Fantastic work, Kovid! Thank you for implementing this tweak so quickly.
Just used it to catch a bunch of capital 'I' -> 'l' errors, which are common in OCR: couId concIusions falI faiI goodwiII piIgramage uItimate welI [...] An EXTREMELY helpful addition.
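The 'I' -> 'l' pattern can also be hunted with a simple heuristic. The sketch below assumes that a capital "I" directly after a lowercase letter is suspicious; this rule is an illustration, not something Calibre does. `suspicious_capital_i` is a hypothetical helper.

```python
import re

# Assumed heuristic: a capital "I" right after a lowercase letter is a
# likely OCR misread of "l". Matches still need a human to review them.
SUSPICIOUS_I = re.compile(r"[a-z]I")


def suspicious_capital_i(words):
    """Return the words that contain a lowercase letter followed by 'I'."""
    return [w for w in words if SUSPICIOUS_I.search(w)]


sample = ["couId", "concIusions", "falI", "welI", "McIver", "Illinois", "I"]
print(suspicious_capital_i(sample))
# ['couId', 'concIusions', 'falI', 'welI', 'McIver']
```

Note that the legitimate name "McIver" is flagged too, which is why a reviewable list (plus case-sensitive search) beats automatic replacement here.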
Hmmm... so today I was fiddling around some more with the Calibre Spell Check tool, and I stumbled across this problem.
The hyphen '-' should be considered a legitimate character within a word. Here is how it currently works:
The word "non-fiction" is seen as two words, "non" and "fiction".
The word "micro-economics" is seen as two words, "micro" and "economics".
The word "anti-establishment" is seen as two words, "anti" and "establishment".
A few reasons why this fix would be extremely useful:
1. I use this ALL THE TIME in Sigil to catch mixed usage of hyphenated and non-hyphenated versions of words. This is QUITE a common OCR error: you might have a mix of "nonfiction" + "non-fiction", "co-operating" + "cooperating", "counter-clockwise" + "counterclockwise", or "short-term" + "shortterm" in the same book. These then have to be made consistent throughout the book.
2. It is also quite helpful for catching accidental hyphens in authors' first/last names. For example, "Black-well" -> "Blackwell", "How-den" -> "Howden", "Lach-mann" -> "Lachmann", "Lee-son" -> "Leeson".
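The consistency check described in point 1 can be sketched as follows: tokenize so that a hyphen between letters stays inside the word, then group spellings that differ only in hyphenation (and case). This is a minimal illustration assuming ASCII letters only; `hyphen_variants` is a hypothetical helper, not a Calibre or Sigil function.

```python
import re
from collections import Counter, defaultdict

# A hyphen between two letters stays part of the word (ASCII-only for brevity).
WORD = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def hyphen_variants(text):
    """Group spellings that differ only by hyphens, e.g. nonfiction/non-fiction."""
    counts = Counter(WORD.findall(text))
    groups = defaultdict(dict)
    for word, n in counts.items():
        # "Non-fiction" and "nonfiction" share the key "nonfiction".
        groups[word.replace("-", "").lower()][word] = n
    # Report only groups that mix hyphenated and unhyphenated spellings.
    return {
        key: spellings
        for key, spellings in groups.items()
        if any("-" in w for w in spellings) and any("-" not in w for w in spellings)
    }


text = "A nonfiction book. Non-fiction again, and short-term versus shortterm."
print(hyphen_variants(text))
# {'nonfiction': {'nonfiction': 1, 'Non-fiction': 1}, 'shortterm': {'short-term': 1, 'shortterm': 1}}
```

The same grouping also catches name splits such as "Black-well" versus "Blackwell".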
This is a limitation of ICU: it breaks words on hyphens, even though its documentation claims it shouldn't. It is on my TODO list to see if I can implement an efficient workaround.
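One possible shape for such a workaround is post-processing the word breaker's output: merge adjacent word spans separated by exactly one hyphen. The sketch below assumes the breaker yields ordered `(start, end)` character offsets; that input format and the `merge_hyphenated` helper are hypothetical, and this is not the fix actually implemented in Calibre.

```python
def merge_hyphenated(text, spans):
    """Merge adjacent word spans that are separated by exactly one hyphen.

    spans: ordered (start, end) offsets from a word breaker that splits on
    hyphens (hypothetical input format).
    """
    merged = []
    for start, end in spans:
        if merged:
            prev_start, prev_end = merged[-1]
            # Next word begins one character after the previous one ends,
            # and that single character is a hyphen: glue them together.
            if start == prev_end + 1 and text[prev_end] == "-":
                merged[-1] = (prev_start, end)
                continue
        merged.append((start, end))
    return merged


text = "non-fiction and anti-establishment"
spans = [(0, 3), (4, 11), (12, 15), (16, 20), (21, 34)]
print([text[s:e] for s, e in merge_hyphenated(text, spans)])
# ['non-fiction', 'and', 'anti-establishment']
```

This is a single linear pass over the spans, so it adds little cost on top of the word breaking itself.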
Quote:
BR
Quote:
I am using Calibre's list to point out/narrow down the errors, and then doing all my fixes in Sigil.
Suggestion: Another odd thing I noticed in Calibre's Spell Check list is numbers. I believe that "words" made up entirely of numbers, periods, and commas should not be included in the list at all. As I understand it, Sigil removes any "word" that contains ANY numbers. But after seeing Calibre's list, I still think it is useful to keep "words" that contain SOME numbers. For example, these can then be caught/stand out like sore thumbs:
Seeing these in list form, along with the number of times they occur in the book, is extremely helpful for spotting inconsistencies. Perhaps you can safely remove "words" that are ENTIRELY numbers, but keep the ones that contain SOME numbers? Perhaps it could be another toggle: include numbers / don't include numbers? (Or perhaps this would make the UI too cluttered?)
Side note: I am currently working on digitizing 12 years of a journal (~2 million words). The perfect size to put Calibre's Editor through some serious testing! Now, all we need is for the fantastic Reports functionality to come over to Calibre's Editor.
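The proposed rule (drop "words" made only of digits, periods, and commas, but keep mixed ones) plus the suggested toggle could look like this minimal sketch. `spellcheck_candidates` and its `include_pure_numbers` flag are hypothetical names, not existing Calibre options.

```python
import re

# Digits, periods and commas only, e.g. "1,024", "3.14", "1984".
ONLY_NUMBERS = re.compile(r"[\d.,]+")


def spellcheck_candidates(words, include_pure_numbers=False):
    """Filter the word list; mixed tokens such as 'l984' or '2nd' are always kept."""
    if include_pure_numbers:
        return list(words)
    return [w for w in words if not ONLY_NUMBERS.fullmatch(w)]


print(spellcheck_candidates(["1,024", "3.14", "1984", "l984", "2nd", "page"]))
# ['l984', '2nd', 'page']
```

Keeping partially numeric tokens is what makes OCR slips like "l984" (a lowercase L for the digit 1) stand out.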
Quote:
And I would like to have a way to ignore words with numbers. An alternative to an option is to have the "Ignore" button work on multiple words: select all the words you want to ignore and press the button once. At the moment it seems to work only on the last word selected.
Talking of numbers: if a book has an index, all the page number links are flagged as errors (see attachment). If I ignore all those 'numbers', I get to watch an hourglass for anything up to 10 minutes. Also, if a book has a long list of references, many of the author names will be flagged as errors.
IMO an index or reference list should be in separate files, and they normally are. So, if one could exclude files from spell checking, one could deal with the index and reference files separately. It would be nice to have the ability to exclude paragraphs too, to avoid checking quotes in the original vernacular, e.g. Chaucer, Shakespeare, etc. :)
BR
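A per-session file exclusion could be as simple as matching book-internal file names against glob patterns held in memory. This is a minimal sketch; `files_to_check` and the example file names are hypothetical, and Calibre's editor does not necessarily structure things this way.

```python
from fnmatch import fnmatchcase


def files_to_check(names, excluded_patterns):
    """Return the file names not matched by any exclusion pattern.

    The patterns live only in memory, so nothing persists between sessions.
    """
    return [n for n in names
            if not any(fnmatchcase(n, p) for p in excluded_patterns)]


book = ["text/chapter01.xhtml", "text/chapter02.xhtml",
        "text/index.xhtml", "text/references.xhtml"]
print(files_to_check(book, ["*index*", "*references*"]))
# ['text/chapter01.xhtml', 'text/chapter02.xhtml']
```

`fnmatchcase` is used instead of `fnmatch` so the matching does not depend on the operating system's case rules.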
Quote:
(Typically misspelled names, missing accents in names/words, etc.)
Quote:
@davidfor: You can ignore multiple words by selecting them and right-clicking. The buttons only operate on a single word at a time.
Quote:
And then deal with the others in one or more separate passes, and maybe exclude the body of the book. My thinking is that the file exclusions would not persist between sessions.
Quote:
Currently I ignore 'misspellings' in Shakespeare et al. quotes, but 'twould be most felicitous to do otherwise ;) In Word you can exclude blocks from its spell checker; I think the exclusions persist until you reset the spelling check on the document. But I repeat, for me it's a nice-to-have.
BR
While we are discussing minor tweaks to Spellcheck:
Suggestion: Would it be possible to ignore spell checking of text between links?
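Reading "text between links" as the text inside `<a>` elements (such as the index page-number links mentioned earlier), a minimal sketch using the standard library's `html.parser` could collect only the text outside links. `TextOutsideLinks` and `text_to_check` are hypothetical names; this is not how Calibre extracts text for spell checking.

```python
from html.parser import HTMLParser


class TextOutsideLinks(HTMLParser):
    """Collect text for spell checking, skipping anything inside <a>...</a>."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0  # > 0 while inside an <a> element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        if self.link_depth == 0:
            self.chunks.append(data)


def text_to_check(html):
    parser = TextOutsideLinks()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


print(repr(text_to_check('<p>See Smith, <a href="#p12">12</a>, <a href="#p47">47</a>.</p>')))
# 'See Smith, , .'
```

The same depth-counting idea would extend to skipping paragraphs carrying some marker, if the editor offered one.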
MobileRead.com is a privately owned, operated and funded community.