Quote:
Originally Posted by AnotherCat
I think perhaps where I am differing is that I see the finding of the likes of hyphenated words as being trivial (after all, even a simple search for "-" finds them most reliably no matter what context they are used in) and so really don't care how a spellchecker does it.
|
I don't have any stats on hand from before I requested the Spell Check Tool be added into Sigil, but I spent HOURS doing it the manual method, and it was BRUTAL.
It just took gods damn forever, and it is nowhere near as accurate. (After seeing the same words again and again, you fall into this monotonous brain-dead/vegetative state after doing it for so many hours straight.)
Doing it the manual way, there is no way to "ignore" a word that you already know is correctly hyphenated.
If "long-term" shows up 183 times, you had to Next 183 times.
If "short-term" shows up 111 times, you had to Next 111 times.
If "Irvington-on-Hudson" shows up 34 times, you had to Next 68 times (two hyphens per occurrence).
Now, since all unique words are shoved into the list ONCE, this really cuts down the amount of time your eyeballs have to work + how many times you have to ignore/fix mistakes.
With just those three words, you have whittled 362 clicks down to a quick look at a list.
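To make that arithmetic concrete, here is a rough Python sketch. This is not how Sigil's Spell Check List works internally. The regex and the `hyphen_workload` helper are my own assumptions, just to compare "one stop per hyphen" against "one list entry per unique word":

```python
import re
from collections import Counter

# Assumed pattern: letters joined by one or more hyphens (ASCII only, for simplicity).
HYPHENATED = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)+")

def hyphen_workload(text):
    """Return (stops in a plain search for "-", unique hyphenated words)."""
    words = Counter(HYPHENATED.findall(text))
    # A search for "-" stops once per hyphen, so a word with two
    # hyphens costs two clicks for every occurrence.
    manual_clicks = sum(word.count("-") * n for word, n in words.items())
    return manual_clicks, len(words)

# Stand-in text built from the three examples above.
sample = " ".join(
    ["long-term"] * 183 + ["short-term"] * 111 + ["Irvington-on-Hudson"] * 34
)
print(hyphen_workload(sample))  # (362, 3)
```

So 362 stops in a search collapse to 3 rows in a list.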
With the Spell Check List, I typically just do two passes (one with the "Show Only Misspelled Words" checkbox on, one with it off). You might also want to fiddle around with the Frequency sorting, because just visualizing the data in a different way helps you sort through and catch things much faster (including errors you may have previously glossed over).
For example, many of these hyphen -> em dash errors only occur once, so you can focus more on the words in the list that occur fewer than 3 times. Something that occurs more than ~8 times in a book is probably not a typo, and can just be quickly looked over.
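If you wanted to do the same kind of frequency triage outside the editor, a minimal sketch could look like this. The `rare_first` helper, the cutoff, and the sample words "said-he" and "the-dog" are all made up for illustration:

```python
from collections import Counter

def rare_first(counts, max_count=2):
    """Return (word, count) pairs seen at most max_count times, rarest first."""
    rare = [(word, n) for word, n in counts.items() if n <= max_count]
    return sorted(rare, key=lambda pair: (pair[1], pair[0]))

# Hypothetical counts: two common hyphenated words and two likely dash errors.
counts = Counter({"long-term": 183, "short-term": 111, "said-he": 1, "the-dog": 2})
print(rare_first(counts))  # [('said-he', 1), ('the-dog', 2)]
```

The frequent words drop out of view, and the one-offs (where the hyphen -> em dash mistakes tend to hide) float to the top.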
Side Note: In this latest journal I am working on digitizing (~4400 pages, ~2 million words), there were ~18800 hyphens before -> 18428 after fixing (this means ~2% of the hyphens were a mistake).
That would have taken fracking FOREVER to do one-by-one. (It already took me 12 hours to do it the Spell Check List way, including all the time double-checking/fixing the source material, plus doing some code cleanup + other spelling corrections.)
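For anyone who wants to check the ~2% figure, the arithmetic is just this (the "before" count is approximate, so treat the result as a ballpark; `hyphen_error_rate` is a name I made up):

```python
def hyphen_error_rate(before, after):
    """Return (hyphens removed, percent of the original hyphens they represent)."""
    fixed = before - after
    return fixed, 100 * fixed / before

fixed, pct = hyphen_error_rate(18800, 18428)
print(fixed, round(pct, 2))  # 372 1.98
```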
It would be QUITE interesting to gather these stats in the future. How many words have hyphens compared to total words, how many hyphens were wrong (and what %)... I will be sure to keep it in mind in the future. Maybe add it to all my fancy stats I am gathering (Preview here):
https://www.mobileread.com/forums/sho...8&postcount=52
Another thing to toss on my "growing pile of things I should do but don't have the time because I have to digitize more stuff" list.
Quote:
Originally Posted by AnotherCat
Note, I am not averse to, nor am I intending to denigrate, any effort towards improving Editor's or any other spellchecker, nor criticising the methods of other users, but am rather just stating how I go about some of the matters that have arisen or matters I encounter which complicates and reduces their competence when using them.
|
Definitely recommend using the Spell Check Tool for hyphens.
Also for accented words + misspelled names + (Now with Calibre's Case Sensitive Search) OCR errors of letters<->numbers:
https://www.mobileread.com/forums/sho...51&postcount=6
https://www.mobileread.com/forums/sho...08&postcount=7
It is the greatest thing since sliced bread!