Hmmm... so today I was fiddling around some more with the Calibre Spell Check tool, and I stumbled across this problem.
The hyphen '-' should be treated as a legitimate character within a word. Here is how it currently behaves:
The word "non-fiction" is seen as two words, "non" and "fiction".
The word "micro-economics" is seen as two words, "micro" and "economics".
The word "anti-establishment" is seen as two words, "anti" and "establishment".
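To show the difference concretely, here is a minimal Python sketch of two word tokenizers: one that ends a word at a hyphen (the current behavior described above) and one that keeps letters joined by internal hyphens together. The pattern names and the `tokenize` helper are hypothetical illustrations, not Calibre's actual spell-check code.

```python
import re

# Hypothetical patterns for illustration only; not taken from Calibre.
# [^\W\d_] matches any Unicode letter (word chars minus digits and underscore).
# Current behavior as described: a hyphen breaks the word in two.
SPLIT_ON_HYPHEN = re.compile(r"[^\W\d_]+")
# Requested behavior: letter runs joined by single internal hyphens form one word.
KEEP_HYPHEN = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def tokenize(text, pattern):
    """Return every word in text matched by the given pattern."""
    return pattern.findall(text)


sample = "non-fiction micro-economics anti-establishment"
print(tokenize(sample, SPLIT_ON_HYPHEN))
# ['non', 'fiction', 'micro', 'economics', 'anti', 'establishment']
print(tokenize(sample, KEEP_HYPHEN))
# ['non-fiction', 'micro-economics', 'anti-establishment']
```

Only internal hyphens are kept; a stray leading or trailing hyphen is still treated as a separator.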
A few reasons why this fix would be extremely useful:
1. I use this ALL THE TIME in Sigil to catch hyphenated and non-hyphenated versions of the same word. It is QUITE a common OCR error: you might have a mix of "nonfiction" + "non-fiction", "co-operating" + "cooperating", "counter-clockwise" + "counterclockwise", or "short-term" + "shortterm" in the same book. These then have to be made consistent throughout the book.
2. It also makes it easy to catch accidental hyphens in authors' first/last names. For example, "Black-well" -> "Blackwell", "How-den" -> "Howden", "Lach-mann" -> "Lachmann", "Lee-son" -> "Leeson".
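Once hyphenated words survive tokenization, both cases above reduce to grouping words by their hyphen-free, lowercased form and flagging any group that contains both a hyphenated and an unhyphenated spelling. A minimal sketch, assuming a letters-and-hyphens regex like the one a fixed tokenizer might use; `hyphen_variants` is a hypothetical name, not a Calibre or Sigil function.

```python
import re
from collections import defaultdict

# Hypothetical word pattern: letter runs joined by internal hyphens.
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def hyphen_variants(text):
    """Map each hyphen-free, lowercased key to its spellings, keeping only
    keys that appear both with and without a hyphen."""
    groups = defaultdict(set)
    for word in WORD.findall(text):
        groups[word.replace("-", "").lower()].add(word)
    return {
        key: sorted(forms)
        for key, forms in groups.items()
        if any("-" in f for f in forms) and any("-" not in f for f in forms)
    }


text = ("The nonfiction shelf held non-fiction by Black-well and Blackwell, "
        "short-term and shortterm plans.")
print(hyphen_variants(text))
```

Each reported group is a candidate for normalizing to a single spelling throughout the book.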
Last edited by Tex2002ans; 07-06-2014 at 06:59 AM.