07-06-2014, 05:42 AM   #4
Tex2002ans
Hmmm... so today I was fiddling around some more with the Calibre Spell Check tool, and I stumbled across this problem.

The hyphen '-' should be treated as a legitimate word character. Here is how it currently works:

The word "non-fiction" is seen as two words, "non" and "fiction".
The word "micro-economics" is seen as two words, "micro" and "economics".
The word "anti-establishment" is seen as two words, "anti" and "establishment".
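
To make the difference concrete, here is a rough Python sketch of the two behaviours. The regexes and the `tokenize` helper are my own assumptions for illustration only; I have not checked how Calibre actually splits words internally.

```python
import re

# Assumption: these patterns only illustrate the two behaviours described
# above; they are NOT taken from Calibre's spell checker.
SPLIT_ON_HYPHEN = re.compile(r"[A-Za-z]+")               # hyphen ends a word
KEEP_HYPHEN = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")    # hyphen inside a word is kept


def tokenize(text, keep_hyphen):
    """Return the list of words found in text under the chosen rule."""
    pattern = KEEP_HYPHEN if keep_hyphen else SPLIT_ON_HYPHEN
    return pattern.findall(text)


sample = "non-fiction micro-economics anti-establishment"

# Current behaviour: every hyphenated word falls apart into two words.
current = tokenize(sample, keep_hyphen=False)

# Requested behaviour: each hyphenated word stays in one piece.
wanted = tokenize(sample, keep_hyphen=True)
```

With the second pattern, a hyphen only counts when it sits between letters, so a stray leading or trailing hyphen would still be ignored.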

A few reasons why this fix would be extremely useful:

1. I use this ALL THE TIME in Sigil to catch mixed hyphenated and non-hyphenated versions of the same word. This is QUITE a common OCR error: you might have "nonfiction" + "non-fiction", "co-operating" + "cooperating", "counter-clockwise" + "counterclockwise", "short-term" + "shortterm" in the same book. These then typically have to be normalized to one consistent spelling throughout the book.

2. It is quite helpful for catching accidental hyphens in authors' first/last names. For example, "Black-well" -> "Blackwell", "How-den" -> "Howden", "Lach-mann" -> "Lachmann", "Lee-son" -> "Leeson".
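
Both use cases boil down to the same check once hyphenated words are kept whole: group every word by its spelling with the hyphens stripped, then flag any group that has more than one form. A minimal sketch, assuming a simple ASCII word pattern; the function name `find_mixed_hyphenation` is made up for this example and is not part of Sigil or Calibre.

```python
import re
from collections import defaultdict

# Assumption: a simple ASCII word pattern that keeps inner hyphens.
WORD = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def find_mixed_hyphenation(text):
    """Map each de-hyphenated spelling to its variants, if it has more than one."""
    variants = defaultdict(set)
    for word in WORD.findall(text):
        key = word.replace("-", "").lower()   # "Non-fiction" -> "nonfiction"
        variants[key].add(word.lower())
    # Keep only the spellings that appear both with and without a hyphen
    # (or with hyphens in different places).
    return {key: sorted(forms) for key, forms in variants.items() if len(forms) > 1}


sample = (
    "A non-fiction shelf, a nonfiction list, "
    "and Blackwell next to Black-well. Short-term plans."
)
mixed = find_mixed_hyphenation(sample)
```

Here `mixed` contains entries for "nonfiction" and "blackwell", while "short-term" is left out because only one spelling of it appears in the sample. Deciding which variant is the correct one still has to be done by hand.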

Last edited by Tex2002ans; 07-06-2014 at 06:59 AM.