@jackie_w: calibre uses the ICU word break iteration algorithm, which as far as I recall, splits up most hyphenated words into two words (the details are language dependent), so, for example, abc-def will show up in the words list as two words, abc and def
See
http://userguide.icu-project.org/boundaryanalysis for details