Quote:
Originally Posted by kovidgoyal
@jackie_w: calibre uses the ICU word break iteration algorithm, which as far as I recall, splits up most hyphenated words into two words (the details are language dependent), so, for example, abc-def will show up in the words list as two words, abc and def
See http://userguide.icu-project.org/boundaryanalysis for details
That's a valuable link. It explains some things I've been puzzling about wrt leading & trailing apostrophes.
At http://www.unicode.org/reports/tr29/#WB14 there is this with respect to word boundaries and hyphens:
Quote:
The correct interpretation of hyphens in the context of word boundaries is challenging ... it is better overall to keep the hyphen out of the default definition
Is that to be interpreted as... a hyphen should or should not constitute a word boundary? I'm inclined to read it as should not.
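To make the question concrete, here is a rough Python sketch of the behaviour described in the quote above. It is only my approximation using a regular expression, not ICU and not calibre's actual code; the names WORD and words are made up for illustration. It assumes the hyphen is not treated as a mid-word character (my reading of "keep the hyphen out of the default definition"), while an apostrophe with a letter or digit on both sides stays inside the word.

```python
import re

# Rough stand-in for the default word rules (an assumption, not ICU itself):
# runs of letters/digits, optionally joined by an apostrophe that has a
# letter or digit on both sides. A hyphen is not a joiner, so it ends a word.
WORD = re.compile(r"[^\W_]+(?:['\u2019][^\W_]+)*")

def words(text):
    """Return the word-like segments of text (approximation only)."""
    return WORD.findall(text)

print(words("abc-def"))
print(words("don't stop 'til the dogs' dinner"))
```

Under these assumptions abc-def comes out as abc and def, which matches what kovidgoyal describes, and leading or trailing apostrophes are dropped from the word while the one inside don't is kept. If the hyphen were meant to stay inside the word, it would have to be added to the joiner set in the pattern.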
BR