Quote:
Originally Posted by kiwidude
Just one further point: your last example of angle / angel would only be found (possibly) by a Soundex-based algorithm, as chaley has mentioned a few times. Is this something that is easily done from Python?
Soundex is easy to compute. See
http://code.activestate.com/recipes/...dex-algorithm/. My approach would be to split the item into words, eliminate all punctuation, compute the Soundex code of each word, and add each code to a set. A 'conservative' comparison would test the two sets for equality. A less conservative one would require only N of the M words to match.
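A minimal Python sketch of that approach, assuming the standard American Soundex rules (first letter kept, consonants coded 1-6, h/w ignored, vowels separating repeated codes, result padded or truncated to four characters). It is written from those rules rather than copied from the linked recipe, only ASCII letters are coded, and the helper names `soundex_set`, `conservative_match` and `loose_match` are hypothetical:

```python
import re

# Standard Soundex digit for each coded consonant; vowels, y, h and w get no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the American Soundex code of word, or '' if it has no ASCII letters."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # Start from the first letter's own code, so a second letter with the
    # same code (e.g. the 'f' in 'Pfister') is not coded again.
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code:
            if code != prev:
                digits.append(code)
            prev = code
        elif ch not in "hw":
            # Vowels (and y) separate letters with the same code; h and w do not.
            prev = ""
    return (first.upper() + "".join(digits) + "000")[:4]


def soundex_set(text):
    """Hypothetical helper: drop punctuation, split into words, collect Soundex codes."""
    cleaned = re.sub(r"[^\w\s]", "", text)
    return {code for code in map(soundex, cleaned.split()) if code}


def conservative_match(a, b):
    """Hypothetical helper: both items must produce exactly the same set of codes."""
    return soundex_set(a) == soundex_set(b)


def loose_match(a, b, n):
    """Hypothetical helper: at least n codes must be shared ('N out of M words')."""
    return len(soundex_set(a) & soundex_set(b)) >= n


if __name__ == "__main__":
    print(soundex("angle"), soundex("angel"))  # A524 A524
    print(conservative_match("The Angle of Repose", "The Angel of Repose"))  # True
```

Choosing `n` is the tuning knob: with `n = 1`, two titles that merely share a common word such as 'The' (code T000) would match, so stop words would probably need filtering before this is useful.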
Note that Soundex is by nature not conservative. For example, 'holly', 'healey', 'hilly' and 'hayley' all generate the same Soundex string (H400).
Note**2: Knuth's algorithm only works with any accuracy on words that follow English pronunciation rules. I think that is a large enough 'market' to make it useful.