04-23-2011, 12:22 PM   #125
chaley
Quote:
Originally Posted by kiwidude
Just one further point - your last example of angle / angel would only be found (possibly) by a soundex based algorithm as chaley has mentioned a few times. Is this something that is easily done from Python?
Soundex is easy to compute. See http://code.activestate.com/recipes/...dex-algorithm/. My approach would be to split the item into words, strip all punctuation, compute the soundex of each word, and add each resulting code to a set. A 'conservative' comparison would test the two sets for equality. A less conservative one would check for N matching codes out of M words.
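
A rough sketch of that approach, in case it helps. This is not the recipe linked above; it is a from-scratch take on the classic (American) soundex rules, and the names `soundex_set`, `conservative_match` and `loose_match` are made up for illustration. It assumes plain ASCII text.

```python
import string

# Letter-to-digit table for classic soundex (assumed variant: American Soundex).
_CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character soundex code of a word, or '' if it has no ASCII letters."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit:
            # Adjacent letters with the same code are only counted once.
            if digit != prev:
                code += digit
            prev = digit
        elif c not in "hw":
            # Vowels (and y) separate repeated codes; h and w do not.
            prev = ""
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def soundex_set(text):
    """Hypothetical helper: strip punctuation, split into words, collect the codes."""
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    return {soundex(w) for w in cleaned.split()} - {""}


def conservative_match(a, b):
    """'Conservative' comparison: the two code sets must be equal."""
    return soundex_set(a) == soundex_set(b)


def loose_match(a, b, n):
    """Less conservative: at least n codes in common."""
    return len(soundex_set(a) & soundex_set(b)) >= n
```

With this sketch, `conservative_match("The Angle, of Repose!", "the angel of repose")` is true, because 'angle' and 'angel' both code to A524. How to pick N relative to M for the looser check is left open here.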

Note that soundex is by nature not conservative. For example, 'holly' and 'healey' generate the same soundex string (H400), and so do 'hilly' and 'hayley'.

Note**2: Knuth's algorithm works with any accuracy only on words that follow English pronunciation rules. I think that is a large enough 'market' to make it useful.