MobileRead Forums - View Single Post

chaley · 04-23-2011, 12:22 PM

Quote:

Originally Posted by kiwidude

Just one further point - your last example of angle / angel would only be found (possibly) by a soundex based algorithm as chaley has mentioned a few times. Is this something that is easily done from Python?

Soundex is easy to compute. See http://code.activestate.com/recipes/...dex-algorithm/. My approach would be to split the item into words, strip all punctuation, compute the soundex of each word, and add each resulting code to a set. A 'conservative' comparison would test the two sets for equality. A less conservative one would check for N matches out of M words.
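As a rough illustration of the approach described above, here is a minimal sketch. It is not the ActiveState recipe and not anything from calibre; it assumes the common American Soundex rules (h and w do not separate equal codes, vowels do), and the names soundex_set, conservative_match and loose_match are made up for this example.

```python
import re

# Letter -> digit table for American Soundex (assumed variant).
_CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character soundex code of word, or '' if it has no a-z letters."""
    word = "".join(ch for ch in word.lower() if "a" <= ch <= "z")
    if not word:
        return ""
    result = word[0].upper()
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not break up a run of equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]


def soundex_set(text):
    """Hypothetical helper: strip punctuation, split into words, collect the codes."""
    words = re.sub(r"[^\w\s]", "", text).split()
    codes = {soundex(w) for w in words}
    codes.discard("")  # words with no a-z letters (e.g. pure numbers) produce ''
    return codes


def conservative_match(a, b):
    """'Conservative' comparison: the two sets of codes must be equal."""
    return soundex_set(a) == soundex_set(b)


def loose_match(a, b, n):
    """Less conservative: at least n codes in common."""
    return len(soundex_set(a) & soundex_set(b)) >= n


print(soundex("angle"), soundex("angel"))
print(conservative_match("The Angle of Repose", "Angel of Repose, The"))
print(loose_match("Angle of Repose", "Angel in Repose", 2))
```

Because the codes go into a set, word order and repeated words are ignored, which is what lets 'Angel of Repose, The' match 'The Angle of Repose'. The threshold n in loose_match is something you would have to tune.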

Note that soundex is by nature not conservative. For example, 'holly' and 'healey' generate equal soundex strings (H400), which is also the string generated by 'hilly' and 'hayley'.

Note**2: Knuth's algorithm works with any accuracy only on words that use English pronunciation rules. I think that is a large enough 'market' to make it useful.