04-24-2011, 09:15 AM   #128
drMerry
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
Quote:
Originally Posted by kiwidude
Wow, a simple bit of magic like that for soundex? Very cool, thx. I guess I could use the same approach as "similar title" as the starting point (stripping subtitles, punctuation etc) and then applying the soundex to that.

The question once again becomes the permutations... currently we have this:
1. Matching ISBN only
2. Identical title, ignore author
3. Similar title, ignore author
4. Similar title, identical author
5. Similar title, similar author*
6. Ignore title, similar author*

for 5 & 6, as mentioned previously "similar author" is going to change to be more conservative to not ignore initials. We will add at least one more fuzzier author option (which for example looks at a surname plus first initial only)
7. Ignore title, fuzzy author

Now we have soundex. Does it make sense to only apply it to titles rather than author names? As presumably you have the same problems of author initials etc causing problems with the results? So maybe we add:
8. Soundex title, similar author

How does that sound?
Sounds good.
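To make option 8 concrete, here is a rough sketch of how I read "strip subtitles and punctuation, then apply soundex". This is only my guess, not the plugin's code: the names `soundex` and `soundex_title_key` are made up for illustration, and encoding each word separately (rather than the whole title at once) is my own assumption.

```python
import re

# Standard American Soundex letter groups: bfpv=1, cgjkqsxz=2, dt=3, l=4, mn=5, r=6
_CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def soundex(word, length=4):
    """Return the Soundex code of a single word, padded with zeros to `length`."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = _CODES.get(first, "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate two letters with the same code
        code = _CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "0" * length)[:length]


def soundex_title_key(title):
    """Hypothetical helper: drop the subtitle and punctuation, then soundex each word."""
    # Assumption: everything after the first ':', '(' or '[' is a subtitle or series note.
    main = re.split(r"[:(\[]", title, maxsplit=1)[0]
    words = re.findall(r"[a-z]+", main.lower())
    return " ".join(soundex(w) for w in words)
```

With this, "Donald Duck: The Comic" and "Donnald Duk" end up with the same key, so they would land in the same duplicate group.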
I have one question. It seems that "identical title, ignore author" does not completely ignore the author.

Two cases:
1. 1500 Donald Duck comics, 5 authors. Scan time: less than half a minute. Result: 20 duplicates.
2. 500 books, 212 authors. Scan time: infinite? (The processor stays at 100% on one core for more than 12 hours.)
So maybe the author is not completely ignored?
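To show what I would expect from the two cases above, here is a minimal sketch of an "identical title, ignore author" scan. It is an assumption about how such a scan could work, not the plugin's actual code: `find_title_duplicates` and the `(book_id, title, authors)` tuple layout are hypothetical.

```python
from collections import defaultdict


def find_title_duplicates(books):
    """Group books by title only.

    `books` is an iterable of (book_id, title, authors) tuples.
    The authors field is deliberately never read.
    """
    groups = defaultdict(list)
    for book_id, title, _authors in books:
        # Identical title, compared case-insensitively with whitespace collapsed
        key = " ".join(title.lower().split())
        groups[key].append(book_id)
    # Only titles shared by more than one book count as duplicates
    return [ids for ids in groups.values() if len(ids) > 1]
```

Done this way, the scan is a single pass over the books, so the number of authors (5 or 212) should make no difference to the scan time. That is why I suspect the author still plays a role somewhere.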