04-26-2011, 02:11 PM   #166
ldolse
Playing around with this now - the ignore title rocks! Haven't played with the tag browser enough to comment on that. I agree with Chaley that this is more or less ready to release as is.

Soundex was really helpful too. I'm not sure letting users tweak the fuzziness would help much, unless you're talking about making it less fuzzy. While it's quite useful for finding issues the other algorithms miss, it does produce more false positives.

I noticed you mentioned an issue with non-ascii in Soundex earlier. There is already a function in Calibre to convert a non-ascii character to its ascii equivalent - are you using this already? I noticed Soundex caught China Miéville vs. China Mieville while the other algorithms missed this. Thinking out loud, though: doing this ascii downgrade for the purposes of comparison any time you detect non-ascii could be useful.
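
As a rough sketch of the ascii downgrade idea: the snippet below uses Python's standard unicodedata module as a stand-in for the Calibre function, and the helper names strip_accents and same_author are made up for illustration. It only removes accents and other combining marks, so it covers less than a full transliteration would.

```python
import unicodedata


def strip_accents(text):
    """Drop combining marks after decomposition (a stand-in, not Calibre's own function)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_author(a, b):
    """Compare two author strings after accent stripping and case folding."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()


# With this, "China Miéville" and "China Mieville" compare as equal.
print(same_author("China Miéville", "China Mieville"))
```

Doing the downgrade only inside the comparison leaves the stored author name untouched, so the accented spelling is still what gets displayed.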


This might be an advanced/too specialized option, but I keep multiple versions of book records, with generally only one record that's 'published' to OPDS/externally accessible library instances, etc. I do this by adding a tag 'Nopub' to the ones I don't want published. I'd rather do this than merge book records and risk having a faulty version overwrite a good version during conversions/merges etc. The faulty versions I keep around for conversion testing, or just because I haven't gotten around to fully comparing the editions.

Anyway, the thought behind the request is to automatically exempt sets of dupes where all but one in the set have some specific/configurable tag.
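
To make the rule concrete, here is a minimal sketch. It assumes each duplicate set is a list of records shaped like {"title": ..., "tags": set(...)}; that layout and the name is_auto_exempt are hypothetical and not taken from the plugin or from Calibre.

```python
def is_auto_exempt(group, tag="Nopub"):
    """Return True when exactly one record in the duplicate set lacks `tag`.

    `group` is a hypothetical list of dicts, each with a "tags" set.
    """
    if len(group) < 2:
        return False  # a single record is not a duplicate set
    untagged = [book for book in group if tag not in book["tags"]]
    return len(untagged) == 1


dupes = [
    {"title": "The City & the City", "tags": {"Fiction"}},
    {"title": "The City & the City", "tags": {"Fiction", "Nopub"}},
]
print(is_auto_exempt(dupes))
```

A set where every record carries the tag, or where two or more records lack it, would still be shown for review.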

Other feedback:
  • Keyboard shortcut for exempting a group would be extremely helpful
  • Things seem to go a bit wonky when you reach the last set, at least with ignore title searches. After finishing all/most of the original sets, the 'next set' function began jumping all over the place and highlighting things that weren't really sets. I didn't even realize I was done until it started acting strange and I initiated a fresh search, which returned no results.

edit: the non-ascii to ascii equivalent function is get_udc().decode(), with get_udc coming from calibre.utils.localization

Last edited by ldolse; 04-26-2011 at 02:18 PM.