MobileRead Forums - View Single Post

ldolse · 04-26-2011, 02:11 PM

Playing around with this now - the ignore title rocks! Haven't played with the tag browser enough to comment on that. I agree with Chaley that this is more or less ready to release as is.

Soundex was really helpful too - I'm not sure letting users tweak the fuzziness would help much, unless you're talking about making it less fuzzy. While it's quite useful for finding issues the other algorithms miss, it does produce more false positives.
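To illustrate the fuzziness trade-off, here is a minimal sketch of textbook American Soundex in Python. It is not necessarily the variant the plugin uses, and the `length` parameter is a hypothetical stand-in for a fuzziness setting: a longer code keeps more consonants and so matches less loosely.

```python
# Textbook American Soundex; the plugin's own variant may differ.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name, length=4):
    """Return the Soundex code of name, cut or zero-padded to `length`.

    `length` is a hypothetical fuzziness knob: larger means less fuzzy.
    """
    # Only plain A-Z letters are considered; everything else is dropped.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break a run of equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # vowels reset prev, so a repeated code is kept
    return "".join(out)[:length].ljust(length, "0")


print(soundex("Robert"), soundex("Rupert"))
print(soundex("Roberts", 6), soundex("Robertson", 6))
```

With the default length, "Robert" and "Rupert" share a code, which is the kind of false positive mentioned above. "Roberts" and "Robertson" also collide at length 4 but separate at length 6.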

I noticed you mentioned an issue with non-ASCII in Soundex earlier - there is already a function in Calibre to convert a non-ASCII character to its ASCII equivalent - are you using this already? I noticed Soundex caught China Miéville vs. China Mieville while the other algorithms missed this. Thinking out loud, though: doing this ASCII downgrade for comparison purposes any time you detect non-ASCII could be useful.
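A rough sketch of the "downgrade before comparing" idea, using only the Python standard library as a stand-in for Calibre's own function. The helper names `ascii_fold` and `comparison_key` are made up for illustration. Stripping combining marks only handles accented letters; characters with no decomposition (ø, ß and so on) pass through unchanged, so a real transliteration function would do better.

```python
import unicodedata


def ascii_fold(text):
    """Strip accents by decomposing characters and dropping combining marks.

    Stand-in for a real transliteration function; letters without a
    decomposition are left untouched.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def comparison_key(text):
    """Key used only for comparing; the stored metadata is not changed."""
    if all(ord(c) < 128 for c in text):
        return text.lower()  # already ASCII, nothing to downgrade
    return ascii_fold(text).lower()


same = comparison_key("China Miéville") == comparison_key("China Mieville")
print(same)
```

Here the two spellings of the author name produce the same key, so an exact-match algorithm would also group them.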

This might be too advanced/specialized an option, but I keep multiple versions of book records, with generally only one record that's 'published' to OPDS/externally accessible library instances, etc. I do this by adding a tag 'Nopub' to the ones I don't want published. I'd rather do this than merge book records and risk having a faulty version overwrite a good version during conversion/merges etc. I keep the faulty versions around for conversion testing, or just because I haven't gotten around to fully comparing the editions.

Anyway, the thought behind the request is to automatically exempt sets of dupes where all but one in the set have some specific/configurable tag.
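Roughly what I have in mind, as a sketch. The data shape (a group as a list of dicts with a "tags" list) and the function names are assumptions for illustration, not the plugin's actual structures, and the tag match is case-insensitive by my own choice.

```python
def is_auto_exempt(group, skip_tag="Nopub"):
    """True when every book in the group except exactly one carries skip_tag.

    `group` is assumed to be a list of dicts, each with a "tags" list.
    """
    wanted = skip_tag.lower()
    untagged = [
        book for book in group
        if wanted not in (tag.lower() for tag in book["tags"])
    ]
    return len(group) > 1 and len(untagged) == 1


def groups_to_review(groups, skip_tag="Nopub"):
    """Drop the groups that the rule above exempts automatically."""
    return [g for g in groups if not is_auto_exempt(g, skip_tag)]


groups = [
    # One published record plus one Nopub copy: exempt automatically.
    [{"title": "Kraken", "tags": ["Fantasy"]},
     {"title": "Kraken", "tags": ["Fantasy", "Nopub"]}],
    # Two untagged records: still needs a manual look.
    [{"title": "Embassytown", "tags": []},
     {"title": "Embassytown", "tags": ["SF"]}],
]
remaining = groups_to_review(groups)
print(len(remaining))
```

A group where every record is tagged, or where two or more are untagged, still shows up for review, since only the "all but one" case is clearly intentional.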

Other feedback:

A keyboard shortcut for exempting a group would be extremely helpful.
Things seem to go a bit wonky when you reach the last set, at least with ignore title searches. After I had finished all/most of the original sets, the 'next set' function began jumping all over the place and highlighting things that weren't really sets. I didn't even realize I was done until it started acting strange and I initiated a fresh search, which returned no results.

edit: the non-ASCII to ASCII equivalent function is get_udc().decode() from calibre.utils.localization

Last edited by ldolse; 04-26-2011 at 02:18 PM.