I'm interested in extending the duplicate finding plugin to do approximate text/cover matches. I've done both successfully outside of calibre in the past, in Python.
However, doing so requires some substantial library dependencies:
For text similarity (in a reasonable timeframe), I need a MinHashLSH implementation. I've used datasketch (https://github.com/ekzhu/datasketch) previously, which has a hard dependency on numpy (and I'd really like scipy too, as MinHashLSH can use scipy to speed up its initialization).
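To make the text side concrete, here is a rough pure-Python sketch of the MinHash plus LSH banding idea, using only the standard library. This is not datasketch's API: the function names, the 5-word shingle size, and the 64-hash / 16-band split are all assumptions picked for illustration, and it will be far slower than a numpy-backed implementation.

```python
import hashlib
from collections import defaultdict

NUM_PERM = 64  # hashes per signature (illustrative value, not tuned)
BANDS = 16     # LSH bands; each band covers NUM_PERM // BANDS rows


def shingles(text, k=5):
    """Return the set of k-word shingles of a text (k=5 is an assumption)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash(items):
    """MinHash signature: per seed, the minimum salted 64-bit hash over all items."""
    sig = []
    for seed in range(NUM_PERM):
        salt = seed.to_bytes(8, "little")
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return sig


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(signatures):
    """LSH banding: keys whose signatures agree on any whole band become candidates."""
    rows = NUM_PERM // BANDS
    buckets = defaultdict(set)
    for key, sig in signatures.items():
        for b in range(BANDS):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(key)
    pairs = set()
    for keys in buckets.values():
        ordered = sorted(keys)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs
```

The idea would be to build one signature per book, call `candidate_pairs` on a dict of book id to signature, and only run `estimate_jaccard` (or a full comparison) on the pairs it returns, which is what keeps the runtime reasonable.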
For image similarity, a DCT-based p-hash works very well. I can use a pure Python DCT implementation, but scipy provides convenient DCT functions, and I need it anyway for MinHashLSH.
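For the cover side, this is roughly what the pure-Python fallback would look like: a naive DCT-II and a 64-bit p-hash built from the low-frequency coefficients. It assumes the cover has already been converted to grayscale and scaled down to a small square (e.g. 32x32) list of lists; the function names and the "compare against the median, skipping the DC term" rule are my choices for this sketch, and the naive DCT is much slower than scipy's.

```python
import math


def dct_1d(values):
    """Naive, unnormalized DCT-II of a sequence (O(n^2))."""
    n = len(values)
    return [
        sum(values[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def dct_2d(matrix):
    """2D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def phash(pixels, hash_size=8):
    """Perceptual hash of a square grayscale image given as a list of lists.

    Keeps the top-left hash_size x hash_size block of DCT coefficients and
    sets one bit per coefficient: 1 if it is above the median, else 0.
    """
    coeffs = dct_2d(pixels)
    low = [coeffs[y][x] for y in range(hash_size) for x in range(hash_size)]
    ordered = sorted(low[1:])  # skip the DC term when picking the median
    median = ordered[len(ordered) // 2]
    bits = 0
    for c in low:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Two covers would then count as near-duplicates when `hamming(phash(a), phash(b))` is below some small threshold, which would need tuning against real covers.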
What's the correct way to work on plugins like this? From what I've read, there's no way to specify that your plugin has external dependencies (requirements.txt, etc.). Vendoring (and packaging) versions of all the packages for every platform is prohibitive.