MobileRead Forums - View Single Post

kiwidude · 04-28-2011, 12:55 PM

Ok, today's pop quiz question - who can offer me an efficient file comparison algorithm?

I've tried a first pass of finding books with the same size, and then a second pass using the sha256 hash. However, this has two problems - (a) it is still pretty darn slow for large libraries (around 4.5 minutes to scan a 40,000 book library with a fair few formats), and (b) after all that it still isn't "accurate" enough, returning a bunch of duplicates which really aren't; they just "hash" together.
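
For anyone following along, the two-pass approach described above could be sketched roughly like this. It is a minimal illustration and not the plugin's actual code: the names `sha256_of_file` and `find_duplicate_files` and the `chunk_size` parameter are made up for the example, and it assumes you already have a plain list of file paths for the book formats.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a large book is never read into memory whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(paths):
    """Return groups (lists) of paths whose size and sha256 digest both match."""
    # Pass 1: bucket files by size (a stat call, no file contents read).
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with at least one other.
    duplicates = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # unique size, cannot be a byte-for-byte duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[sha256_of_file(path)].append(path)
        duplicates.extend(g for g in by_hash.values() if len(g) > 1)
    return duplicates
```

In this sketch a file whose size is unique is never opened at all, so the hashing cost falls only on the same-size candidates.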

Suggestions on a postcard please.

Post #181