04-29-2011, 11:22 AM   #207
kiwidude
Calibre Plugins Developer
@Charles. One further thought to throw into the mix on the performance stuff. I don't know how big your database was, but I have found that for smaller databases there is a certain amount of caching (at the OS level, I presume) that can significantly affect the timings.

To explain what I mean: with a 1500 book (4200 format) database, the first scan takes around 13-15 seconds. The majority of that time is spent in the first pass, calling os.stat on each file to get its size.

If I then run the check again, it completes in 1.5 seconds. Which approach I use for analysing size duplicates (always, or via book plugin data) is pretty immaterial in this situation, as only about 22 books had size collisions. The same number of files have had os.stat run on them, but presumably due to some lower-level OS caching the second check completes extremely quickly.
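For anyone following along, here is a minimal sketch of what a size-based first pass looks like. It is not the plugin's actual code: the function name `find_size_collisions` and the plain list of paths are assumptions made for illustration, and it assumes Python 3.

```python
import os
from collections import defaultdict


def find_size_collisions(paths):
    """Group paths by file size and keep only sizes shared by 2+ files.

    Hypothetical helper for illustration; not taken from the plugin.
    """
    by_size = defaultdict(list)
    for path in paths:
        try:
            # One os.stat call per file; this is the pass that dominates
            # the first (uncached) scan described above.
            size = os.stat(path).st_size
        except OSError:
            continue  # skip missing or unreadable files
        by_size[size].append(path)
    # Only files that share a size with another file need a closer look.
    return {size: group for size, group in by_size.items() if len(group) > 1}
```

Only the groups this returns would need any further comparison, which is why the choice of follow-up approach hardly matters when there are just a couple of dozen collisions.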

However, for my large test database, with 75000 formats to get the file size of, the caching appears to have negligible effect. The first pass of os.stat takes about the same time on every run.

My point being that, with the numbers you had above, your dramatically improved performance 15 times in a row etc. could just be the caching effect.
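One rough way to check for this is to time the stat pass on its own, several times in a row, and compare the first run against the later ones. The sketch below assumes Python 3 and a plain list of file paths; `time_stat_passes` is a made-up name, not something from calibre or the plugin.

```python
import os
import time


def time_stat_passes(paths, runs=3):
    """Return the wall-clock duration, in seconds, of each os.stat pass.

    Hypothetical helper for illustration only.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        for path in paths:
            try:
                os.stat(path)
            except OSError:
                pass  # ignore files that have gone missing
        durations.append(time.perf_counter() - start)
    return durations
```

If the later runs are much faster than the first, the caching is probably doing most of the work and the comparison between approaches is suspect. Note that the first run is only a true "cold" run if the files have not been touched recently; clearing the OS cache is platform-specific and not shown here.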

I'm going to disable using book plugin data unless we can nail down its exact problem; the performance cost is orders of magnitude too high at this point.