Old 04-28-2011, 02:30 PM   #196
kiwidude
Calibre Plugins Developer
Posts: 4,730
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Right - I suspected the read above... that's what you get for copying code off the web.

This is where I stole it from...
http://www.gossamer-threads.com/list.../python/739198

Last edited by kiwidude; 04-28-2011 at 02:32 PM.
Old 04-28-2011, 02:31 PM   #197
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by chaley
The problem is the file(xxx).read(). That is opening the file in text mode, not binary mode. The read is finding a ^Z, which is end of file.
Cool! It's amazing how subtle something like this can be. He's probably getting ^Z near a standard file header (possibly in the .doc files?), the files end up with identical content and the hash matches.
Old 04-28-2011, 02:37 PM   #198
chaley
Grand Sorcerer
Posts: 12,447
Karma: 8012886
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by kiwidude
Right - I suspected the read above... that's what you get for copying code off the web.

This is where I stole it from...
http://www.gossamer-threads.com/list.../python/739198
From that post:
Code:
$ python
Python 2.5.4 (r254:67916, Feb 17 2009, 20:16:45)
[GCC 4.3.3] on linux2
The code would work there. That guy was running on Linux, where there is no difference between text mode and binary mode. One of the traps and pitfalls of multiplatform support.
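
For anyone landing here later, a minimal sketch of the kind of fix being discussed, assuming the hash comes from hashlib. The name `hash_file` and the chunk size are only illustrative and are not taken from the plugin. The point is the 'rb' mode, so the read cannot stop early at a ^Z (0x1A) byte on Windows:

```python
import hashlib


def hash_file(path, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of the file's full binary content."""
    digest = hashlib.sha256()
    # 'rb' is the important part: in text mode on Windows (Python 2),
    # a read can stop at the first ^Z (0x1A) byte it meets.
    with open(path, 'rb') as f:
        # Read in chunks so large book files are not loaded in one go.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files that share a header up to a ^Z but differ afterwards should then hash differently, which is the behaviour the duplicate check needs.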
Old 04-28-2011, 02:38 PM   #199
kiwidude
Calibre Plugins Developer
Phew, all fixed now, thx guys.

I guess the next decision is whether to make this a background job or just make the user wait. On a 1500 book/4000 format library it takes around 15 seconds.
Old 04-28-2011, 02:42 PM   #200
chaley
Grand Sorcerer
Quote:
Originally Posted by kiwidude
Phew, all fixed now, thx guys.

I guess the next decision is whether to make this a background job or just make the user wait. On a 1500 book/4000 format library it takes around 15 seconds.
Make 'em wait. The computer will get lonely.

On a more serious note, as you are almost certainly using os.stat, you have the mtime as well as the size. You may consider storing those two values and the hash (I imagine you got rid of the double hash) of each format with the book, using the plugin storage facility. Check the size+mtime before recomputing the hash, and use the stored hash if the values haven't changed.
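
A rough sketch of that idea, under stated assumptions: `cached_hash` is a hypothetical helper, and the plain dict `cache` merely stands in for whatever the plugin storage facility would hold per format. It is not the calibre API.

```python
import hashlib
import os


def cached_hash(path, cache):
    """Return the file's hash, reusing cache[path] if size and mtime match.

    `cache` is a plain dict standing in for per-book plugin storage
    (hypothetical; not the calibre API).
    """
    st = os.stat(path)
    key = (st.st_size, st.st_mtime)
    entry = cache.get(path)
    if entry is not None and entry[0] == key:
        return entry[1]  # file unchanged since last run: skip the read
    # Size or mtime changed (or no entry yet): recompute and store.
    with open(path, 'rb') as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    cache[path] = (key, digest)
    return digest
```

The os.stat call is needed for the size comparison anyway, so the only extra cost on an unchanged file is the lookup.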
Old 04-28-2011, 02:44 PM   #201
kiwidude
Calibre Plugins Developer
Bonza idea.
Old 04-28-2011, 04:33 PM   #202
kiwidude
Calibre Plugins Developer
Ok, maybe not such a good idea

Having the extra calls in for add_custom_book_data and get_custom_book_data means it is averaging 10 seconds per 50 books. So instead of taking about 4 minutes on that first run it takes 23 minutes.

And subsequent runs are still around the 4 minute mark.

I commented the code all out and ran the analysis again - 2.5 minutes. Ahh well - users can just go make themselves a cuppa on their large libraries
Old 04-28-2011, 04:40 PM   #203
chaley
Grand Sorcerer
Quote:
Originally Posted by kiwidude
Ok, maybe not such a good idea

Having the extra calls in for add_custom_book_data and get_custom_book_data means it is averaging 10 seconds per 50 books. So instead of taking about 4 minutes on that first run it takes 23 minutes.

And subsequent runs are still around the 4 minute mark.

I commented the code all out and ran the analysis again - 2.5 minutes. Ahh well - users can just go make themselves a cuppa on their large libraries
It isn't reasonable that checking and adding the custom book data could take 1/2 second per book. Something must be very broken somewhere.

Could you post the code you are using? Or give me a 'broken' copy of the plugin? I want to figure out what is going on.
Old 04-28-2011, 05:07 PM   #204
kiwidude
Calibre Plugins Developer
OK, here is a version for you to play with. I haven't started work on the exemption group changes yet, but the new GUI around ISBN/binary comparisons etc. is done. The code you will be interested in is in algorithms.py, around line 500 or so. I have made no attempts to optimise: there weren't a whole lot of options to do so with the current API for plugin book data, as we talked about previously when I used it in the Goodreads plugin. It does seem freakishly slow with those lines added, though.

Last edited by kiwidude; 04-30-2011 at 04:55 PM.
Old 04-28-2011, 05:34 PM   #205
chaley
Grand Sorcerer
@kiwidude: did you run the create experiment more than once? There seems to be a first-time problem. I ran the test, and it created a few duplicate groups, taking 22 seconds! I then deleted the records from the DB using SQLiteSpy and ran it again. This time it took 1.3 seconds. Around 15 more runs all produce the same number.

I am thinking that the first time it runs, it needs to auto-create the indices or some such. Do you have evidence one way or the other?
Old 04-28-2011, 05:42 PM   #206
kiwidude
Calibre Plugins Developer
I have run it many, many times, and killed it many times. I clean the database table before every reset. Only once have I let it run to completion with the table clean, and then once after that to give you the numbers in the post above to show what it was like. I ripped the code out, reset the database table and got the 2.5 min run time. Then you said you wanted to experiment, so I put the code back in, started running it again, saw it was still taking the same 9 seconds or so per 50 records and killed it after the first 1,000 or so.
Old 04-29-2011, 11:22 AM   #207
kiwidude
Calibre Plugins Developer
@Charles. One further thought to throw into the mix on the performance stuff. I don't know how big your database was, but I have found that for smaller databases there is a certain amount of (OS?) caching that can significantly affect things.

To further explain what I mean - with a 1500 book (4200 format) database, the first time I do a scan it takes around 13-15 seconds. Of that, the majority of the time is spent in the first pass doing os.stat on those files to get the file size.

If I then run that check again, it runs in 1.5 seconds. Which approach I use for analysing size duplicates (always or via book plugin data) is pretty immaterial in this situation, as there were only about 22 books or so that had size collisions. The same number of files have had os.stat run on them, but presumably due to some lower-level OS caching, that check completed extremely quickly.

However for my large test database, it would appear that with 75000 formats to get the file size of, the caching has negligible effect. So the first pass of os.stat takes about the same time when you run it repeatedly.

My point being that with the numbers you had above, your dramatically improved performance 15 times in a row etc could just be because of the caching effect.

I'm going to disable using book plugin data unless we can nail down its exact problem. The performance cost is orders of magnitude too high at this point.
Old 04-29-2011, 11:34 AM   #208
kovidgoyal
creator of calibre
Posts: 45,374
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I would suggest minimizing writes to the db, i.e. keep your lists in memory until the end of the search and only then write to the database, preferably with a single executemany call. You probably need to add another API method to database2 for that.
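
A minimal illustration of the batching pattern using the standard sqlite3 module. The table `format_hashes`, its columns and the helper `save_hashes` are made up for the example; this is not the database2 API, which would need its own method as noted above.

```python
import sqlite3


def save_hashes(conn, rows):
    """Write all (book_id, fmt, size, mtime, digest) rows in one batch."""
    with conn:  # one transaction: commit on success, rollback on error
        conn.executemany(
            'INSERT OR REPLACE INTO format_hashes '
            '(book_id, fmt, size, mtime, digest) VALUES (?, ?, ?, ?, ?)',
            rows,
        )


# Hypothetical schema, kept in memory for the demonstration.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE format_hashes ('
    'book_id INTEGER, fmt TEXT, size INTEGER, mtime REAL, digest TEXT, '
    'PRIMARY KEY (book_id, fmt))'
)

# Results are accumulated in memory during the scan...
pending = [
    (1, 'EPUB', 1024, 1304000000.0, 'aaa'),
    (1, 'MOBI', 2048, 1304000001.0, 'bbb'),
    (2, 'EPUB', 4096, 1304000002.0, 'ccc'),
]
# ...and written once at the end.
save_hashes(conn, pending)
```

The whole batch goes through in one transaction, so there is a single commit at the end of the search instead of one per book.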
Old 04-29-2011, 12:10 PM   #209
drMerry
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
Quote:
Originally Posted by kovidgoyal
I would suggest minimizing writes to the db, i.e. keep your lists in memory until the end of the search and only then write to the database, preferably with a single executemany call. You probably need to add another API method to database2 for that.
I even think plugins should not store info in the calibre db.
Maybe a second db that is accessible through the calibre API.
Reason (a different topic, but related to your remark):
Spoiler:
At the moment there are just 'a few' plugins. If there are more and they all store info in the db, at some point a large amount of the db will be plugin data, not real book data.
The db gets slow because of that.
Leftovers in the db could also cause problems.

On the other hand, a lot of plugins would benefit from storing data in a db, like the exempted books in this plugin or, for example, a 'similar author' field. If calibre offered a second db (a second file), this could be done without performance loss. Corruption by a plugin would then only affect the second db; the main db would stay OK.
Old 04-29-2011, 12:12 PM   #210
kovidgoyal
creator of calibre
Having lots of data in a db does not make it slow, unless the db is very poorly designed. And there is nothing preventing a plugin from using its own db if it feels like it.
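
As a sketch of what "its own db" could look like, assuming the plugin picks a file path of its own: the class `PluginStore` and the `kv` table are hypothetical names, and nothing here touches calibre's database.

```python
import sqlite3


class PluginStore(object):
    """Tiny key/value store kept in the plugin's own SQLite file."""

    def __init__(self, db_path):
        # db_path is chosen by the plugin; it is separate from the library db.
        self.conn = sqlite3.connect(db_path)
        with self.conn:
            self.conn.execute(
                'CREATE TABLE IF NOT EXISTS kv '
                '(key TEXT PRIMARY KEY, value TEXT)'
            )

    def set(self, key, value):
        with self.conn:  # commit each write as its own transaction
            self.conn.execute(
                'INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)',
                (key, value),
            )

    def get(self, key, default=None):
        row = self.conn.execute(
            'SELECT value FROM kv WHERE key = ?', (key,)
        ).fetchone()
        return row[0] if row else default
```

If that file gets corrupted or goes stale, it can be deleted without touching the library's own data.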