08-27-2012, 08:28 AM   #6
chaley
Quote:
Originally Posted by Adoby
But using pull, I would prefer the second variant, with local browsing on the device. Preferably even without any need for Calibre to be running at the same time. Or even without any network at all. You would then create a "wishlist" from the local database that would be automatically downloaded on the next connect to Calibre.
This is consistent with your request for a "background connect".
Quote:
When CC requests metadata for the whole Calibre library, CC could include the timestamp of the most recent update. That way Calibre only has to create a package/catalogue containing metadata for books that have been changed, added, or removed since the last transfer of metadata.
This is exactly what we would do, except you also need to check the library UUID to be sure that the books came from the library you are connecting to.
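The check described above, timestamp filtering guarded by a library-UUID comparison, could be sketched roughly as follows. This is a hypothetical illustration, not calibre's actual API; the `Book` class and the function name are invented for the example.

```python
# Hypothetical sketch of the incremental-update check described above.
# Book, books_to_send and the parameter names are illustrative only.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Book:
    uuid: str
    last_modified: datetime

def books_to_send(library_books, library_uuid, cc_library_uuid, cc_last_sync):
    # If CC's metadata came from a different library, the timestamp is
    # meaningless: resend everything.
    if cc_library_uuid != library_uuid:
        return list(library_books)
    # Otherwise send only books modified since CC's most recent update.
    return [b for b in library_books if b.last_modified > cc_last_sync]
```

The UUID guard matters because timestamps from one library say nothing about the state of books that originally came from another.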
Quote:
Instead of transferring a lot of metadata that might already be available on the other side, "packages" of pairs of book id and hash/checksum of the metadata for that book id might be transferred instead. Perhaps 8 bytes per book? Then only metadata for books with different hashes has to be sent. The local database might still allow browsing of all metadata even if the metadata is not up to date, but could flag entries with bad hashes as being in the process of being updated. That way the transfer of metadata might be a relatively low-priority process. One could even add a refresh button that overrides the background update to quickly fetch updated metadata for just that book.
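The 8-bytes-per-book checksum in the quote above could be produced, for example, by hashing a canonical serialization of the metadata with a 64-bit digest. This is only a sketch of the idea; calibre does not necessarily represent metadata as a plain dict, and the function name is invented.

```python
# Hypothetical 8-byte metadata checksum, per the quoted suggestion.
# Canonical JSON (sorted keys, fixed separators) makes the hash stable
# regardless of the order fields happen to be stored in.
import hashlib
import json

def metadata_hash(metadata: dict) -> bytes:
    canon = json.dumps(metadata, sort_keys=True, separators=(',', ':'))
    return hashlib.blake2b(canon.encode('utf-8'), digest_size=8).digest()
```

Two sides holding identical metadata then compute identical 8-byte values, so only mismatching book ids need a full metadata transfer.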
If we separate metadata transfer from metadata update, the process of sending metadata would not be significantly slower than computing and sending hashes. The process I am considering is:
  1. Calibre announces that it can send metadata
  2. CC sends UUID/timestamp for metadata-only books that came from that library
  3. Calibre compares the library with what CC sent, building a list of changes
  4. Calibre sends packages of new metadata. In my experiment this ran at around 15 books/second. We would consider putting this step in the background on CC if we can work out all the parallelism.
  5. Calibre sends list of deleted books. CC generates a task to do the deletions.
  6. CC stores metadata and schedules a DB Update task that runs in the background.
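The comparison in step 3 amounts to a set/timestamp diff between the library and the (UUID, timestamp) pairs CC sent in step 2. A minimal sketch, with invented names and plain dicts standing in for the real data structures:

```python
# Hypothetical sketch of step 3: Calibre diffs its library against the
# uuid -> timestamp map received from CC, yielding the new/changed books
# to send (step 4) and the deleted books (step 5).
def compute_changes(library, cc_books):
    """library: dict uuid -> last_modified; cc_books: dict uuid -> timestamp."""
    new_or_changed = [u for u, ts in library.items()
                      if u not in cc_books or ts > cc_books[u]]
    deleted = [u for u in cc_books if u not in library]
    return new_or_changed, deleted
```

Both lists are cheap to build (one pass over each side), which is why sending fresh metadata directly need not be much slower than a hash exchange.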
Quote:
The possible problems regarding performance with huge libraries could be handled by making parts of CC optional. That way owners of different devices might activate different parts of CC.
The problem with this approach is that many people ignore the warnings. If the feature exists, then some people will insist that it work for them, even on their tiny phone or with their 75,000-book library. We see this all the time in calibre, where people have many thousands of books in a library on a PC with almost no memory and a slow processor, or put their library on a slow(er) NAS and expect it to run at local-disk speeds. I can't fault the users -- they want what they want or need. The problem is that satisfying this tiny group of users takes a huge amount of time, if satisfying them is even possible. Not satisfying them leads to public complaints that chase away potential users who would not have had the problems but don't know that.

I am sure that we can get the transfer performance, threading and parallelism right, eventually if not immediately. Users' expectations and anger about capacity are my biggest fears. I am strongly tempted to offer only content-server style access so that the question of capacity and performance does not arise. I don't think it would be as nice (I would much prefer option 2), but it would provide "pull" capabilities while avoiding most of the large-library problems. More reflection is needed...

Last edited by chaley; 08-27-2012 at 11:42 AM.