08-27-2012, 04:38 AM
chaley
[Android App] Question to CC users: how to add "browse" (pull) to CC?

We are going to add "pull" to CC somehow. By "pull" I mean the ability to use CC to "pull" books to your device instead of using calibre to "push" them. The debate we are having is "how". There are two general approaches, and I would like to hear your opinions about them.

Yes, I know most of you will say "Do both". OK, that is helpful. However, some hints about priority, usefulness, usability, or other things will be appreciated.

The approaches:
  1. Add a content server interface to CC, probably an OPDS browser. This approach is straightforward, requiring only a few changes to the content server. The major good point is that it could be used to browse sites other than your calibre library. I see some not-so-good points:
    • The browsing interface is different from CC. It will probably be paged, not showing a "flingable" result.
    • It isn't clear how we can support "on device" and "not in library" (show content already in CC and on the remote library together).
    • You probably need to open two ports on your router.
    • Browsing performance will be at network speeds, which may or may not provide a good experience.
  2. Add "browsing" as a native ability to CC. This would require adding a "metadata download" capability to CC to retrieve info about the books in the calibre library that have not been sent to the device. In this case, CC would ask calibre for the metadata for books not already downloaded. This metadata would be refreshed on connect, just as the metadata for books already downloaded is refreshed; only new or changed metadata would be resent. While connected, you could ask CC to ask calibre to send a book, which would cause calibre to "auto-start" a send book job.

    Good points with this approach:
    • A single interface for browsing. Grouping and sorting would work. We would need to add a "filtering" menu: choosing InLibrary, OnDevice, Both, or All.
    • One port, one password, etc.
    Not-so-good points:
    • We will certainly run into capacity problems. There is no way that all the metadata in a 50,000-book library can be downloaded. Exactly what the acceptable number will be depends on the device. For more info, see the results of my experiment below.
    • Downloading and syncing metadata take time.
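
To make the "only new or changed metadata would be resent" idea in option 2 concrete, here is a minimal sketch. It assumes each book carries a stable id and a last-modified stamp; the `BookMeta` record, its fields, and `metadata_to_send` are hypothetical names for illustration, not CC's or calibre's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookMeta:
    # Hypothetical record; real metadata has many more fields.
    uuid: str
    title: str
    last_modified: int  # assumed to change whenever the metadata changes


def metadata_to_send(library, device_cache):
    """Return the library records that are new or changed since the last sync.

    library: iterable of BookMeta held on the calibre side.
    device_cache: dict of uuid -> last_modified already stored on the device.
    """
    return [
        book for book in library
        if device_cache.get(book.uuid) != book.last_modified
    ]


library = [
    BookMeta("a", "Dune", 100),
    BookMeta("b", "Emma", 200),
    BookMeta("c", "Ulysses", 300),
]
device_cache = {"a": 100, "b": 150}  # "b" is stale, "c" was never sent

changed = metadata_to_send(library, device_cache)
print([book.uuid for book in changed])  # prints ['b', 'c']
```

With a scheme like this, the cost of a reconnect depends on how much changed rather than on the size of the library, although the first sync still has to move everything.
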
As a capacity test, I built a prototype of option 2 and downloaded metadata for 20,000 books (6,000 authors, 10,000 tags) to my Galaxy Nexus.

The prototype changed the download sequence so that metadata was downloaded separately from being added to the database, requiring the "DB Update" process to start after the metadata was downloaded. Using this 2-phase scheme, the metadata for the 20,000 books downloaded in around 30 minutes. After the download finished, it took 10 hours for the DB update to run. During those 10 hours, CC was very sluggish but "usable".
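
As a rough back-of-the-envelope reading of those numbers, the snippet below converts them to per-book rates. It uses only the figures quoted above ("around 30 minutes", "10 hours", 20,000 books), so the results are approximations and say nothing about other devices.

```python
BOOKS = 20_000
DOWNLOAD_SECONDS = 30 * 60        # "around 30 minutes"
DB_UPDATE_SECONDS = 10 * 60 * 60  # "10 hours"

download_rate = BOOKS / DOWNLOAD_SECONDS        # books per second
update_cost = DB_UPDATE_SECONDS / BOOKS         # seconds per book
slowdown = DB_UPDATE_SECONDS / DOWNLOAD_SECONDS  # update time vs download time

print(f"download:  {download_rate:.1f} books/s")  # about 11.1 books/s
print(f"DB update: {update_cost:.1f} s/book")     # about 1.8 s/book
print(f"DB update takes {slowdown:.0f}x as long as the download")  # 20x
```

In other words, on this device the database update, not the network transfer, dominates the total time.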

Once the DB update process had finished, I saw the following performance: (NB: when I say "less than a second" or "under a second" I mean "noticeable delay but within my tolerance".)
  • Time to sort by anything: less than a second.
  • Time to group by title first letter: around 3 to 4 seconds
  • Time to get books with title starting with a letter: under a second
  • Time to group by author first letter: under a second
  • Time to list authors beginning with a letter: around 3 to 4 seconds
  • Time to list books written by an author: under a second
CC is usable with these timings, but clearly it is at the edge. Some people might consider it as having fallen over the edge. I think that 20,000 books represents an upper limit for devices in the same class as my Galaxy Nexus.

So, the fundamental questions we face now are:
  1. Which of the two approaches would be best for a user? Does the answer change as limits are exceeded?
    • Which is easiest to use?
    • Which offers the best browsing functions?
    • Which is easiest to understand?
  2. If we build the second option, how do we deal with huge libraries?
    • Hard limit of X (what is X?) so we don't get people unhappy with performance?
    • Say "That is how it is" and ignore the problem? This approach is often unsatisfactory, because some people will complain. I can already see the 1-star reviews saying "performance sucks for my 42,000 book library." Sometimes it seems better not to offer a useful feature if it is possible to "abuse" that feature.
    • Require using a saved search that limits metadata downloaded to a reasonable X (and again, what is X?). In this case, what do we do when X is exceeded?
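
To help weigh the "what do we do when X is exceeded?" question, here is a small sketch of two possible policies. The function name, the policy names, and the idea of an ordered list of matching book ids are all assumptions made for illustration; nothing here is decided or implemented.

```python
def select_for_download(matching_ids, limit, policy="truncate"):
    """Apply a cap (the "X" above) to the book ids matched by a saved search.

    Returns (ids_to_download, exceeded).
    policy "truncate": download metadata for the first `limit` ids only.
    policy "refuse":   download nothing when the cap is exceeded.
    """
    exceeded = len(matching_ids) > limit
    if not exceeded:
        return list(matching_ids), False
    if policy == "truncate":
        return list(matching_ids[:limit]), True
    if policy == "refuse":
        return [], True
    raise ValueError(f"unknown policy: {policy}")


ids = list(range(1, 8))  # pretend the saved search matched 7 books

print(select_for_download(ids, limit=10))                  # ([1, 2, 3, 4, 5, 6, 7], False)
print(select_for_download(ids, limit=5))                   # ([1, 2, 3, 4, 5], True)
print(select_for_download(ids, limit=5, policy="refuse"))  # ([], True)
```

Either way, the `exceeded` flag is what would let CC tell the user that the saved search needs narrowing instead of failing silently.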

Some related questions:
  1. My experiment separated the metadata download from the DB update. The advantage is that the metadata download process will finish much more quickly. The disadvantage is that the DB Updater must run before the new metadata is in the DB, meaning that the process isn't really done when the progress dialog goes away. Is this something we should do when syncing metadata at startup? Which do you prefer?
  2. My experiment enabled the group menu while the DB Updater was running. The advantage is that I can continue to use the group menu. The disadvantages are that grouping results are incomplete until the DB Updater finishes (new metadata isn't reflected in the groups until the updater has processed it) and that grouping operations can be very sluggish because they are fighting with the updater for the DB. Which do you prefer: no group menu until it gives complete results (what you have today), or a group menu that is always available but gives incomplete results and can be sluggish while the DB Updater is running?
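
For anyone trying to picture the trade-off in these two questions, the sketch below imitates the 2-phase scheme on a tiny scale: the download phase only queues metadata, a separate update step moves it into the DB in batches, and a grouping query run in between sees only what has been processed so far. The table layout, function names, and batch size are hypothetical, and the real updater runs in the background rather than being stepped by hand.

```python
import sqlite3
from collections import deque

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE books (uuid TEXT PRIMARY KEY, title TEXT)")

pending = deque()  # metadata that has been downloaded but is not yet in the DB


def download_phase(records):
    """Phase 1: just store what was received; fast, no DB work."""
    pending.extend(records)


def update_step(batch_size):
    """Phase 2: move one batch from the queue into the DB.

    Returns True while there is still work left.
    """
    for _ in range(min(batch_size, len(pending))):
        uuid, title = pending.popleft()
        db.execute("INSERT OR REPLACE INTO books VALUES (?, ?)", (uuid, title))
    db.commit()
    return bool(pending)


def group_by_first_letter():
    """Group menu query; it only sees what the updater has processed so far."""
    rows = db.execute(
        "SELECT substr(title, 1, 1) AS letter, COUNT(*) "
        "FROM books GROUP BY letter ORDER BY letter"
    )
    return dict(rows.fetchall())


download_phase([("1", "Dune"), ("2", "Dracula"), ("3", "Emma")])
print(group_by_first_letter())  # {} : download "done", nothing grouped yet
update_step(2)
print(group_by_first_letter())  # {'D': 2} : incomplete while updating
update_step(2)
print(group_by_first_letter())  # {'D': 2, 'E': 1} : complete
```

The middle result is the "incomplete grouping" case from question 2: the answer is correct for what is in the DB, but not yet for what has been downloaded.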

Last edited by chaley; 08-27-2012 at 04:44 AM.