[Android App] Question to CC users: how to add "browse" (pull) to CC?

chaley · 08-27-2012, 04:38 AM

We are going to add "pull" to CC somehow. By "pull" I mean the ability to use CC to "pull" books to your device instead of using calibre to "push" them. The debate we are having is "how". There are two general approaches, and I would like to hear your opinions about them.

Yes, I know most of you will say "Do both". OK, that is helpful.

However, some hints about priority, usefulness, usability, or other things will be appreciated.

The approaches:

1. Add a content server interface to CC, probably an OPDS browser. This approach is straightforward, requiring only a few changes to the content server. The major good point is that it could be used to browse sites other than your calibre library. I see some not-so-good points:
- The browsing interface is different from CC. It will probably be paged, not showing a "flingable" result.
- It isn't clear how we can support "on device" and "not in library" (show content already in CC and on the remote library together).
- You probably need to open two ports on your router.
- Browsing performance will be at network speeds, which may or may not provide a good experience.
2. Add "browsing" as a native ability to CC. This would require adding a "metadata download" capability to CC to retrieve info about the books in the calibre library that have not been sent to the device. In this case, CC would ask calibre for the metadata for books not already downloaded. This metadata would be refreshed on connect, just as the metadata for books already downloaded is refreshed; only new or changed metadata would be resent. While connected, you could ask CC to ask calibre to send a book, which would cause calibre to "auto-start" a send book job.

Good points with this approach:
- A single interface for browsing. Grouping and sorting would work. We would need to add a "filtering" menu: choosing InLibrary, OnDevice, Both, or All.
- One port, one password, etc
Not-so-good points:
- We will certainly run into capacity problems. There is no way that all the metadata in a 50,000-book library can be downloaded. Exactly what the acceptable number will be depends on the device. For more info, see the results of my experiment below.
- Downloading and syncing metadata take time.

As a capacity test experiment I built a prototype of option 2 and downloaded metadata for 20,000 books (6,000 authors, 10,000 tags) to my galaxy nexus.

The prototype changed the download sequence so that metadata was downloaded separately from being added to the database, requiring the "DB Update" process to start after the metadata was downloaded. Using this 2-phase scheme, the metadata for the 20,000 books downloaded in around 30 minutes. After the download finished, it took 10 hours for the DB update to run. During those 10 hours, CC was very sluggish but "usable".
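For a rough sense of scale, the figures above can be turned into rates and scaled to other library sizes. This is only a back-of-the-envelope sketch in Python: it assumes both phases scale linearly with the number of books, which the experiment did not test, and `estimate_hours` is a made-up helper name.

```python
# Figures reported in the post: 20,000 books, ~30 minutes to download,
# ~10 hours for the DB update. Everything else here is an assumption.
BOOKS = 20_000
DOWNLOAD_SECONDS = 30 * 60
DB_UPDATE_SECONDS = 10 * 60 * 60

download_rate = BOOKS / DOWNLOAD_SECONDS    # books per second, roughly 11
db_update_rate = BOOKS / DB_UPDATE_SECONDS  # books per second, roughly 0.56


def estimate_hours(library_size):
    """Scale the measured rates linearly (an assumption) to another library size.

    Returns (download_hours, db_update_hours).
    """
    download_hours = library_size / download_rate / 3600
    db_update_hours = library_size / db_update_rate / 3600
    return download_hours, db_update_hours


download_h, update_h = estimate_hours(10_000)
print(f"10,000 books: ~{download_h:.2f} h download, ~{update_h:.1f} h DB update")
```

Under that linear assumption, a library half the size of the test library would need about half the time in each phase; real devices may well behave differently.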

Once the DB update process had finished, I saw the following performance: (NB: when I say "less than a second" I mean "noticeable delay but within my tolerance".)

Time to sort by anything: less than a second.
Time to group by title first letter: around 3 to 4 seconds
Time to get books with title starting with a letter: under a second
Time to group by author's first letter: under a second
Time to list authors beginning with a letter: around 3 to 4 seconds
Time to list books written by an author: under a second

CC is usable with these timings, but clearly it is at the edge. Some people might consider it as having fallen over the edge. I think that 20,000 books represents an upper limit for devices in the same class as my galaxy nexus.

So, the fundamental questions we face now are:

Which of the two approaches would be best for a user? Does the answer change as limits are exceeded?
- Which is easiest to use?
- Which offers the best browsing functions?
- Which is easiest to understand?
If we build the second option, how do we deal with huge libraries?
- Hard limit of X (what is X?) so we don't get people unhappy with performance?
- Say "That is how it is" and ignore the problem? This approach is often unsatisfactory, because some people will complain. I can already see the 1-star reviews saying "performance sucks for my 42,000 book library." Sometimes it seems better not to offer a useful feature if it is possible to "abuse" that feature.
- Require using a saved search that limits metadata downloaded to a reasonable X (and again, what is X?). In this case, what do we do when X is exceeded?

Some related questions:

My experiment separated the metadata download from the DB update. The advantage is that the download metadata process will finish much more quickly. The disadvantage is that the DB Updater must run before the new metadata is in the DB, meaning that the process isn't really done when the progress dialog goes away. Is this something we should do when syncing metadata at startup? Which do you prefer?
My experiment enabled the group menu while the DB Updater was running. This has the advantage that I can continue to use the group menu, but has the disadvantage that the grouping results are incomplete until the DB updater finishes, new metadata won't be reflected in the groups until the updater processes that metadata, and grouping operations can be very sluggish because they are fighting with the updater for the DB. Which do you prefer: no group menu until it gives complete results (what you have today) or having a group menu available that will give incomplete results and can be sluggish if the DB updater is running?

Piper_ · 08-27-2012, 06:36 AM

If you use the second option, would it be possible to allow users to split their libraries into smaller chunks and then select from more than the currently-open library?

Hope that makes sense... either way will be awesome for me. I'm just so happy y'all are so close to including this feature as I bought CC, but haven't been able to get on my PC to use it.
(I've been using the calibre server, but can only access the books from the library I left opened, AFAICT...?)

chaley · 08-27-2012, 06:42 AM

Quote:

Originally Posted by Piper_

If you use the second option, would it be possible to allow users to split their libraries into smaller chunks and then select from more than the currently-open library?

Yes, but it won't help. The problem is the number of "books" (with or without the actual book content) stored in CC, not the number of books in the one or more calibre libraries. In option 2, when CC syncs with a library it will (of course, optionally) download the metadata for the books in that library. Sync with another library, and you will get the books from that one. They aggregate.

Ahhh ... I just realized the subquestion. You are asking "Can I change libraries from CC?". I have never thought of that, but it could be possible. However, this would be very weird for some poor soul using that instance of calibre at that moment, because it would switch in the GUI.

Piper_ · 08-27-2012, 06:51 AM

Lol Gotcha.

thanks, Chaley.

Adoby · 08-27-2012, 07:19 AM

Most likely I would rarely use the pull feature, I would prefer to push out books from Calibre to the device(s). I have all of my (good) books on my tablet already.

But using pull, I would prefer the second variant, with local browsing on the device. Preferably even without any need for Calibre to be running at the same time. Or even without any network at all. You would then create a "wishlist" from the local database that would be automatically downloaded on the next connect to Calibre.

When CC requests metadata for the whole Calibre library, CC could include the timestamp for the most recent update. That way Calibre only has to create a package/catalogue containing metadata for books that have been changed/added/removed since the last transfer of metadata.

Instead of transferring a lot of metadata that already might be available on the other side, "packages" with pairs of bookid and hashes/checksums for the metadata for that bookid might be transferred instead. Perhaps 8 bytes per book? Then only metadata for books with different hashes have to be sent. The local database might still allow browsing of all metadata if the metadata is not up to date, but could flag entries with bad hashes as being in the process of being updated. That way the transfer of metadata might be a relatively low priority process. Could even add a refresh button that overrides the background update to quickly fetch updated metadata for just that book.
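A minimal Python sketch of this idea, with heavy assumptions: the digest is a SHA-256 of the metadata serialized as canonical JSON, truncated to 8 bytes, and both function names are invented for illustration. Nothing here reflects how calibre or CC actually store or compare metadata.

```python
import hashlib
import json


def metadata_digest(metadata):
    """8-byte digest of one book's metadata (hypothetical scheme)."""
    # Sorting keys makes the serialization independent of dict ordering.
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).digest()[:8]


def books_needing_update(library, device_digests):
    """Ids of books whose digest on the device is missing or different.

    library: {book_id: metadata dict} on the calibre side
    device_digests: {book_id: 8-byte digest} as reported by the device
    """
    return sorted(
        book_id
        for book_id, metadata in library.items()
        if device_digests.get(book_id) != metadata_digest(metadata)
    )


library = {
    1: {"title": "Dune", "authors": ["Frank Herbert"]},
    2: {"title": "Emma", "authors": ["Jane Austen"]},
}
device = {1: metadata_digest(library[1]), 2: b"\x00" * 8}  # book 2 is stale
print(books_needing_update(library, device))  # [2]
```

Only the ids returned would need their full metadata sent; everything else costs 8 bytes plus the id per book.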

Error handling would be fun...

The possible problems regarding performance with huge libraries could be handled by making parts of CC optional. That way owners of different devices might activate different parts of CC.

Basic functionality using push from calibre should still be available to all; the pull feature could be optional, depending on how powerful a device you have and how much storage you can spare.

Users with weak devices might then create a Calibre library with a small subset of their books to allow them to use the pull features of CC.

When activating some features a notice could be shown warning about possible problems when very large libraries, slow connections or devices with low processing capabilities are used.

chaley · 08-27-2012, 08:28 AM

Quote:

Originally Posted by Adoby

But using pull, I would prefer the second variant, with local browsing on the device. Preferably even without any need for Calibre to be running at the same time. Or even without any network at all. You would then create a "wishlist" from the local database that would be automatically downloaded on the next connect to Calibre.

This is consistent with your request for a "background connect".

Quote:

When CC requests metadata for the whole Calibre library, CC could include the timestamp for the most recent update. That way Calibre only has to create a package/catalogue containing metadata for books that have been changed/added/removed since the last transfer of metadata.

This is exactly what we would do, except you also need to check the library UUID to be sure that the books came from the library you are connecting to.

Quote:

Instead of transferring a lot of metadata that already might be available on the other side, "packages" with pairs of bookid and hashes/checksums for the metadata for that bookid might be transferred instead. Perhaps 8 bytes per book? Then only metadata for books with different hashes have to be sent. The local database might still allow browsing of all metadata if the metadata is not up to date, but could flag entries with bad hashes as being in the process of being updated. That way the transfer of metadata might be a relatively low priority process. Could even add a refresh button that overrides the background update to quickly fetch updated metadata for just that book.

If we separate metadata transfer from metadata update, the process of sending metadata would not be significantly slower than computing and sending hashes. The process I am considering is:

1. Calibre announces that it can send metadata.
2. CC sends the UUID/timestamp for metadata-only books that came from that library.
3. Calibre compares the library with what CC sent, building a list of changes.
4. Calibre sends packages of new metadata. In my experiment this ran at around 15 books/second. We would consider putting this in the background on CC if we can work out all the parallelism.
5. Calibre sends the list of deleted books. CC generates a task to do the deletions.
6. CC stores the metadata and schedules a DB Update task that runs in the background.
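The steps above can be sketched as a small in-memory model in Python. This is an illustration only, not the actual calibre/CC wire protocol: the `Device` class, the `build_changes` and `sync` functions, the per-book timestamp comparison and the package size are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical model of what CC keeps between connections."""
    # library UUID -> {book_id: timestamp of the metadata CC already holds}
    known: dict = field(default_factory=dict)
    update_queue: list = field(default_factory=list)  # waits for the DB updater
    delete_queue: list = field(default_factory=list)  # book ids to delete later


def build_changes(library_books, device_report):
    """calibre side: diff the library against CC's {book_id: timestamp} report."""
    changed = [
        (book_id, stamp, metadata)
        for book_id, (stamp, metadata) in sorted(library_books.items())
        if device_report.get(book_id) != stamp
    ]
    deleted = sorted(set(device_report) - set(library_books))
    return changed, deleted


def sync(device, library_uuid, library_books, package_size=2):
    """Run one metadata sync and return the number of packages sent."""
    # CC reports only the books that came from this library (matched by UUID).
    report = device.known.setdefault(library_uuid, {})
    changed, deleted = build_changes(library_books, dict(report))
    packages = 0
    for start in range(0, len(changed), package_size):
        package = changed[start:start + package_size]
        # CC only stores the metadata here; the DB update happens later.
        device.update_queue.extend(package)
        for book_id, stamp, _metadata in package:
            report[book_id] = stamp
        packages += 1
    # Deletions are queued as a separate task on the device.
    device.delete_queue.extend(deleted)
    for book_id in deleted:
        report.pop(book_id, None)
    return packages
```

Keeping the queues separate from the `known` map mirrors the two-phase idea: the transfer can finish quickly while the slow database work drains the queues in the background.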

Quote:

The possible problems regarding performance with huge libraries could be handled by making parts of CC optional. That way owners of different devices might activate different parts of CC.

The problem with this approach is that many people ignore the warnings. If the feature exists, then some people will insist that it work for them, even on their tiny phone or with their 75,000 book library. We see this all the time in calibre where people have many thousands of books in a library on a PC with almost no memory and a slow processor, or when people put their library on a slow(er) NAS and expect it to run at local-disk speeds. I can't fault the users -- they want what they want or need. The problem is that satisfying this tiny group of users takes a huge amount of time, if satisfying them is even possible. Not satisfying them leads to public complaints that chase away potential users who would not have the problems but don't know that.

I am sure that we can get the transfer performance, threading and parallelism right, eventually if not immediately. Users' expectations and anger about capacity are my biggest fears. I am strongly tempted to offer only content-server style access so that the question of capacity and performance does not arise. I don't think it would be as nice (I would much prefer option 2), but it would provide "pull" capabilities while avoiding most of the large-library problems. More reflection is needed...

Adoby · 08-27-2012, 11:38 AM

Hashes actually never have to be computed on the device. They can be computed by Calibre and stored with the metadata. It doesn't even have to be hashes, a number that is incremented for each update of a book or metadata would do, as long as each book and set of metadata can be matched between devices. But as you say, it may not provide much benefit, if any at all.
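The counter variant could look something like the following Python sketch. It assumes calibre bumps an integer revision every time a book's metadata changes and that CC remembers the last revision it received; `stale_books` and both dictionaries are hypothetical.

```python
def stale_books(calibre_revisions, device_revisions):
    """Ids of books whose revision in calibre is newer than the one CC holds.

    Both arguments map book_id -> integer revision; a book that CC has
    never seen is treated as revision -1.
    """
    return sorted(
        book_id
        for book_id, revision in calibre_revisions.items()
        if device_revisions.get(book_id, -1) < revision
    )


calibre_side = {101: 3, 102: 1, 103: 7}
cc_side = {101: 3, 102: 0}  # 103 has never been sent to the device
print(stale_books(calibre_side, cc_side))  # [102, 103]
```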

The background connect I've mentioned before could be extended to also be used to keep the databases in sync, using low priority threads, but perhaps only when the device is charging, Calibre is active, and the option for background sync is activated.

To defuse anger about "poor performance" you could split CC into separate apps, that cooperate and share data.

1. Basic connection and "send to device from Calibre", along with an eBook browser and launcher. Like current version. (But with background receive mode added.)

2. Addon to connect online to content server on Calibre and download books. The first of your options above.

3. Addon to allow remote offline Calibre browsing using a local copy of Calibre database and "queue fetch to device from Calibre". The second option above.

Charge for the first, and give the two others away for free, with a warning that they are only intended for users with adequate hardware and may not work as intended otherwise. That way you can perhaps minimize the anger from frustrated users?

Most likely it will be quicker to implement the online access, if I understand you correctly. Offline access may wait, until you see if there is a demand and you think it will increase sales. I doubt it, unfortunately, I suspect most users would be happy with online access or the basic functionality.

edheil · 08-27-2012, 12:31 PM

For me personally, with a reasonable-sized library (only about 200 books right now), I would like option 2.

Either would be acceptable and useful, however. I'll leave it to people with huge libraries to say how they would like their situations to be handled.

ellett · 08-27-2012, 01:01 PM

For me, it's option 2. It would be all about pulling down the books that I had added to calibre that weren't already transferred.

I think I'll eventually end up with maybe 10,000 books, so the 30 minute+ timings are a concern. I think what I'd like is to be able to do a download of the complete calibre metadata on command, separately from "normal" CC startup. I could fire that up when I went to bed or otherwise had a chunk of time I wasn't going to be using CC and wouldn't care how long it took. Maybe a second CC start icon on Android that did the complete transfer, leaving the "normal" icon to do what it does now.

"Preparing your books" for my 2200+ books is now a momentary operation and I'd love to keep it that way, while transferring books is actually quicker wirelessly with CC than it is using USB connect and lettered disk drive on my Nook Simple Touch. If pull would significantly increase CC response time, I'd rather do without and continue to push.

The way I do things, I keep all my books on all devices, using calibre to "check-in" new arrivals and clean up the metadata so they sort correctly, have complete and accurate series information etc. Unless I get a new device or a major boo-boo wipes out my CC, I won't be pushing or pulling thousands, or even hundreds of books.

Dopedangel · 08-27-2012, 01:16 PM

I would like the option of seeing and downloading the last few books added to calibre. I don't have a large library, just a few hundred, and I keep them all on my devices. So the ability to download newly added books would be great.

GoghGirl · 08-28-2012, 02:12 AM

I vote for option 2. Native seems to have more features.

Quote:

We will certainly run into capacity problems. There is no way that all the metadata in a 50,000-book library can be downloaded.

Is it possible to break it into chunks? I don't know how the process works so my idea might not be at all valid/make any sense..

For instance if I am looking for a book I go to look at my tabs. Would it be possible to only download the metadata for books under a certain tag? For those with large libraries they could break it into 'Library A', 'Library B', etc so you could customize it based on your connection speed.

Or would it be possible to separate the authors by last name into chunks? All authors A-C is a chunk etc?

I most often choose new books to read based on tags so any feature that will display all of the books associated under that tag would be great!

Thank you for all of the updates concerning nooks! It works so much smoother now! The rate of your updates is amazing!
