Suggestions for the CC cloud connection - Page 2

kaufman · 03-16-2016, 10:30 PM

I think your problem is that you have a pretty specific use case that doesn't look generally useful. So it would be a lot of effort to implement for just one person's use.

chaley · 03-17-2016, 03:41 AM

Quote:

Originally Posted by DavidTC

I am not sure that you have internalized that books in libraries synchronized by a cloud provider are almost certainly not "calibre books". Their internal metadata is not updated.

I do not understand what you mean by 'Their internal metadata is not updated.'. Any book that has been in Calibre has its internal metadata 'updated' at least once, when it was added. (Assuming Calibre supports the metadata, but, if not, all this is moot.) When it was added, at minimum, a UUID was put in, allowing an easy re-match to that book if discovered elsewhere.

No. Calibre does not update metadata in a book when that book is added to the calibre library. Nor does calibre update metadata in a book when the book's metadata is edited. So the usual case is that the book in calibre's library has the metadata it came with.

Quote:

That is what I meant when I said 'I would stop there'....if that UUID doesn't exist, the file has never been in Calibre. (A UUID that doesn't match anything also shouldn't be further matched, especially since a likely setup is that it's from *another* Calibre library the user has.)

You are ignoring the use case where a person has multiple calibre libraries, perhaps on different computers, perhaps on the same computer used by different family members. These libraries are not synced with each other. Such a case was brought up a week or two ago in our beta group when discussing the advisability of using calibre db ids in default file name templates. In this case a book's UUID will be different depending on which library one connects to. The situation can arise as well when a person has "staging libraries", a common technique used to ensure that the metadata in the calibre library for a new book is correct.

Quote:

Anyway, stepping back from book matching a second, my point is that books can, indeed, have their correct metadata, and it is possible to know when that is.

The keyword is "can". The reality: there is no guarantee that a UUID is there or that if there it is the "right" one.

Quote:

It is also worth noting that many (most?) pirated book collections are generated using calibre. Sometimes the books *do* contain calibre metadata, but it is metadata for the pirate's library not the user's library.

Heh. I literally deleted a section on pirates in my last two posts addressing exactly that.

To start with, this situation is hard to get to. I suspect that pirates use 'Save to Disk' instead of 'Send to Device', and that's why I *didn't* suggest having 'Save to Disk' add that flag, despite that metadata being correct also.

In calibre, Save to Disk and Send to Device use the same code to update metadata in the book. Also, I have seen indications that some libraries are copies of the calibre library with the metadata.* files deleted. Thus the book can have a) no calibre metadata, b) metadata from where the pirate pirated the book, c) the pirate's metadata, or d) the final user's metadata.

---

I am considering adding to CC the (explicit) ability to identify all books in a cloud library but not on the device and add those books to CC. That is a use case that is both supportable (non probabilistic) and has been requested by users other than you. The "update my books" case is handled by the wireless device connection.

Note that you can already "fetch all books not on device" in CC's cloud connection by tapping "Newest" and then "Download All". CC queues all the books, skips books that are already on the device (a book has a matching UUID), and complains if a book doesn't have a usable format. This method has two problems. 1) It isn't obvious that it can be used for this purpose. I don't think you have twigged to it and I didn't remember to mention it. 2) It queues all the books even if the books cannot be downloaded (no acceptable format) or are already downloaded. In the second case the processing to determine whether a download is necessary happens while the queue is being processed.

To fix both 1 and 2 I need both to make the operation explicit and to do the processing in advance to determine which books to download. The best way to do this is some set math that I described earlier. I need to verify that the set math will run on a phone when the user has a huge library (at least 20,000 books). If it does then CC already contains most of the rest of what is needed, specifically the cloud download queue.
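The set math mentioned above could look roughly like the following. This is only a minimal sketch, not CC's actual code: the function name `plan_downloads`, the `ACCEPTABLE_FORMATS` set, and the shape of the inputs (a mapping from UUID to available formats) are all assumptions made for illustration.

```python
# Hypothetical sketch: none of these names come from CC itself.
from typing import Dict, Iterable, List, Set

# Assumed set of formats the device can read.
ACCEPTABLE_FORMATS = {"EPUB", "MOBI", "PDF"}


def plan_downloads(cloud_books: Dict[str, Set[str]],
                   device_uuids: Iterable[str],
                   acceptable: Set[str] = ACCEPTABLE_FORMATS) -> List[str]:
    """Return the UUIDs of cloud books worth queueing.

    cloud_books maps each book's UUID to the set of formats it has
    in the cloud library.
    """
    # Set difference: books in the cloud library but not on the device.
    missing = set(cloud_books) - set(device_uuids)
    # Keep only books with at least one format the device can use.
    return sorted(u for u in missing if cloud_books[u] & acceptable)


cloud = {
    "uuid-a": {"EPUB"},
    "uuid-b": {"MOBI", "PDF"},
    "uuid-c": {"LIT"},  # no acceptable format, so never queued
    "uuid-d": {"EPUB"},
}
on_device = ["uuid-a"]
queue = plan_downloads(cloud, on_device)
print(queue)  # ['uuid-b', 'uuid-d']
```

Doing the difference up front means the queue only ever holds books that will actually be downloaded. Python sets are hash based, so the cost grows roughly linearly with library size; whether that holds up on a phone with 20,000 books is exactly what would need measuring.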

DavidTC · 03-21-2016, 05:10 PM

Quote:

Originally Posted by chaley

Note that you can already "fetch all books not on device" in CC's cloud connection by tapping "Newest" and then "Download All". CC queues all the books, skips books that are already on the device (a book has a matching UUID), and complains if a book doesn't have a usable format. This method has two problems. 1) It isn't obvious that it can be used for this purpose. I don't think you have twigged to it and I didn't remember to mention it. 2) It queues all the books even if the books cannot be downloaded (no acceptable format) or are already downloaded. In the second case the processing to determine whether a download is necessary happens while the queue is being processed.

Oh, I've discovered that, and also that you can filter books to just be the ones not on the device, and download just those. (I don't have any books not in acceptable format.)

The problem *there* is, as I said back where I started with all this, my original suggestion #1: This requires keeping two copies of everything on the device, which is why I asked for the ability to 'download' a book by just pointing the record at the existing file, instead of making a copy. This method also, from what I understand now, won't correctly get calculated fields.

I fear this conversation is getting oddly bogged down in weird details, including talking about a lot of things I was wrong about or didn't actually know, and also bogged down in my thoughts about how CC is somewhat too liberal in the files it accepts, which isn't relevant to anything.

So let me summarize what I am thinking *now* about adding books from the cloud:

1) Currently, CC ends up having to use a mix of metadata.db and internal book metadata, and can *still* get it wrong when getting it from a cloud connection. (Right? If I understand correctly, calculated fields are only in the book metadata, so won't be there usually.)

2) The internal book metadata *is* correct (for books that Calibre supports) after books go through Send to Device.

3) Also, CC does need metadata.db to exist (even if it is incomplete in some fields), so it can download that to show a listing of books from the cloud.

4) But there is also a metadata 'database' that follows 'Send to Device' folders around, called metadata.calibre. This appears to be...not actually a database, but instead a text file with all that information. In fact, looking at it, it has information that even metadata.db doesn't have, including calculated fields! (Which also nicely solves the problem of non-supported metadata files!)

So let me pause here and make a suggestion:

Perhaps the problem with cloud libraries is that CC is only supporting *the wrong type of cloud*. CC supports putting the entire library up there *as is*, with wrong metadata and everything.

However, it is almost as easy for people to put books in the cloud using a 'Send to Device' folder-in-the-cloud concept.

This would require CC to read metadata.calibre to show books, in addition to metadata.db.

And now two obvious objections arise: Not only is metadata.calibre a new format for CC to support, it's *huge*. My library is ~5000, and my metadata.db is 25 megs. But when putting those books in a folder device, metadata.calibre is 124 megs! Which causes obvious problems if the intent is to download it from the cloud every time someone wants a book.

There's an easy solution to the size: That file should be zlib compressed. When I compressed that file using the fastest compression level, it was only 12 megs. Considering the speed of the compression (4 seconds for me), vs. the likely speed difference of transferring 12 megs vs. 124 megs over a USB connection, it should be doing that anyway! (Not sure if any third party software uses metadata.calibre, but if so, an option to additionally generate the uncompressed file would solve that.)
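For anyone who wants to repeat the size experiment, here is a minimal sketch using Python's standard `gzip` module (which wraps zlib) at level 1, the fastest level that still compresses. The helper name `compress_metadata` and the stand-in data are made up for illustration; calibre itself does not write a compressed file.

```python
import gzip
import json
import shutil
import tempfile
from pathlib import Path


def compress_metadata(src: Path, level: int = 1) -> Path:
    """Write a gzip copy of src next to it and return the new path."""
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=level) as fout:
        shutil.copyfileobj(fin, fout)  # stream, so large files are not held in memory
    return dst


# Demo on stand-in data; a real metadata.calibre would go here instead.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "metadata.calibre"
    fake_books = [{"title": f"Book {i}", "authors": ["Someone"]} for i in range(1000)]
    src.write_text(json.dumps(fake_books), encoding="utf-8")

    gz = compress_metadata(src)
    original_size = src.stat().st_size
    compressed_size = gz.stat().st_size
    with gzip.open(gz, "rb") as f:
        round_trip_ok = f.read() == src.read_bytes()

print(original_size, compressed_size, round_trip_ok)
```

The ratio on real data will differ from this toy input, so the 124 megs to 12 megs figure above should be taken from the actual file, not from this demo.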

But the first point still holds, and is work. This would require writing code in CC to *read* the format of metadata.calibre. (It appears to be json?)

I am not sure of the level of required work with *that*. But it seems like, in addition to fixing the metadata, it would solve a lot of problems that cloud connections currently have to work around.
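To give a feel for the reading side, here is a rough sketch in Python. It assumes the file really is a JSON array of book records and that each record carries `uuid`, `title`, and `lpath` keys; those key names are my assumption about the format, and the function names are invented for the example.

```python
import json
from typing import Any, Dict, List


def parse_device_metadata(text: str) -> List[Dict[str, Any]]:
    """Parse the text of a metadata.calibre file into a list of book records."""
    books = json.loads(text)
    if not isinstance(books, list):
        raise ValueError("expected a JSON array of book records")
    return books


def index_by_uuid(books: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map UUID -> record, skipping records without a UUID."""
    return {b["uuid"]: b for b in books if "uuid" in b}


# Stand-in content; the key names are assumptions, not a format specification.
sample = """
[
  {"uuid": "1111", "title": "First", "lpath": "Author/First.epub"},
  {"uuid": "2222", "title": "Second", "lpath": "Author/Second.epub"},
  {"title": "No UUID here", "lpath": "Author/Odd.epub"}
]
"""
index = index_by_uuid(parse_device_metadata(sample))
print(sorted(index))           # ['1111', '2222']
print(index["1111"]["title"])  # First
```

If the format is as simple as it looks, the parsing itself is the small part; mapping the records onto whatever CC stores internally is presumably where the real work would be.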

It stops the 'How do I do more than one library?' questions. It allows multiple libraries, and even multiple *computers*, to put files there, as long as people can point Calibre at that synced-to-cloud directory. It allows people to put only certain books there, instead of their entire library.

Some people, of course, will keep using the old way, either because they don't know how to work Reading List and they want their whole library, or they're lazy. And that's fine. They just risk bad metadata for calculated fields, exactly as is already true.

To continue:

4) It would be best if CC could support plugging the device into Calibre and have files sent that way.

5) Likewise, the current 'cloud connection' results in duplicate files when the 'cloud' is local.

Second suggestion:

If CC can parse metadata.calibre, couldn't it just notice (either automatically or manually) a new or updated metadata.calibre.gz in its *own* folder, and add files it doesn't have which are listed in that, using the paths that are in that?

This metadata.calibre.gz could come from either a synced Send-to-Device folder, or from plugging the device into a computer.
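A minimal sketch of what that second suggestion amounts to, in Python. Everything here is hypothetical: the function names are invented, the records are assumed to carry `uuid` and `lpath` keys, and `lpath` is assumed to be a path relative to the folder that holds the metadata file.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, List


def metadata_changed(path: Path, last_seen_mtime: float) -> bool:
    """True if the file exists and was modified after the last import."""
    return path.is_file() and path.stat().st_mtime > last_seen_mtime


def books_to_import(records: Iterable[Dict[str, Any]],
                    known_uuids: Iterable[str],
                    root: Path) -> List[Path]:
    """Paths of listed books that are not known yet and that exist on disk."""
    known = set(known_uuids)
    found: List[Path] = []
    for rec in records:
        uuid, lpath = rec.get("uuid"), rec.get("lpath")
        if not uuid or not lpath or uuid in known:
            continue  # incomplete record, or a book already in the app
        candidate = root / lpath
        if candidate.is_file():  # the listing may be ahead of the file sync
            found.append(candidate)
    return found


# Demo with a throwaway folder standing in for the app's book folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Author").mkdir()
    (root / "Author" / "New.epub").write_bytes(b"stand-in")
    records = [
        {"uuid": "u-old", "lpath": "Author/Old.epub"},    # already known
        {"uuid": "u-new", "lpath": "Author/New.epub"},    # new and present
        {"uuid": "u-gone", "lpath": "Author/Gone.epub"},  # listed but missing
    ]
    new_books = [p.relative_to(root).as_posix()
                 for p in books_to_import(records, {"u-old"}, root)]

print(new_books)  # ['Author/New.epub']
```

Note that no file is copied anywhere: the records simply point at files that are already in the folder, which is the whole point of the suggestion.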

And this also short-circuits any problem of marking such books in the metadata, like I was talking about above. The only hypothetical problematic situation here is that pirates are insanely producing pirated book folders with metadata.calibre.gz files in them and people are copying those entire folders straight to devices without using Calibre at all. Which is so obviously silly and deliberate it's not worth worrying about.

chaley · 03-22-2016, 06:29 AM

Quote:

Originally Posted by DavidTC

The problem *there* is, as I said back where I started with all this, my original suggestion #1: This requires keeping two copies of everything on the device, which is why I asked for the ability to 'download' a book by just pointing the record at the existing file, instead of making a copy. This method also, from what I understand now, won't correctly get calculated fields.

Bottom line: I am not going to change CC to use book files in a local cloud library. It is complicated, could easily break several internal assumptions, and isn't much in demand.

Quote:

So let me summarize what I am thinking *now* about adding books from the cloud:

1) Currently, CC ends up having to use a mix of metadata.db and internal book metadata, and can *still* get it wrong when getting it from a cloud connection. (Right? If I understand correctly, calculated fields are only in the book metadata, so won't be there usually.)

No. CC does not use internal book metadata unless you use "scan on connect" with the wireless device. The cloud connection never uses internal metadata.

Quote:

2) The internal book metadata *is* correct (for books that Calibre supports) after books go through Send to Device.

And only for book formats that support arbitrary metadata, such as epub. Mobi does not.

Quote:

3) Also, CC does need metadata.db to exist (even if it is incomplete in some fields), so it can download that to show a listing of books from the cloud.

And get the metadata for the books.

Quote:

4) But there is also a metadata 'database' that follows 'Send to Device' folders around, called metadata.calibre. This appears to be...not actually a database, but instead a text file with all that information. In fact, looking at it, it has information that even metadata.db doesn't have, including calculated fields! (Which also nicely solves the problem of non-supported metadata files!)

So let me pause here and make a suggestion:

Perhaps the problem with cloud libraries is that CC is only supporting *the wrong type of cloud*. CC supports putting the entire library up there *as is*, with wrong metadata and everything.

However, it is almost as easy for people to put books in the cloud using a 'Send to Device' folder-in-the-cloud concept.

This would require CC to read metadata.calibre to show books, in addition to metadata.db.

[...]

Bearing in mind that I am not going to change how CC stores books ...

I am not sure what problem your new sync type solves. As far as I can tell it adds complexity without adding significant benefit. The only thing it has over the standard cloud connection is that it has values for composite columns (computed columns). But to get these values the user must manually export their library (making a copy of all books), sync that exported copy with the device, then import that copy into CC. If you don't use composite columns then the existing cloud connection does all of that with no data loss and without the fuss. The wireless device connection does it all, albeit on a device-by-device basis.

In addition, you appear to be solving a problem that almost no one has.

Most people don't keep their entire library on their device, or if they do then they have small libraries. I am now getting numbers. At the moment:
- 1-20 books: 23%
- 21-50: 28%
- 51-100: 7%
- 101-250: 2%
- 251-500: 11%
- 501-1000: 11%
- 1001-2500: 16%
- 2501-: 2%
We see that 60% have library sizes of 250 books or less.
Most people don't use the cloud connection. At the moment:
- Wireless device: 52%
- Content server: 24%
- Cloud connection: 24% (Dropbox: 12%, Google Drive: 8%, Microsoft OneDrive: 4%)
Most users have only one device. According to the stats, each user has 1.15 devices.
Anecdotal but I think true: most people don't use composite columns (computed columns), so even for people who do use the cloud connection the lack of their value in the cloud database doesn't affect them.

On the other hand, several people have asked that we support importing books sent over a cabled connection, which is a use case similar to what you are describing. I have had this on my "look at" list for some time. Perhaps the right time is approaching.

nqk · 03-22-2016, 11:31 AM

The survey results so far tell you how CC is used, not why it is used that way. To me it is no surprise that CC users use the wireless connection more than the cloud connection, partly because of how the latter is currently designed and because the wireless connection is the only way to get fully updated metadata for downloaded books.

DavidTC · 03-22-2016, 05:11 PM

Quote:

Originally Posted by chaley

The only thing it has over the standard cloud connection is that it has values for composite columns (computed columns). But to get these values the user must manually export their library (making a copy of all books),

Manually export? Well, if setting up Calibre to connect to a folder device on startup, and configuring the Reading List plugin to automatically send all their books to there is 'manual', okay, it's manual?

Quote:

sync that exported copy with the device, then import that copy into CC.

I was also suggesting using it over the cloud. I.e., CC checks for a metadata.db *or* metadata.calibre file upon cloud connection. (Either using the real cloud connection, or a fake local cloud.)

Quote:

On the other hand, several people have asked that we support importing books sent over a cabled connection, which is a use case similar to what you are describing. I have had this on my "look at" list for some time. Perhaps the right time is approaching.

You mean, reading new or changed metadata.calibre in the existing CC Default folder, and picking up new books from it? (And *possibly* picking up metadata changes in other books, but maybe not?)

That would be great. I'd actually much rather have CC do that than deal with any 'cloud connection' stuff.

People who use USB connections and point CC at the same directory would have that working for them.

And it also allows people to set up a cloud 'Folder device' in Calibre and sync *that* to CC's Default folder. (Which could even, awesomely, be synced back, unlike my fear of syncing the actual Calibre library back. Calibre notices, upon device connection, when a device has added books that aren't in metadata.calibre, or when books are deleted.)

Incidentally, any thoughts on compressing metadata.calibre? It's really really large per book, and very compressible. Completely outside of any of this, reading and writing that compressed would almost certainly be faster, even if it was just over USB. Or should I take that idea to some other forum?
