While trying to debug my problem, it occurred to me that there are a few potential optimisations. My apologies if these are obvious and have already been considered, but I thought I would throw them out there in case they are useful.
I'm a bit confused about the need to read the whole database structure into memory. I would have thought the most efficient approach would be to process one book at a time, and I thought this was one of the advantages of calibre2opds over the built-in calibre OPDS server. I couldn't think of a reason why the whole structure needs to be known up front.
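To illustrate what I mean (just a sketch of the general idea, not calibre2opds's actual code; it assumes the sqlite-jdbc driver on the classpath and calibre's metadata.db schema, and processBook() is a hypothetical stand-in for the per-book catalogue work):

import java.sql.*;

public class StreamBooks {
    public static void main(String[] args) throws SQLException {
        // Hypothetical library path; metadata.db is calibre's SQLite database.
        String url = "jdbc:sqlite:/path/to/library/metadata.db";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT id, title FROM books ORDER BY id")) {
            while (rs.next()) {
                // Only the current row is held in memory; the full library
                // structure is never loaded up front.
                processBook(rs.getLong("id"), rs.getString("title"));
            }
        }
    }

    // Hypothetical per-book handler standing in for catalogue generation.
    static void processBook(long id, String title) {
        System.out.println(id + ": " + title);
    }
}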
I notice that a large chunk of time is spent on the "copy the library" step. The copy operation re-creates in the target location all of the files that were just written to the temporary directory. In most use cases this could be avoided entirely by moving the temporary files rather than copying them (see the sketch below). The obvious case is calibre2opds running on the same machine as the calibre library, where a move is just a rename. The case of calibre2opds running against a network file share could also avoid re-creating the files if there were an option to specify the temporary directory, so that it could be placed on the same share.
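Something along these lines would do it (a minimal sketch with hypothetical paths, using Java's java.nio.file API):

import java.io.IOException;
import java.nio.file.*;

public class PublishCatalog {
    static void publish(Path tempFile, Path target) throws IOException {
        Files.createDirectories(target.getParent());
        try {
            // On the same file system this is just a rename: the file's
            // data is never rewritten.
            Files.move(tempFile, target,
                    StandardCopyOption.REPLACE_EXISTING,
                    StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Source and target are on different file systems, so a plain
            // rename is impossible; fall back to an ordinary move, which
            // degrades to copy + delete.
            Files.move(tempFile, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical example paths.
        publish(Paths.get("/tmp/calibre2opds/index.xml"),
                Paths.get("/library/_catalog/index.xml"));
    }
}

With the temporary directory placed on the same disk or share as the library, the rename branch would always be taken and nothing would need to be re-copied.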
Finally, I was unsure what the "Minimise number of changed files" option does.
I had thought that if calibre2opds recorded the date & time of the original _catalog creation, and also placed in the _catalog the CRC of the profile.xml used to generate it, then on start-up it could check whether the profile.xml being used is identical and, if so, use the timestamp to process only books that have been amended after that date & time. In my case it seems to always process all the books, and only carries out a CRC check on the generated files afterwards. Also, for some reason my synclog.log always says "IGNORED (CRC)" for the files.
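To make that concrete, here is a rough sketch of the check I had in mind (not the actual calibre2opds behaviour; the .lastrun marker file and its format are hypothetical, and it assumes Java 11+ for Files.readString/writeString):

import java.io.IOException;
import java.nio.file.*;
import java.util.zip.CRC32;

public class IncrementalCheck {
    static long crcOf(Path file) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(Files.readAllBytes(file));
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        Path profile = Paths.get("profile.xml");
        Path stamp = Paths.get("_catalog/.lastrun");  // hypothetical marker file

        long currentCrc = crcOf(profile);
        if (Files.exists(stamp)) {
            String[] parts = Files.readString(stamp).trim().split(" ");
            long savedCrc = Long.parseLong(parts[0]);
            long lastRunMillis = Long.parseLong(parts[1]);
            if (savedCrc == currentCrc) {
                // Same profile as last time: only books whose last_modified
                // is after lastRunMillis would need to be regenerated.
                System.out.println("Incremental run since " + lastRunMillis);
            }
        }
        // After a successful run, record the profile CRC and the run time.
        Files.writeString(stamp, currentCrc + " " + System.currentTimeMillis());
    }
}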
Cheers,
EF