Old 08-08-2011, 11:15 AM   #428
itimpi
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
Quote:
Originally Posted by electronicfur View Post
I'm a bit confused about the need to read the whole database structure into memory. I thought the most efficient approach would be to process one book at a time, and that this was one of the advantages of calibre2opds over the built-in calibre OPDS server. I couldn't think of a reason why the whole structure needs to be known up front.
The reason for doing this is that it is something like 10-100 times faster than going back to the database to get the details of each book. The decision was therefore made at the implementation stage to take the memory hit in order to maximise generation speed. Note that the big gain over the Calibre server is that you do not take the memory hit when trying to download books.
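To illustrate the trade-off, here is a minimal sketch of loading the book list in one pass and answering later lookups from memory. It assumes Calibre's metadata.db is opened over JDBC; the books table and its id/title columns match Calibre's SQLite schema, but the class itself is hypothetical and not the actual Calibre2Opds code:

Code:
import java.sql.*;
import java.util.*;

// Load all book rows once and keep them in memory, instead of
// issuing one query per book while building the catalog.
public class BookCache {
    public static Map<Long, String> loadTitles(Connection db) throws SQLException {
        Map<Long, String> titles = new HashMap<Long, String>();
        Statement st = db.createStatement();
        ResultSet rs = st.executeQuery("SELECT id, title FROM books");
        while (rs.next()) {
            titles.put(rs.getLong("id"), rs.getString("title"));
        }
        rs.close();
        st.close();
        return titles; // one database pass; every later lookup is in-memory
    }
}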

Quote:
I notice that a large chunk of time is spent on the "copy the library" part. The copy operation re-creates all the files already present in the temporary directory. This could be entirely avoided in most use cases by using a file move operation rather than a file copy of the temporary files. The use case where calibre2opds is running on the same machine as the calibre library is the obvious one where a move operation would avoid recreating all the files. The use case of calibre2opds running against a network file share could also avoid the file recreation if there were an option to specify the temporary file directory, so that it could be placed on the same network share.
It might be worth revisiting this code to see if there is further room for optimisation. The current optimisation was added as extra logic that can be run after the main generate phase has completed. Note that the optimisation does NOT copy files from the temporary location to the final target if they are unchanged from the previous run.

The reason it was done that way is that the optimisation could be developed by someone who did not need to understand the logic of the code that generates the catalog files in the first place.

This is a side-effect of the fact that multiple people are involved in the development and none of them understands all the program logic at a detailed level, so they try to make changes/improvements that are as localised as possible.

That does not mean that further improvements are not possible. I am not sure how easy it might be to do a move/rename from Java in a platform-independent way, but it is definitely worth looking at.
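For what it's worth, a platform-independent move can be approximated with File.renameTo() plus a copy-and-delete fallback (renameTo() can fail when source and target are on different file systems). This is just a sketch of the idea, not code from Calibre2Opds:

Code:
import java.io.*;

public class FileMover {
    // Try a cheap rename first; fall back to copy+delete when the
    // rename fails (e.g. across file systems or on some platforms).
    public static void moveFile(File source, File target) throws IOException {
        if (source.renameTo(target)) {
            return; // fast path: no file content was rewritten
        }
        InputStream in = new FileInputStream(source);
        OutputStream out = new FileOutputStream(target);
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        out.close();
        in.close();
        if (!source.delete()) {
            throw new IOException("Could not delete " + source);
        }
    }
}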
Quote:
Finally I was unsure what the "Minimise number of changed files" option does.
This avoids copying to the final target any files whose content is unchanged from the previous run. In earlier releases, before this optimisation existed, users of systems like Dropbox found that many files whose content was the same as the previous run, but which had a newer date/time, ended up being copied and thus needed to be uploaded to the 'cloud'.
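The kind of check involved might look like the sketch below, which compares file sizes first (cheap) and only then CRCs; CRC32 comes from the standard library, and the class/method names are made up for illustration:

Code:
import java.io.*;
import java.util.zip.CRC32;

public class ChangedFileCheck {
    // Compute a CRC32 over the whole file content.
    static long crcOf(File f) throws IOException {
        CRC32 crc = new CRC32();
        InputStream in = new FileInputStream(f);
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            crc.update(buf, 0, n);
        }
        in.close();
        return crc.getValue();
    }

    // Copy only when content differs: Dropbox and friends then see
    // no change in date/time or content for untouched files.
    static boolean isUnchanged(File generated, File existing) throws IOException {
        return existing.exists()
            && existing.length() == generated.length() // cheap size test first
            && crcOf(existing) == crcOf(generated);    // CRC only if sizes match
    }
}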
Quote:
I had thought that if calibre2opds read the date & time of the original _catalog creation and also placed the CRC of the profile.xml used to generate it in the _catalog, then upon start it could check whether the profile.xml being used is identical, and use the date & time stamp to only process books that have been amended after that date & time. In my case it seems to always process all books, and only carry out a CRC check on the generated files afterwards.
Might be worth thinking a bit about that. I think at the moment the profile.xml is always written out, so its date/time stamp may not be usable without some change to the program logic.

It might now be possible to use the timestamp of the metadata.opf file in the Calibre library to work out when a particular book was last amended. At the time most of the Calibre2Opds logic was developed, that file did not exist.
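If that were pursued, the per-book test could be as simple as comparing the metadata.opf timestamp against the time of the last catalog run. A hypothetical sketch (the folder layout assumed here is Calibre's standard one, with a metadata.opf inside each book's folder):

Code:
import java.io.File;

public class BookChangeCheck {
    // Has this book been amended since the last catalog generation?
    static boolean changedSince(File bookFolder, long lastRunMillis) {
        File opf = new File(bookFolder, "metadata.opf");
        // Older libraries may not have the file yet: play it safe
        // and treat the book as changed.
        if (!opf.exists()) {
            return true;
        }
        return opf.lastModified() > lastRunMillis;
    }
}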

Quote:
Also for some reason my synclog.log always says "IGNORED (CRC)" for the files.
Not sure about that. That log was only added very recently, so I am not sure how good it is. The "IGNORED (CRC)" message would normally mean that some other criterion (such as a changed file size) was used to decide whether to do the copy, so the CRC check never ran.