06-18-2015, 07:34 AM   #82
chrisridd
Quote:
Originally Posted by davidfor
The standards say that the file "mimetype" has to be the first in the archive and is not compressed. Reading the first 16KB would include this. This must contain the string "application/epub+zip" and checking this is supposed to be done to prove it is an epub.
That's true, but I don't recall whether you can read a zip file from start to finish like that. I think you normally have to begin from the end of the file, where the central directory lives, and work backwards. But you might be right, and this could just be a sanity check of the mimetype file.
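For what it's worth, here is a rough Python sketch of what such a sanity check could look like using only the first chunk of the file. It relies on the fixed 30-byte ZIP local file header that precedes each entry, so it never touches the central directory. This is only my guess at the kind of check involved, not what nickel actually does; looks_like_epub and make_sample_epub are names I made up, and the sample archive is invented for the demo.

```python
import io
import struct
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"
# Fixed-size part of a ZIP local file header (30 bytes, little-endian).
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def looks_like_epub(prefix: bytes) -> bool:
    """Check the first ZIP entry using only the leading bytes of the file."""
    if len(prefix) < LOCAL_HEADER.size:
        return False
    (sig, _ver, _flags, method, _mtime, _mdate,
     _crc, _csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(prefix, 0)
    # Method 0 means "stored", i.e. not compressed.
    if sig != b"PK\x03\x04" or method != 0:
        return False
    name_start = LOCAL_HEADER.size
    data_start = name_start + name_len + extra_len
    name = prefix[name_start:name_start + name_len]
    data = prefix[data_start:data_start + len(EPUB_MIMETYPE)]
    return name == b"mimetype" and data == EPUB_MIMETYPE


def make_sample_epub(mimetype_first: bool = True) -> bytes:
    """Build a tiny in-memory archive for the demo (hypothetical content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        if mimetype_first:
            zf.writestr("mimetype", EPUB_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.opf", "<package/>", compress_type=zipfile.ZIP_DEFLATED)
        if not mimetype_first:
            zf.writestr("mimetype", EPUB_MIMETYPE, compress_type=zipfile.ZIP_STORED)
    return buf.getvalue()


if __name__ == "__main__":
    first_16k = make_sample_epub()[:16384]
    print(looks_like_epub(first_16k))
```

So a front-to-back check of just the first entry is possible in principle, even though listing the whole archive still means going to the end of the file.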

Quote:
Originally Posted by davidfor
So, read the zip catalogue (or whatever they want to call it) and load the whole file into memory (as needed). Will strace pick up the actual IO when this is done?
No, strace only shows the system calls and not the underlying I/O. In this case the I/O is handled by the virtual memory system in the kernel, and the caller just sees the file as an array of bytes which are "magically" paged in.
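To illustrate, here is a minimal Python sketch, assuming the file is being memory-mapped (which is what the paging behaviour suggests, but I haven't confirmed it). The function name and the temporary file are made up for the demo. If you ran this under strace you would expect to see the open and the mmap call, but no read() for the bytes that get sliced out of the mapping.

```python
import mmap
import os
import tempfile


def read_via_mmap(path: str, offset: int, length: int) -> bytes:
    """Return bytes from a file by mapping it instead of calling read()."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Plain memory access: the kernel pages the data in on demand.
            return mapped[offset:offset + length]


def demo() -> bytes:
    """Write a small throwaway file and read its first four bytes via mmap."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"PK\x03\x04" + b"\x00" * 4096)
        return read_via_mmap(path, 0, 4)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```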

Quote:
Originally Posted by davidfor
The file:// URL could be for the whole book as that is used for the key in the database. But, each chapter also gets a file:// URL in the database.
Could be. I think you can tell strace to dump out more of the details of each read and write instead of truncating them (the -s option raises the string length limit), so that would (might) reveal more.

Quote:
Originally Posted by davidfor
Ouch, all those file opens and rereads of the same part. An obvious reason for something like that is if a process is checking the file for something and then passing it to another process/routine, which opens the file again. nickel doing the first and then the adobehost/RMSDK doing the second makes sense from a coding point of view. It is inefficient at runtime, but simple to code.
Yeah, it might not be so bad, as the OS is likely to serve some of the repeated reads from its cache, but it does look a bit clumsy. The multi-process architecture is a good thing if the adobehost bit decides to crash or misbehave.

Getting timestamps in the strace output (e.g. with -tt or -T) would be useful. I can't see where the time's spent apart from in the OPF scanning (55 ms). It looks a bit like the TOC stuff, which is ironic if it isn't used any more.
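If someone does capture a trace with per-call timings, something like this rough Python sketch could total the time per system call. It assumes strace was run with -T only, so each completed line ends in a duration like <0.000120> and has no pid or timestamp prefix. The sample lines, path and numbers below are invented purely to show the format; they are not from the real trace.

```python
import re
from collections import defaultdict

# Matches completed calls such as: read(5, "..."..., 16384) = 16384 <0.000450>
LINE_RE = re.compile(r"^(\w+)\(.*\)\s*=\s*.*<(\d+\.\d+)>\s*$")


def time_per_syscall(lines):
    """Sum the <seconds> suffix that strace -T appends, grouped by syscall name."""
    totals = defaultdict(float)
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # skip unfinished/resumed or otherwise unparsable lines
            totals[match.group(1)] += float(match.group(2))
    return dict(totals)


# Made-up sample in the strace -T format, for illustration only.
SAMPLE = [
    'open("/hypothetical/book.epub", O_RDONLY) = 5 <0.000120>',
    'read(5, "PK\\3\\4"..., 16384) = 16384 <0.000450>',
    'read(5, "PK\\3\\4"..., 16384) = 16384 <0.000300>',
    'close(5) = 0 <0.000030>',
]

if __name__ == "__main__":
    totals = time_per_syscall(SAMPLE)
    for name, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{name:8s} {seconds * 1000:.3f} ms")
```

That only accounts for time inside system calls, so anything spent in user space (parsing, rendering) would still need wall-clock timestamps to spot.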