MobileRead Forums - View Single Post - Normal for kobo to take forever on "processing content" ?

davidfor · 06-18-2015, 04:32 AM

Quote:

Originally Posted by chrisridd

Breaking down the adobehost strace a little more.

First we can see some timestamp information in the clock_gettime calls. The numbers printed are time since some point in { seconds, nanoseconds }.

At 269.466764989 the file is opened, the first 16KB is read, and the file's closed. There isn't anything you can do with the first bit of a zip file except check the first bytes match "PK".

The standard says that the file "mimetype" has to be the first entry in the archive and must not be compressed, so the first 16KB read would include it. It must contain the string "application/epub+zip", and a reader is supposed to check this to confirm that the file is an epub.
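To make that check concrete, here is a rough Python sketch of what a reader could do with the first 16KB. This is only my illustration of the zip local file header layout, not what adobehost or nickel actually run; the function name looks_like_epub is made up for the example.

```python
import io
import struct
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"

def looks_like_epub(head: bytes) -> bool:
    """Check the start of a file (e.g. its first 16KB) for the EPUB mimetype entry.

    Hypothetical helper for illustration only.
    """
    # A zip local file header is 30 fixed bytes starting with "PK\x03\x04".
    if len(head) < 30 or head[:4] != b"PK\x03\x04":
        return False
    (_sig, _version, _flags, method, _mtime, _mdate, _crc,
     comp_size, _size, name_len, extra_len) = struct.unpack(
        "<4sHHHHHIIIHH", head[:30])
    name = head[30:30 + name_len]
    data_start = 30 + name_len + extra_len
    data = head[data_start:data_start + comp_size]
    # Method 0 means "stored", i.e. not compressed.
    return name == b"mimetype" and method == 0 and data == EPUB_MIMETYPE

# Demo: build a tiny archive in memory and test its first 16KB.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("mimetype", "application/epub+zip",
               compress_type=zipfile.ZIP_STORED)
print(looks_like_epub(buf.getvalue()[:16 * 1024]))
```

The check only needs the first few dozen bytes, so a 16KB read is far more than the mimetype test itself requires.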

Quote:

Then the file is opened again, and the last 2744 bytes read. I think that's a required zip file trailer, which lets the program find stuff in the rest of the file. Then the file is mmap()ed in, then closed, and then unmapped.

So, read the zip catalogue (or whatever they want to call it) and load the whole file into memory (as needed). Will strace pick up the actual IO when this is done?
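As a sketch of that sequence (read the tail, find the catalogue, then map the file), something like the following Python would do it. The 2744 default is simply the read size seen in the trace, and the function names are mine; I am not claiming this is how RMSDK does it.

```python
import mmap
import os
import struct

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory signature

def read_eocd(tail: bytes):
    """Locate the end-of-central-directory record in the last bytes of a zip."""
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or len(tail) - pos < 22:
        return None
    (_sig, _disk, _cd_disk, _n_here, n_entries,
     cd_size, cd_offset, _comment_len) = struct.unpack(
        "<4sHHHHIIH", tail[pos:pos + 22])
    return {"entries": n_entries, "cd_size": cd_size, "cd_offset": cd_offset}

def central_directory(path, tail_len=2744):
    """Read the trailer, then mmap the file and slice out the central directory.

    tail_len is an assumption taken from the read size in the strace output.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(max(0, size - tail_len))
        info = read_eocd(f.read())
        if info is None:
            return None
        # Map the whole file read-only; pages are loaded only when touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            start = info["cd_offset"]
            return info["entries"], bytes(m[start:start + info["cd_size"]])
```

Note that access through the mapping is served by page faults rather than read() calls, so I would not expect strace to list that IO as system calls.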

Quote:

Just before 269.521851864, 1089 bytes is written to nickel including the text "DC.creator". So this looks like some (probably all) of the relevant metadata from the OPF file.

Does 1089 bytes sound like enough? If that is all the metadata that nickel uses, it will include the description, and I have lots of books whose description alone would be longer than that.

Quote:

At 269.526128239, the file is opened for the third time, the first 16KB is read, and the file closed.

Then the file is opened again, the last 2744 bytes read, and then the whole file is mmap()ed again. After the file is closed and unmapped, 2213 bytes is written back including some file:// URL. We can't see most of this. There's no disk writes. Could this be the TOC?

The file:// URL could be for the whole book as that is used for the key in the database. But, each chapter also gets a file:// URL in the database.

Quote:

However something is also logged at this point "<15>Jun 17 16:37:09 adobehost: v"... which may be worth finding.

It is now 270.338781864.

About half a second later, at 270.801502864, the file is opened again so its first 16KB can be read again.

Can we deduce anything from this? Well, the metadata extraction from the OPF is quick (55ms) so that doesn't seem to be the issue. Something else big is being passed back, which contains a file URL. Given the tests in this thread I'd guess that's the TOC. Generally the process looks inefficient because it keeps opening and closing the same file. I wonder if reading 16KB matches the block size used on the filesystem.

The filesystem is FAT32. The cluster size is probably 4KB unless you have updated the size of your internal card. Then it might be 16KB. Of course, the filesystem driver might read multiple blocks.

Quote:

Overall the work took from 269.466764989 to 270.801502864, a little less than 1.5 seconds, but at least a third of that time is waiting for nickel.

Ouch, all those file opens and rereads of the same part. An obvious reason for something like that is a process checking the file for something and then passing it to another process or routine, which opens the file again. nickel doing the first and then adobehost/RMSDK doing the second makes sense from a coding point of view. It is inefficient at runtime, but simple to code.
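For anyone wanting to check the intervals quoted above, the arithmetic on the clock_gettime values can be done in a few lines of Python. The variable names are my own labels for the four timestamps in the quotes, and treating the final gap as time spent waiting for nickel is chrisridd's reading of the trace, not something the numbers prove.

```python
from decimal import Decimal

# Timestamps copied from the quoted strace analysis (seconds).
t_first_open = Decimal("269.466764989")  # file first opened
t_metadata = Decimal("269.521851864")    # 1089 bytes of metadata written
t_after_log = Decimal("270.338781864")   # after the 2213 byte write and log line
t_reopen = Decimal("270.801502864")      # file opened yet again

metadata_time = t_metadata - t_first_open
total_time = t_reopen - t_first_open
final_gap = t_reopen - t_after_log

print("metadata extraction:", metadata_time, "s")
print("whole sequence:", total_time, "s")
print("final gap:", final_gap, "s")
print("gap as share of total:", round(final_gap / total_time, 3))
```

That gives about 55ms for the metadata, about 1.33 seconds overall, and a final gap of about 0.46 seconds, which is a little over a third of the total.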