Quote:
Originally Posted by frostschutz
As for processing times, experiment directly on the Kobo:
# echo 3 > /proc/sys/vm/drop_caches
# time sh -c 'find /mnt/ -name "*.epub" -exec unzip -p {} content.opf \; | grep dc:title'
This extracts content.opf from all EPUBs (I only have EPUB, nothing else) and prints the book title. And it does so in an inefficient way (find and unzip are an order of magnitude slower than what can be done in a C program).
For me this takes 16 seconds for ~400 books. Directly on the Kobo itself.
I then emptied my KoboReader.sqlite to make the Kobo reprocess the same books.
After 1 minute it was at 8% complete.
After 2 minutes, 13%.
After 3 minutes, 15%.
I stopped tracking the progress at that point. It's still not done while I'm typing this post (37% after several more minutes).
If I can grab the essential data required to display a list of books in 16 seconds on the Kobo shell using an inefficient method, it's hard to understand why the Kobo is so fricking slow about doing the same thing.
Except that you have only done part of it. The Kobo is also extracting other metadata from the OPF, which means it has to parse the OPF properly rather than just run a simple grep. Then it extracts the NCX file, parses the contents, and builds structures based on that. How much longer that should take, I don't know. I do agree that the Kobo is taking a lot longer than I would expect. I just don't know whether this is because the code is inefficient or because it is doing something that we don't know about.
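To get a rough idea of how much NCX work each book involves, the quoted grep approach can be extended to count ToC entries. This is only a sketch and still not real XML parsing: `count_navpoints` is a name I made up, and it simply counts `<navPoint` opening tags in whatever NCX text is piped into it.

```shell
#!/bin/sh
# Hypothetical helper: count <navPoint ...> opening tags in NCX text read from stdin.
# This is a text match, not an XML parse, so it is only an estimate of the ToC size.
count_navpoints() {
    grep -o '<navPoint' | wc -l | tr -d ' '
}

# Small demonstration on an inline NCX fragment with two entries.
printf '%s\n' '<navPoint id="n1">' '</navPoint>' '<navPoint id="n2">' '</navPoint>' | count_navpoints

# On the device it might be used like this (assumes the NCX is stored as toc.ncx,
# in the same way the quoted command assumes content.opf):
#   unzip -p book.epub toc.ncx | count_navpoints
```

Comparing that count against the processing time per book would show whether the ToC size is what matters.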
All this has prompted me to do a test I've been meaning to do for a while. I wanted to compare how the time taken relates to the size of the file and the size of the ToC. So, I generated an epub that was simply 4503 copies of "<h2>Chapter nnnn</h2>". Then I had calibre generate a ToC for that. I dropped that on my Glo HD and timed the processing. Processing, timed from when the light turned on after ejecting to when the home screen appeared, took about 55 seconds. But the book wouldn't open, and when I looked in the database, there were no ToC entries.
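For anyone who wants to repeat this, the body of such a test file could be generated along these lines. It is a minimal sketch: `gen_chapters` is a made-up name, the XHTML wrapper around the headings is my assumption, and it only produces the content file. Packaging it as an EPUB and generating the ToC still has to be done in something like calibre.

```shell
#!/bin/sh
# Hypothetical helper: print an XHTML document containing N numbered
# "<h2>Chapter nnnn</h2>" headings, one per line, to stdout.
gen_chapters() {
    count=$1
    printf '%s\n' '<?xml version="1.0" encoding="utf-8"?>'
    printf '%s\n' '<html xmlns="http://www.w3.org/1999/xhtml">'
    printf '%s\n' '<head><title>ToC stress test</title></head>'
    printf '%s\n' '<body>'
    i=1
    while [ "$i" -le "$count" ]; do
        # Zero-padded chapter number, e.g. "Chapter 0042"
        printf '<h2>Chapter %04d</h2>\n' "$i"
        i=$((i + 1))
    done
    printf '%s\n' '</body>' '</html>'
}

# Sanity check: count the headings generated for the full-size test.
gen_chapters 4503 | grep -c '<h2>Chapter'
```

Redirecting the output, as in `gen_chapters 1000 > chapters.xhtml`, gives a file for the smaller test.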
I changed the file to only 1000 ToC entries and put 10 copies on the Glo HD. The file is 62KB. They took about 88 seconds to process. This did open OK - 72 pages of ToC!
Then I took a 560KB book I had that was made with nearly all the text in a single section. It has a simple text cover and about 30 chapters. I removed all but two of the chapters from the ToC. I put ten copies of this on the Glo HD. That took 22 seconds to process.
That isn't completely conclusive, but it agrees with what I have been thinking: the size of the ToC affects the processing time more than the size of the book.
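Putting the numbers from this thread on a per-book basis makes the comparison easier to see. This is just the arithmetic as a small sketch (`per_book` is a made-up helper). The Kobo timings include whatever fixed overhead there is between ejecting and the home screen appearing, so treat them as upper bounds, and the 400 in the last line is the approximate book count from the quoted post.

```shell
#!/bin/sh
# Hypothetical helper: average seconds per book, given total seconds and book count.
per_book() {
    awk -v total="$1" -v copies="$2" 'BEGIN { printf "%.2f\n", total / copies }'
}

per_book 88 10    # 62KB file, 1000 ToC entries: 8.80 s per book
per_book 22 10    # 560KB file, 2 ToC entries:   2.20 s per book
per_book 16 400   # find/unzip/grep on ~400 books: 0.04 s per book
```

So the smaller file with the big ToC took about four times as long per book as the file that is roughly nine times larger.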