#1
Member
Posts: 10
Karma: 10
Join Date: May 2018
Device: Web Browser
Squeezing as much performance as possible out (container)
Hey folks-
TL;DR: Make your calibre faster by putting /tmp on a ramdisk and throwing more resources at it.

I'm in the process of validating a few hundred thousand ebooks while restructuring my library, and I figured I would take some time to run some performance tests. Not sure how much this will help anyone (the numbers are very tailored to my setup), but here are my results. If anyone has ideas on how to go even further (custom compiling, maybe?) I'm all ears. See below for my results, and sorry, but I have no clue how to make proper tables in this forum.

My setup:

Host machine: Kubuntu 18.10
Virtualization: libvirt-lxc
Root FS: SSD-backed XFS
Library location: local RAID6 mdadm mount

Guest: Ubuntu 18.10 LXC
RAM: 1G
CPU cores: 1
Library location: direct mount of the host folder

The majority of what I am doing in the calibre GUI is the metadata download, which, as most of you know, will probably finish a few hundred thousand books by the time the glaciers have melted. That's all I'm really using it for, and I'm containerizing it because I don't like the idea of the Python 2 dependency (but that's neither here nor there).

Plugins installed:

Barnes & Noble
Count Pages
Find Duplicates
Goodreads
Quality Check

Now for the data. I ran a combined metadata + covers download after each change I made, on the same 63 ebooks. Below are the changes and the resulting completion times. Notice how many of these are specific to the metadata download itself.

vanilla (no changes): 7:10
CALIBRE_TEMP_DIR pointed at a ramdisk: 6:48
only Amazon, Goodreads and Google as sources: 6:11
no tag download in metadata download: 7:23
download only metadata: 3:48
download only covers: 5:01
Amazon only, with Amazon servers: 4:34 (51 found)
Amazon only, with Google cache: 4:56 (49 found)
Amazon only, with Bing cache: 6:05 (50 found)
debug mode turned on: 6:25
db in ramdisk (symlink): 7:22
author, comments, rating, title metadata only (plugin): 9:21
author, comments, rating, title metadata only (plugin + global): 7:22
RAM expanded from 1G to 4G: 6:28
all of /tmp on a ramdisk (loop driver): 6:58
all of /tmp on a ramdisk (nbd/raw driver): 6:34
all of /tmp on a ramdisk (mount): 7:09
core count expanded from 1 to 8: 7:02
core count expanded from 1 to 8 and jobs from 3 to 16: 7:12

Honestly, a lot of these tweaks had little to no effect. So I started looking more closely at the logs for each metadata download and parsing them to see whether any of the plugins/providers was a bottleneck (roughly along the lines of the script sketched at the end of this post). Times are in seconds:

Source     Type       Total   Average   Median
-----------------------------------------------
Amazon     Metadata   200.9     3.2      1.9
Amazon     Covers     252.3     4.0      3.1
B&N        Metadata   126.6     2.0      1.6
B&N        Covers     104.8     1.7      1.3
Goodreads  Metadata    46.4     0.8      0.6
Goodreads  Covers      79.6     1.2      0.7
Google     Metadata    32.5     0.5      0.4
Google     Covers      59.3     0.9      0.7

The Amazon plugin is disgustingly slow compared to the other three. Changing the default "automatic" server selection to the Amazon servers helped tremendously, but it still wasn't good enough, since every enabled source gets queried no matter what.

With that in mind, here is the final state I ended up with:

End spec: 1:55 to complete!!
4096M /tmp
4096M RAM
8 CPUs
Amazon removed as a source

... and that is pretty much it. As suggested in a different thread, sticking all of /tmp on a ramdisk is clearly superior to just setting CALIBRE_TEMP_DIR (a sketch of both follows below). I'm sure in the future I'll throw the database on there as well with a copy/symlink, but for now it doesn't help enough to be worth the hassle.
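For anyone who wants to try the same /tmp change, here's roughly what it looks like. Treat it as a sketch rather than a recipe: the 4096M size just matches my container, the ramdisk path for CALIBRE_TEMP_DIR is only an example, and whether you do this from the host's LXC config or the guest's fstab depends on your setup.

```sh
# One-off test inside the guest: back /tmp with up to ~4G of RAM
mount -t tmpfs -o size=4096m,mode=1777 tmpfs /tmp

# Or make it permanent via the guest's /etc/fstab:
#   tmpfs  /tmp  tmpfs  size=4096m,mode=1777  0  0

# The lighter-weight variant from the list above: leave /tmp alone and
# point only calibre's temp files at an existing ramdisk (example path)
export CALIBRE_TEMP_DIR=/mnt/ramdisk
```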
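And for completeness, this is the shape of the log parsing behind the per-source table. It's only a sketch: the line format assumed by the regex is a stand-in, not calibre's actual debug output, so the pattern would need to be adapted to whatever your saved logs contain.

```python
#!/usr/bin/env python3
"""Summarize per-source download times from saved metadata-download logs.

The expected line format (e.g. "Amazon Metadata 3.2") is hypothetical --
adjust LINE_RE to match the real log lines you are parsing.
"""
import re
import statistics
import sys
from collections import defaultdict

LINE_RE = re.compile(r"(?P<source>\w+)\s+(?P<kind>Metadata|Covers)\s+(?P<secs>[\d.]+)")

def summarize(paths):
    timings = defaultdict(list)          # (source, kind) -> [seconds, ...]
    for path in paths:
        with open(path, errors="replace") as fh:
            for line in fh:
                m = LINE_RE.search(line)
                if m:
                    timings[(m["source"], m["kind"])].append(float(m["secs"]))

    print(f"{'Source':<12}{'Type':<10}{'Total':>8}{'Average':>9}{'Median':>8}")
    for (source, kind), secs in sorted(timings.items()):
        print(f"{source:<12}{kind:<10}{sum(secs):>8.1f}"
              f"{statistics.mean(secs):>9.1f}{statistics.median(secs):>8.1f}")

if __name__ == "__main__":
    summarize(sys.argv[1:])   # e.g. python3 parse_logs.py logs/*.txt
```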
#2
Handy Elephant
Posts: 1,737
Karma: 26785684
Join Date: Dec 2009
Location: Southern Sweden, far out in the quiet woods
Device: Samsung Galaxy Tab S8 Ultra
Most (all?) of your operations worked only on/in the DATABASE, and your set of books was most likely small enough to fit in the normal RAM disk caches. This is one of the great things about calibre: lazy updating of ebook files. Most operations on metadata never touch the actual ebook files; they only reach the database.
If you update/convert/send/save changed metadata to the ebooks themselves, then you'll benefit much more from putting tmp in RAM: the book files have to be unzipped, have their metadata updated and then be zipped up again, which takes a lot of processing power and a lot of disk reads and writes.

I just use tmpfs for tmp, and keep my calibre libraries, databases and ebooks on a big SSD. I used to have the database on a small local SSD and the book files on a NAS; that worked fine. Now I use the NAS only for automatic versioned rsync backups of the calibre libraries (roughly along the lines of the sketch at the end of this post).

Plenty of RAM benefits database access almost as much as putting the database in RAM: not dedicating RAM to holding the database means more RAM for disk caches, which also makes database access fast.

I would suggest that you might benefit from partitioning the calibre library into several smaller libraries. Then the caches will be utilized better while updating.

Also, calibre already uses an early, simple form of container: almost all dependencies are fully contained inside the calibre install folder, so there are no problems with clashing Python versions.

Last edited by Adoby; 12-12-2018 at 06:28 PM.
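If it helps, versioned rsync backups can be as simple as something like the sketch below. The paths are only examples (not necessarily how it is actually set up here): each run creates a dated snapshot, and unchanged files are hard-linked against the previous snapshot so they take no extra space.

```sh
#!/bin/sh
# Example layout only: adjust the library and backup paths to your setup.
TODAY=$(date +%F)

rsync -a --delete \
      --link-dest=/nas/backups/calibre/latest \
      /home/me/CalibreLibrary/ \
      "/nas/backups/calibre/$TODAY/"

# Point "latest" at the snapshot we just made, for the next run to link against.
ln -sfn "/nas/backups/calibre/$TODAY" /nas/backups/calibre/latest
```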