12-11-2018, 06:21 PM   #1
seru1us
Member
 
Posts: 10
Karma: 10
Join Date: May 2018
Device: Web Browser
Squeezing as much performance as possible out of calibre (container)

Hey folks-

TL;DR: Make your calibre faster by putting /tmp in a ramdisk and throwing more resources at it.

I'm in the process of validating a few hundred thousand ebooks while restructuring my library, and figured I would take some time to run some performance tests. Not sure how much this will help anyone (the numbers are very tailored to my setup), but here are my results. If anyone has ideas on how to go even further (e.g. a custom compile), I'm all ears. See below for my results, and sorry, but I have no clue how to make tables in this forum.
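To make the TL;DR concrete, the ramdisk tweak is basically just this (paths and sizes are examples, pick whatever fits your RAM):

Code:
# create a tmpfs-backed scratch directory (example path/size)
sudo mkdir -p /mnt/calibre-tmp
sudo mount -t tmpfs -o size=1G,mode=1777 tmpfs /mnt/calibre-tmp

# point calibre's temp files at it before launching
export CALIBRE_TEMP_DIR=/mnt/calibre-tmp
calibre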


My setup:
Host machine: Kubuntu 18.10
Virtualization: libvirt-lxc
Root FS: SSD-backed XFS
Library location: local RAID6 mdadm mount

Guest: Ubuntu 18.10 LXC
RAM: 1G
CPU cores: 1
Library location: direct mount of the host folder
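For reference, the "direct mount of folder" is just host-side passthrough of the RAID6 mount into the guest. With plain LXC it would look roughly like the bind mount below (example paths); libvirt-lxc expresses the same thing as a <filesystem> element in the domain XML.

Code:
# host side, example paths: expose the library folder to the guest rootfs
sudo mount --bind /mnt/raid6/calibre-library \
     /var/lib/lxc/calibre/rootfs/library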

The majority of what I'm doing in the calibre GUI is the metadata download, which, as most of you know, will probably finish a few hundred thousand books around the time the glaciers have melted. That's all I'm really using it for, and I'm containerizing it because I don't like the idea of the Python 2 dependency (but that's neither here nor there).

Plugins installed:
Barnes & Noble
Count Pages
Find Duplicates
Goodreads
Quality Check



So now for the data. I ran a combined metadata + covers download after each change, on the same 63 eBooks. Below are the changes I made and the resulting completion times. Notice that a lot of these are specific to the metadata download.

vanilla (no changes): 7:10

with CALIBRE_TEMP_DIR as ramdisk: 6:48
with only Amazon, Goodreads and Google as sources: 6:11
with no tag download in metadata download: 7:23
download only metadata: 3:48
download only covers: 5:01
Amazon only with Amazon servers: 4:34 (51 found)
Amazon only with Google cache: 4:56 (49 found)
Amazon only with Bing cache: 6:05 (50 found)
with debug mode turned on: 6:25
with db in ramdisk (symlink): 7:22
Author, comment, rating, title metadata (plugin): 9:21
Author, comment, rating, title metadata (plugin+global): 7:22
Expand RAM from 1G to 4G: 6:28
Mount all of /tmp under a ramdisk (loop driver): 6:58
Mount all of /tmp under a ramdisk (nbd/raw driver): 6:34
Mount all of /tmp under a ramdisk (mount): 7:09
Expand core count from 1 to 8: 7:02
Expand core count from 1 to 8 and jobs from 3 to 16: 7:12


Honestly, a lot of these tweaks had little to no effect. So I started looking more closely at the logs for each metadata download, parsing them to see whether any of the plugins/providers were a bottleneck:

Source      Type      Total (s)  Average (s)  Median (s)
---------------------------------------------------------
Amazon      Metadata    200.9        3.2         1.9
Amazon      Covers      252.3        4.0         3.1

B&N         Metadata    126.6        2.0         1.6
B&N         Covers      104.8        1.7         1.3

Goodreads   Metadata     46.4        0.8         0.6
Goodreads   Covers       79.6        1.2         0.7

Google      Metadata     32.5        0.5         0.4
Google      Covers       59.3        0.9         0.7
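(In case anyone wants to reproduce this: I don't have a polished parser to share, but assuming you've already grepped each source's per-request elapsed times out of the debug logs into a file with one number per line, a throwaway pipeline like this spits out the totals/averages/medians above:)

Code:
# hypothetical input file: one elapsed time in seconds per line
sort -n times-amazon-metadata.txt | awk '
  { t[NR] = $1; sum += $1 }
  END {
    mid = (NR % 2) ? t[(NR + 1) / 2] : (t[NR / 2] + t[NR / 2 + 1]) / 2
    printf "total %.1f  average %.1f  median %.1f\n", sum, sum / NR, mid
  }'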


The Amazon plugin is disgustingly slow compared to the other three. Changing the default "automatic" server selection to the Amazon servers helped tremendously, but it still wasn't good enough, since every enabled source gets queried for each book no matter what.

With this in mind, here is what I ended up with as a final state:

Final spec: 1:55 to complete!
4096M tmpfs on /tmp
4096M RAM
8 CPUs
Amazon removed as a source


... and that is pretty much it. As suggested in a different thread, putting all of /tmp on a ramdisk clearly beats just setting CALIBRE_TEMP_DIR. I'm sure in the future I'll throw the database on there as well with a copy/symlink, but for now it doesn't help enough to be worth the hassle.
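For anyone who wants to replicate the /tmp part inside their own guest, the plain mount variant is just the following (the loop/nbd variants above are libvirt-specific ways of backing the same thing):

Code:
# one-off, inside the guest
sudo mount -t tmpfs -o size=4096M,mode=1777 tmpfs /tmp

# or persistent across reboots, via /etc/fstab:
# tmpfs  /tmp  tmpfs  size=4096M,mode=1777  0  0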
12-12-2018, 06:07 PM   #2
Adoby
Handy Elephant
 
Posts: 1,737
Karma: 26785684
Join Date: Dec 2009
Location: Southern Sweden, far out in the quiet woods
Device: Samsung Galaxy Tab S8 Ultra
Most (all?) of your operations worked only on the DATABASE. And your set of books was most likely small enough to fit in normal RAM disk caches. This is one of the things that is great about calibre: lazy update of ebook files. Most metadata operations never touch the actual ebook files; they only touch the database.

If you update/convert/send/save changed metadata to the ebook files themselves, then you'll benefit much more from putting tmp in RAM. Then the book files have to be unzipped, the metadata updated, and everything zipped up again. That takes a lot of processing power and a lot of disk reads/writes.
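To make that concrete: an EPUB is just a zip archive, so every metadata write amounts to roughly this round trip (example file names; calibre does all of this internally, and more carefully):

Code:
unzip -q book.epub -d work/              # inflate the whole archive
"$EDITOR" work/OEBPS/content.opf         # rewrite the metadata block (path varies per book)
cd work
zip -qX0 ../updated.epub mimetype        # mimetype must come first, stored uncompressed
zip -qXr ../updated.epub . -x mimetype   # then compress everything else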

I just use tmpfs for /tmp, and keep my calibre libraries, databases and ebooks on a big SSD. I used to have the database on a small local SSD and the book files on a NAS; that worked fine. Now I use the NAS only for automatic versioned rsync backups of the calibre libraries.
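("Versioned rsync backups" here means hardlink snapshots, something like this with example paths:)

Code:
# unchanged files become hardlinks into the previous snapshot,
# so each snapshot looks complete but costs very little space
rsync -a --delete \
      --link-dest=/backup/calibre/2018-12-11 \
      /srv/calibre-library/ \
      /backup/calibre/2018-12-12/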

Plenty of RAM benefits database access almost as much as putting the database itself in RAM. Not dedicating RAM to hold the database leaves more RAM for disk caches, which makes database access fast anyway.

I would suggest that you may benefit from partitioning the calibre library into several smaller libraries. Then the caches will be utilized better while updating.

Also, calibre uses an early, simple form of container: almost all dependencies are fully contained inside the calibre install folder. So there are no problems with clashing Python versions.

Last edited by Adoby; 12-12-2018 at 06:28 PM.




