FYI, I made an export of a subset of ebooks in my library.
These are the stats for 7072 epub files.
Maybe you can find some interesting info in it.
A very curious find was a pdf file in one of the books.
The file is listed in the manifest of an original epub.
Content of the pdf: the book cover (meant to be printed as a paper cover to wrap around a hardcover book), including the offset information for the printing office. Curious, because the book is sold as an e-book this way.
Another curious thing was the presence of empty folders and empty files.
So, no request this time, just some stats you can maybe use to add extra clean-up functions or to exclude some files from being removed.
The examined files were all cleaned first using the plugin. That's why, e.g., no iTunes files are shown.
The xpgt files shown are located in epubs with a bad opf file, which causes the plugin to skip those epubs while cleaning.
Spoiler:
Empty Folders:
OEBPS/Images
OEBPS/Fonts
OEBPS/Styles
Empty files (0 bytes):
OEBPS/Styles/style.css
OEBPS/Styles/style0001.css
OEBPS/Fonts/Arial
OEBPS/Fonts/AGaramondPro-Regular.otf
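In case it helps with a clean-up function, here is a minimal sketch of how such empty folders and 0-byte files could be detected. It is not how the plugin or my scan tool works; it only assumes the epub is a plain zip archive, and `find_empty_entries` is a name I made up for illustration.

```python
import zipfile


def find_empty_entries(epub):
    """Return (empty_folders, empty_files) for one epub (path or file object).

    Hypothetical helper: only folders that have an explicit directory
    entry in the zip can be reported as empty.
    """
    with zipfile.ZipFile(epub) as zf:
        infos = zf.infolist()
    names = [info.filename for info in infos]
    empty_folders = []
    empty_files = []
    for info in infos:
        if info.is_dir():
            # A folder is empty when no other entry lives below it.
            prefix = info.filename
            if not any(n != prefix and n.startswith(prefix) for n in names):
                empty_folders.append(prefix.rstrip("/"))
        elif info.file_size == 0:
            # Uncompressed size of 0 bytes, e.g. an empty style.css.
            empty_files.append(info.filename)
    return empty_folders, empty_files
```

Whether such entries are safe to remove is a separate question: an empty css file may still be referenced from the opf manifest or from an xhtml file.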
Code:
Extension  File Sizes (KB)  % of Total    Files  % of Files
xhtml            3.534.981       45,1%  120.584       46,7%
html               946.861       12,1%   39.115       15,2%
jpg              1.592.273       20,3%   28.180       10,9%
xml                155.732        2,0%   17.058        6,6%
css                 23.285        0,3%    8.885        3,4%
png                321.662        4,1%    7.845        3,0%
ncx                 42.827        0,5%    7.073        2,7%
<None>                 144        0,0%    7.072        2,7%
opf                 37.060        0,5%    7.069        2,7%
jpeg               210.567        2,7%    5.839        2,3%
otf                649.969        8,3%    3.238        1,3%
htm                187.503        2,4%    3.014        1,2%
gif                 73.818        0,9%    1.457        0,6%
xpgt                 2.640        0,0%    1.366        0,5%
ttf                 39.879        0,5%      135        0,1%
thmx                    58        0,0%       19        0,0%
txt                      3        0,0%        7        0,0%
epub                   476        0,0%        6        0,0%
dfont                7.180        0,1%        5        0,0%
bmp                  1.449        0,0%        4        0,0%
svg                  1.016        0,0%        4        0,0%
dtd                     33        0,0%        3        0,0%
bak                      9        0,0%        2        0,0%
dat                      1        0,0%        2        0,0%
db                      32        0,0%        2        0,0%
exe                  1.281        0,0%        2        0,0%
ds_store                 1        0,0%        1        0,0%
pdf                  2.018        0,0%        1        0,0%
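For anyone who wants to collect similar per-extension numbers, a rough sketch follows. It is not the tool I used; it assumes the epubs are readable zip archives, counts uncompressed sizes, and `extension_stats` is a hypothetical name.

```python
import zipfile
from collections import defaultdict


def extension_stats(epubs):
    """Map extension -> [file_count, total_bytes] over all given epubs.

    Hypothetical helper. Files without a dot in their name (such as
    mimetype) are grouped under "<None>"; a name like .DS_Store is
    counted as extension "ds_store", matching the table above.
    """
    stats = defaultdict(lambda: [0, 0])
    for epub in epubs:
        with zipfile.ZipFile(epub) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue  # folders are not files
                name = info.filename.rsplit("/", 1)[-1]
                ext = name.rpartition(".")[2].lower() if "." in name else ""
                key = ext or "<None>"
                stats[key][0] += 1
                stats[key][1] += info.file_size
    return dict(stats)
```

The percentages in the table are then just each count (or size) divided by the sum over all extensions.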
Edit 1:
The <None> entry is the mimetype file, which has no extension.
As you can see, some books have more than one opf file.
Edit 2:
Because I did a recursive scan, any zip and rar archives inside the epubs were extracted as well, so the archives themselves are not shown here; their content is.
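A recursive scan like this could be sketched as follows. This is only an illustration, not the tool I used: it handles nested zip files only (rar would need a third-party library, which I leave out), and `iter_files` and the `!/` path separator are my own inventions.

```python
import io
import zipfile

# Assumption: only .zip is descended into; nested epubs stay listed as files.
NESTED_ARCHIVE_EXTS = (".zip",)


def iter_files(archive, prefix=""):
    """Yield (path, size) for every file, descending into nested zips."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = prefix + info.filename
            if info.filename.lower().endswith(NESTED_ARCHIVE_EXTS):
                data = zf.read(info)
                if zipfile.is_zipfile(io.BytesIO(data)):
                    # Report the nested archive's content, not the archive.
                    yield from iter_files(io.BytesIO(data), prefix=path + "!/")
                    continue
            yield path, info.file_size
```

A file named .zip that is not actually a valid zip is reported as an ordinary file.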
Edit 3: Distribution of file sizes
Spoiler:
Distribution of sizes in H:\ebooks\Calibre Library
Code:
Size Interval   Sum of File Sizes (KB)  % of Total    Files  % of Files
Over 1 GB                            0        0,0%        0        0,0%
256 MB - 1 GB                        0        0,0%        0        0,0%
64 MB - 256 MB                       0        0,0%        0        0,0%
16 MB - 64 MB                        0        0,0%        0        0,0%
4 MB - 16 MB                         0        0,0%        0        0,0%
1 MB - 4 MB                    129.282        1,7%       80        0,0%
256 KB - 1 MB                1.089.428       13,9%    2.292        0,9%
64 KB - 256 KB               3.094.362       39,5%   24.428        9,5%
16 KB - 64 KB                2.730.763       34,9%   85.201       33,0%
4 KB - 16 KB                   696.930        8,9%   75.826       29,4%
1 KB - 4 KB                     77.415        1,0%   32.676       12,7%
0 KB - 1 KB                     14.570        0,2%   37.485       14,5%
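The intervals above grow by a factor of 4 each step, so the bucketing is easy to reproduce. A small sketch, assuming 1 KB = 1024 bytes and that a file exactly on a boundary goes into the higher interval (I do not know how the tool I used treats boundaries); `size_histogram` is a made-up name.

```python
from bisect import bisect_right

KB = 1024
# Upper bounds in bytes: 1 KB, 4 KB, 16 KB, ... up to 1 GB (11 bounds).
BOUNDS = [KB * 4**i for i in range(11)]
LABELS = [
    "0 KB - 1 KB", "1 KB - 4 KB", "4 KB - 16 KB", "16 KB - 64 KB",
    "64 KB - 256 KB", "256 KB - 1 MB", "1 MB - 4 MB", "4 MB - 16 MB",
    "16 MB - 64 MB", "64 MB - 256 MB", "256 MB - 1 GB", "Over 1 GB",
]


def size_histogram(sizes):
    """Map interval label -> [file_count, total_bytes] for sizes in bytes."""
    hist = {label: [0, 0] for label in LABELS}
    for size in sizes:
        # bisect_right puts a size equal to a bound into the next interval.
        label = LABELS[bisect_right(BOUNDS, size)]
        hist[label][0] += 1
        hist[label][1] += size
    return hist
```

Feeding it the sizes of all files found in the scan gives the two raw columns; the percentages follow from the totals.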