Old 10-02-2011, 08:37 AM   #198
drMerry
FYI, I made an export of a subset of ebooks in my library.
These are the stats for 7072 epub files.

Maybe you can find some interesting info in it.
A very curious one was a pdf file I found in one of the books.
The file is listed in the manifest of an original epub.
Content of the pdf: the book cover (meant to be printed as a paper cover to wrap around a hardcover book), including the offset information for the printing office. Curious, because this book is sold as an e-book this way.
Another curious thing was the presence of empty folders and empty files.
So, no request this time, just some stats you can maybe use to add extra clean-up functions or to exclude some files from being removed.

The examined files were all cleaned with the plugin first. That's why e.g. no iTunes files are shown.

The xpgt files shown are located in epubs with a bad opf file, which causes the plugin to skip those epubs while cleaning.

Spoiler:
Empty Folders:
OEBPS/Images
OEBPS/Fonts
OEBPS/Styles

Empty files (0 bytes):
OEBPS/Styles/style.css
OEBPS/Styles/style0001.css
OEBPS/Fonts/Arial
OEBPS/Fonts/AGaramondPro-Regular.otf
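
Empty folders and 0-byte files like the ones listed above could be found with something along these lines. This is only a rough sketch using Python's standard `zipfile` module, not how the plugin or my export tool works; the function name `find_empty_entries` is made up for illustration. It can only see folders that are stored as explicit directory entries in the zip, and it treats a folder as empty when no other entry lives under it.

```python
import zipfile

def find_empty_entries(epub_path):
    """Return (empty_dirs, empty_files) for one epub (which is a zip archive)."""
    with zipfile.ZipFile(epub_path) as zf:
        infos = zf.infolist()
    names = [info.filename for info in infos]
    # Directory entries end with "/"; one is empty if nothing else sits under it.
    empty_dirs = [
        name for name in names
        if name.endswith("/")
        and not any(other != name and other.startswith(name) for other in names)
    ]
    # Regular entries whose uncompressed size is 0 bytes.
    empty_files = [
        info.filename for info in infos
        if not info.is_dir() and info.file_size == 0
    ]
    return empty_dirs, empty_files
```

Whether such entries are safe to delete is a separate question: an empty css file may still be referenced from the opf manifest.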

Code:
Extension	File Sizes (KB)	% of Total	Files	% of Files
xhtml	3.534.981	45,1%	120.584	46,7%
html	946.861	12,1%	39.115	15,2%
jpg	1.592.273	20,3%	28.180	10,9%
xml	155.732	2,0%	17.058	6,6%
css	23.285	0,3%	8.885	3,4%
png	321.662	4,1%	7.845	3,0%
ncx	42.827	0,5%	7.073	2,7%
<None>	144	0,0%	7.072	2,7%
opf	37.060	0,5%	7.069	2,7%
jpeg	210.567	2,7%	5.839	2,3%
otf	649.969	8,3%	3.238	1,3%
htm	187.503	2,4%	3.014	1,2%
gif	73.818	0,9%	1.457	0,6%
xpgt	2.640	0,0%	1.366	0,5%
ttf	39.879	0,5%	135	0,1%
thmx	58	0,0%	19	0,0%
txt	3	0,0%	7	0,0%
epub	476	0,0%	6	0,0%
dfont	7.180	0,1%	5	0,0%
bmp	1.449	0,0%	4	0,0%
svg	1.016	0,0%	4	0,0%
dtd	33	0,0%	3	0,0%
bak	9	0,0%	2	0,0%
dat	1	0,0%	2	0,0%
db	32	0,0%	2	0,0%
exe	1.281	0,0%	2	0,0%
ds_store	1	0,0%	1	0,0%
pdf	2.018	0,0%	1	0,0%


Edit 1:
The <None> entry is the mimetype file.
As you can see, some books have more than one opf file.
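
For anyone who wants to produce a similar per-extension table for their own library, here is a minimal sketch. It assumes the epubs are read directly as zip archives with Python's `zipfile` module and that sizes are the uncompressed sizes recorded in the archive, which may differ slightly from sizes measured on extracted files. The function name `extension_stats` is hypothetical. Files without a dot in their name, such as mimetype, are counted as `<None>`.

```python
import zipfile
from collections import defaultdict

def extension_stats(epub_paths):
    """Map lowercase extension -> [file count, total uncompressed bytes]."""
    stats = defaultdict(lambda: [0, 0])
    for path in epub_paths:
        with zipfile.ZipFile(path) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                # Zip entry names always use "/" as the separator.
                base = info.filename.rsplit("/", 1)[-1]
                # Text after the last dot is the extension; no dot means none.
                _, dot, ext = base.rpartition(".")
                key = ext.lower() if dot else "<None>"
                stats[key][0] += 1
                stats[key][1] += info.file_size
    return dict(stats)
```

With this rule a file named `.DS_Store` is counted under `ds_store`, which matches how it shows up in the table.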

Edit 2:
Because I did a recursive scan, any zips and rars inside the epubs were extracted as well, so the archives themselves are not shown here; their content is.
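
A recursive scan of this kind could be sketched as below. This is an illustration under assumptions, not the tool I used: it only descends into nested `.zip` files, because Python's standard library has no rar reader, and the helper name `iter_entries` and the `!/` path separator are made up. A nested archive is replaced by its contents in the output; a file that is named `.zip` but cannot be opened is reported as a plain file.

```python
import io
import zipfile

def iter_entries(zf, prefix=""):
    """Yield (path, size) for every file, descending into nested .zip archives."""
    for info in zf.infolist():
        if info.is_dir():
            continue
        path = prefix + info.filename
        if info.filename.lower().endswith(".zip"):
            try:
                inner = zipfile.ZipFile(io.BytesIO(zf.read(info)))
            except zipfile.BadZipFile:
                inner = None  # not a real zip; report it as a plain file below
            if inner is not None:
                with inner:
                    yield from iter_entries(inner, path + "!/")
                continue  # list the contents, not the archive itself
        yield path, info.file_size
```

Usage would be something like `with zipfile.ZipFile("book.epub") as zf: entries = list(iter_entries(zf))`, where `book.epub` is a placeholder path.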

Edit 3: Distribution of file sizes
Spoiler:
Distribution of sizes in H:\ebooks\Calibre Library

Code:
Size Interval	Sum of File Sizes (KB)	% of Total	Files	% of Files
Over 1 GB	0	0,0%	0	0,0%
256 MB - 1 GB	0	0,0%	0	0,0%
64 MB - 256 MB	0	0,0%	0	0,0%
16 MB - 64 MB	0	0,0%	0	0,0%
4 MB - 16 MB	0	0,0%	0	0,0%
1 MB - 4 MB	129.282	1,7%	80	0,0%
256 KB - 1 MB	1.089.428	13,9%	2.292	0,9%
64 KB - 256 KB	3.094.362	39,5%	24.428	9,5%
16 KB - 64 KB	2.730.763	34,9%	85.201	33,0%
4 KB - 16 KB	696.930	8,9%	75.826	29,4%
1 KB - 4 KB	77.415	1,0%	32.676	12,7%
0 KB - 1 KB	14.570	0,2%	37.485	14,5%
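
The size intervals in this table grow by a factor of 4 each step, so the bucketing can be sketched with a sorted list of boundaries and a binary search. This is only an illustration: it assumes 1 KB = 1024 bytes and that each interval includes its lower bound and excludes its upper bound, which I have not verified against the tool that produced the table. The names `BOUNDS`, `LABELS` and `size_distribution` are made up.

```python
from bisect import bisect_right
from collections import Counter

KB = 1024
# Exclusive upper bounds in bytes: 1 KB, 4 KB, 16 KB, ..., 256 MB, 1 GB.
BOUNDS = [KB * 4**i for i in range(11)]
LABELS = [
    "0 KB - 1 KB", "1 KB - 4 KB", "4 KB - 16 KB", "16 KB - 64 KB",
    "64 KB - 256 KB", "256 KB - 1 MB", "1 MB - 4 MB", "4 MB - 16 MB",
    "16 MB - 64 MB", "64 MB - 256 MB", "256 MB - 1 GB", "Over 1 GB",
]

def size_distribution(sizes):
    """Return (file count, total bytes) per size interval as two Counters."""
    counts, totals = Counter(), Counter()
    for size in sizes:
        # bisect_right gives the number of bounds <= size, i.e. the bucket index.
        label = LABELS[bisect_right(BOUNDS, size)]
        counts[label] += 1
        totals[label] += size
    return counts, totals
```

The percentages in the table then follow by dividing each count or total by the overall sum.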

Last edited by drMerry; 10-02-2011 at 08:42 AM. Reason: extra info