#1 |
Bookish
Posts: 1,017
Karma: 2003162
Join Date: Jun 2011
Device: PC, t1, t2, t3, Clara BW, Clara HD, Libra 2, Libra Color, Nxtpaper 11
|
indexing file size
Hi Kovid, thanks for including full-text search. Much appreciated.
I have a question, though, about the size of the index file (full-text-search.db): it is about the same size as the complete library itself, a ratio of 1:1 so to speak. Up to now I have used the program DocFetcher, which manages a ratio of about 5:1. That is, my main library of about 19 GiB has a full-text-search.db of ~19 GiB, while DocFetcher squeezes its index into ~3.5 GiB. That's something to be aware of given the storage needed and available (and the corresponding backups). Is there any chance that calibre's index file will become smaller through future tweaking? |
#2 |
creator of calibre
Posts: 45,349
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Nope, the file size is the minimum needed to provide all the features that FTS has.
|
#3 |
Bookish
Posts: 1,017
Karma: 2003162
Join Date: Jun 2011
Device: PC, t1, t2, t3, Clara BW, Clara HD, Libra 2, Libra Color, Nxtpaper 11
|
OK. That is good to know when deciding whether to use FTS.
|
#4 | |
Member
Posts: 13
Karma: 10
Join Date: Oct 2016
Device: Onyx Boox N96ML
|
Quote:
However, a perfect 1:1 ratio seems hard to believe: for example, I don't think that scanned or illustrated PDFs occupy the equivalent of their full size in the index, since only the text layer is indexed. Maybe the database already has a minimum size when it is generated (and therefore the ratio is not really 1:1, even if it seems to be in your case)? Perhaps some users with larger libraries who have already run the indexing can report the size of their index file. |
|
#5 |
Wizard
Posts: 1,211
Karma: 1419583
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite, Kindle Oasis
|
My library is not that big, but the ratio is almost 5:1 (5.04 GB library and 1.18 GB full-text-search.db).
Just to compare: using the Power Search plugin, the index data (generated by Elasticsearch) is 1.24 GB, virtually the same.
Last edited by thiago.eec; 07-27-2022 at 09:32 AM. |
#6 |
Bookish
Posts: 1,017
Karma: 2003162
Join Date: Jun 2011
Device: PC, t1, t2, t3, Clara BW, Clara HD, Libra 2, Libra Color, Nxtpaper 11
|
I ran some checks on the contents of my library.
Types:
IMHO nothing special.
Last edited by DrChiper; 07-27-2022 at 11:53 AM. |
#7 |
Grand Sorcerer
Posts: 13,472
Karma: 239219543
Join Date: Jan 2014
Location: Estonia
Device: Kobo Sage & Libra 2
|
The library I indexed is 7.6 GB and the db file is 3.7 GB. EPUBs only, with high-quality covers.
As my main library is close to 50 GB, I'm not going to index that.
Last edited by Sirtel; 07-27-2022 at 04:21 PM. |
#8 |
null operator (he/him)
Posts: 21,725
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
FWIW, the MS rule of thumb for their search index size is 10% of the size of the files indexed, and the X1 indexes I have, which include my calibre libraries, are a bit less than that.
But they only run on Windows.
BR |
#9 |
creator of calibre
Posts: 45,349
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Remember that an index file is always going to be much larger than the text file, because it's not just text: it contains information about which records contain every word, and at what offset and word count. This is what powers the NEAR operator. So essentially every word has some number of extra fields associated with it. There is also page size overhead for efficient lookup. And of course the full text itself is stored so that snippets can be shown.
And calibre actually indexes all text twice for the "find related words" functionality, which works by stemming all tokens. |
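To make the points above concrete, here is a minimal sketch using Python's built-in sqlite3 module with SQLite's FTS5 extension. It is an assumption that this resembles calibre's setup: the thread does not show calibre's schema or tokenizer, and the table names `books_fts` and `books_fts_stemmed` are hypothetical. It only illustrates why positional data (for NEAR), stored text (for snippets), and a second stemmed index each add to the file size. It requires an SQLite build with FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two FTS5 tables over the same text (hypothetical names): one with the
# default tokenizer, one with Porter stemming layered on top of unicode61.
# Indexing the text twice like this roughly doubles the index data.
conn.execute("CREATE VIRTUAL TABLE books_fts USING fts5(body)")
conn.execute(
    "CREATE VIRTUAL TABLE books_fts_stemmed "
    "USING fts5(body, tokenize='porter unicode61')"
)

texts = [
    "The index stores the position of every word in every record.",
    "Indexing a large library takes a while.",
]
for text in texts:
    conn.execute("INSERT INTO books_fts(body) VALUES (?)", (text,))
    conn.execute("INSERT INTO books_fts_stemmed(body) VALUES (?)", (text,))

# NEAR only works because the index records where each word occurs.
near_query = "NEAR(position word, 3)"
near_hits = conn.execute(
    "SELECT rowid FROM books_fts WHERE books_fts MATCH ? ORDER BY rowid",
    (near_query,),
).fetchall()

# snippet() only works because the original text is stored alongside the index.
snip = conn.execute(
    "SELECT snippet(books_fts, 0, '[', ']', '...', 8) "
    "FROM books_fts WHERE books_fts MATCH ?",
    (near_query,),
).fetchone()[0]

# The same query term against the plain and the stemmed index:
# stemming lets 'indexing' also match the record containing 'index'.
plain_hits = conn.execute(
    "SELECT rowid FROM books_fts WHERE books_fts MATCH ? ORDER BY rowid",
    ("indexing",),
).fetchall()
stemmed_hits = conn.execute(
    "SELECT rowid FROM books_fts_stemmed "
    "WHERE books_fts_stemmed MATCH ? ORDER BY rowid",
    ("indexing",),
).fetchall()

print("NEAR hits:", near_hits)
print("snippet:", snip)
print("plain hits:", plain_hits, "stemmed hits:", stemmed_hits)
```

The plain table finds "indexing" only in the second record, while the stemmed table also returns the first one, which is the kind of "related words" match that a second, stemmed index makes possible.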
#10 | |
Bookish
Posts: 1,017
Karma: 2003162
Join Date: Jun 2011
Device: PC, t1, t2, t3, Clara BW, Clara HD, Libra 2, Libra Color, Nxtpaper 11
|
Quote:
That said, my gut feeling is that the bigger the library to be indexed, the bigger the resulting index DB will be:
Code:
@thiago.eec -> 5.04 GB, with index DB 1.18 GB
@Drchiper -> 6.05 GB, with index DB 4.4 GB
@Sirtel -> 7.60 GB, with index DB 3.7 GB |
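For comparison, the library-to-index ratios implied by those three reports can be worked out with a few lines of Python. The figures are copied from the list above; the function name `library_to_index_ratio` is just an illustrative choice.

```python
# Sizes in GB as reported in the thread: (library size, index DB size).
reports = {
    "thiago.eec": (5.04, 1.18),
    "Drchiper": (6.05, 4.4),
    "Sirtel": (7.60, 3.7),
}


def library_to_index_ratio(library_gb: float, index_gb: float) -> float:
    """Return how many times larger the library is than its index DB."""
    return library_gb / index_gb


ratios = {
    user: round(library_to_index_ratio(library_gb, index_gb), 1)
    for user, (library_gb, index_gb) in reports.items()
}

for user, ratio in ratios.items():
    print(f"{user}: {ratio}:1")
```

This gives roughly 4.3:1, 1.4:1 and 2.1:1. With only three data points the spread may say more about what the libraries contain than about their total size.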
|
#11 |
Leftutti
Posts: 549
Karma: 1717097
Join Date: Feb 2019
Location: Bavaria
Device: iPad Pro, Kobo Libra 2
|
If anyone is interested:
library 105 GB and index DB 21 GB. |
#12 |
Wizard
Posts: 1,087
Karma: 447222
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
Does FTI ignore 'low value' words, like 'the' and 'and' and 'to'?
|
#13 |
Custom User Title
Posts: 10,974
Karma: 75337983
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
|
|
#14 |
want to learn what I want
Posts: 1,611
Karma: 7891011
Join Date: Sep 2020
Device: none
|
Library folder: 333 GB
FTS db: 70 GB (located in default folder)
Hence, the size of the library files is currently 263 GB.
I'm using a pretty fast Kingston 2TB NVMe SSD, and I make annual backups to a 4TB external HDD. By the way, earlier this year I lost some data stored on a faulty Corsair NVMe SSD, so I'd tell everyone to stay away from those!
Last edited by Comfy.n; 07-31-2022 at 12:19 AM. |
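The subtraction above (folder size minus FTS db) can be automated with a short script. This is only a sketch: it assumes the index is a single file named full-text-search.db in the top level of the library folder, as the numbers in this post suggest, and the function name `library_and_index_size` is made up for illustration.

```python
import os
from pathlib import Path

# File name as given in the first post of this thread.
FTS_DB_NAME = "full-text-search.db"


def library_and_index_size(library_dir):
    """Return (bytes of all other files, bytes of the FTS db) under library_dir.

    Assumption: the FTS db is located directly in the library's top-level folder.
    """
    top = Path(library_dir)
    other_bytes = 0
    index_bytes = 0
    for root, _dirs, files in os.walk(top):
        for name in files:
            size = (Path(root) / name).stat().st_size
            if name == FTS_DB_NAME and Path(root) == top:
                index_bytes += size
            else:
                other_bytes += size
    return other_bytes, index_bytes
```

Calling `library_and_index_size` with the path to a library folder returns both totals in bytes, so dividing the first by the second gives the same library-to-index ratio that others have been reporting in this thread.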