07-25-2022, 08:03 AM   #1
DrChiper
Bookish
Posts: 1,017
Karma: 2003162
Join Date: Jun 2011
Device: PC, t1, t2, t3, Clara BW, Clara HD, Libra 2, Libra Color, Nxtpaper 11
indexing file size

Hi Kovid, thanks for adding full-text search. Much appreciated.

I have a question though about the index file size (full-text-search.db): it is about the same size as the complete library itself, a ratio of 1:1 so to speak. Up to now I used DocFetcher, which manages a ratio of 5:1.
Meaning my main library of about 19 GiB has a full-text-search.db of ~19 GiB, while DocFetcher manages to squeeze its index into ~3.5 GiB.

That's something to be aware of, given the storage space needed and available (and the corresponding backups).

Is there any chance that calibre's index file could become smaller through future tweaking?
07-25-2022, 08:13 AM   #2
kovidgoyal
creator of calibre
Posts: 45,349
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Nope, the file size is the minimum needed to provide all the features that FTS has.
07-25-2022, 08:18 AM   #3
DrChiper
Bookish
Posts: 1,017
Karma: 2003162
Join Date: Jun 2011
Device: PC, t1, t2, t3, Clara BW, Clara HD, Libra 2, Libra Color, Nxtpaper 11
OK. That is good to know when deciding whether to use FTS.
07-27-2022, 08:18 AM   #4
albell
Member
Posts: 13
Karma: 10
Join Date: Oct 2016
Device: Onyx Boox N96ML
Quote:
Originally Posted by DrChiper

I have a question though about the index file size (full-text-search.db): it is about the same size as the complete library size itself, a ratio of 1:1 so to speak. Up to now I used the program docfetcher, which manages a ratio of 5:1.
Meaning my main library of about 19GiB has a full-text-search.db of ~19GiB,
It is definitely something that would be useful to know in the case of a very large library.

However, a perfect 1:1 ratio seems hard to believe: for example, I don't think that scanned or illustrated PDFs take up their full size in the index, since only the text layer is indexed.

Maybe when the database is generated, it already has a minimum size (and therefore the ratio is not 1:1, even if it seems to be in your case)?

Perhaps some users with larger libraries who have already done the indexing can share the size of their index file.
07-27-2022, 09:28 AM   #5
thiago.eec
Wizard
Posts: 1,211
Karma: 1419583
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite, Kindle Oasis
My library is not that big, but the ratio is better than 4:1 (5.04 GB library and 1.18 GB full-text-search.db).

Just to compare: using the Power Search plugin, the index data is 1.24 GB (generated by Elasticsearch), virtually the same.

Last edited by thiago.eec; 07-27-2022 at 09:32 AM.
07-27-2022, 11:50 AM   #6
DrChiper
Bookish
Posts: 1,017
Karma: 2003162
Join Date: Jun 2011
Device: PC, t1, t2, t3, Clara BW, Clara HD, Libra 2, Libra Color, Nxtpaper 11
I did some checks on the contents of my library.
Types:
  • 99.50% = epub
  • 0.30% = pdf
  • 0.13% = epub + pdf
  • 0.06% = other
File sizes:
  • 16.00% > 1 MB
  • 84.08% < 1 MB
  • 65.17% < 500 KB
  • 23.78% < 250 KB
  • 3.38% < 100 KB

IMHO nothing special.

Last edited by DrChiper; 07-27-2022 at 11:53 AM.
07-27-2022, 04:17 PM   #7
Sirtel
Grand Sorcerer
Posts: 13,472
Karma: 239219543
Join Date: Jan 2014
Location: Estonia
Device: Kobo Sage & Libra 2
The library I indexed is 7.6 GB and the size of the db file is 3.7 GB. Epubs only, with high-quality covers.

As my main library is close to 50 GB, I'm not going to index that.

Last edited by Sirtel; 07-27-2022 at 04:21 PM.
07-28-2022, 07:30 PM   #8
BetterRed
null operator (he/him)
Posts: 21,725
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
FWIW, Microsoft's rule of thumb for its search index size is 10% of the size of the files indexed, and my X1 indexes, which include my calibre libraries, are a bit smaller than that.

But they only run on Windows.

BR
07-28-2022, 09:26 PM   #9
kovidgoyal
creator of calibre
Posts: 45,349
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Remember that an index file is always going to be much larger than the text file, because it's not just text: it contains information about which records contain every word, and at what offset and word count; this is what powers the NEAR operator. So essentially every word has some number of extra fields associated with it. There will also be page-size overhead for efficient lookup. And of course the full text is also stored so that snippets can be shown.

And calibre actually indexes all text twice for the "find related words" functionality, which works by stemming all tokens.
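To make that explanation concrete, here is a toy, in-memory Python model of what such an index has to keep. It is a sketch under loose assumptions, not calibre's actual on-disk format: the regex tokenizer and the `crude_stem` suffix stripper are hypothetical stand-ins (a real stemmer is far more careful), and `ToyFTSIndex` is an invented name. It shows the three costs described above: a (record, position) entry per word so NEAR can work, a second set of entries keyed by stem for related words, and a stored copy of the full text for snippets.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"\w+", text.lower())


def crude_stem(token):
    """Hypothetical suffix stripper; NOT a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


class ToyFTSIndex:
    """Toy positional index illustrating why FTS data outgrows the raw text."""

    def __init__(self):
        self.stored = {}                         # doc_id -> full text, kept for snippets
        self.postings = defaultdict(list)        # token -> [(doc_id, position), ...]
        self.stem_postings = defaultdict(list)   # stem  -> [(doc_id, position), ...]

    def add(self, doc_id, text):
        self.stored[doc_id] = text
        for pos, tok in enumerate(tokenize(text)):
            # Every word is recorded twice: once as-is, once by its stem.
            self.postings[tok].append((doc_id, pos))
            self.stem_postings[crude_stem(tok)].append((doc_id, pos))

    def near(self, a, b, distance=10):
        """Doc ids where words a and b occur within `distance` positions."""
        b_positions = defaultdict(list)
        for doc, pos in self.postings.get(b.lower(), []):
            b_positions[doc].append(pos)
        return {
            doc
            for doc, pa in self.postings.get(a.lower(), [])
            if any(abs(pa - pb) <= distance for pb in b_positions[doc])
        }

    def related(self, word):
        """Doc ids containing any word with the same (crude) stem."""
        stem = crude_stem(word.lower())
        return {doc for doc, _ in self.stem_postings.get(stem, [])}

    def size_summary(self):
        """Count indexed words versus posting entries across both indexes."""
        words = sum(len(tokenize(t)) for t in self.stored.values())
        entries = sum(len(v) for v in self.postings.values())
        entries += sum(len(v) for v in self.stem_postings.values())
        return {"words": words, "posting_entries": entries}


if __name__ == "__main__":
    idx = ToyFTSIndex()
    idx.add(1, "The indexer stores every word with its position")
    idx.add(2, "Indexing books takes space because positions are stored")
    print(idx.near("word", "position", distance=3))  # -> {1}
    print(idx.related("stored"))                     # -> {1, 2}
    print(idx.size_summary())                        # -> {'words': 16, 'posting_entries': 32}
```

In this tiny example the 16 indexed words already produce 32 posting entries, each carrying a record id and a position, before counting the stored text or any page overhead.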
07-29-2022, 05:09 AM   #10
DrChiper
Bookish
Posts: 1,017
Karma: 2003162
Join Date: Jun 2011
Device: PC, t1, t2, t3, Clara BW, Clara HD, Libra 2, Libra Color, Nxtpaper 11
Quote:
Originally Posted by albell
Maybe when the database is generated it already has a minimum size (and therefore the ratio is not 1:1, even if it seems to be in your case)?
Not quite sure what you mean by this. What I did prior to indexing was run the DB maintenance function on that database, which "vacuumed" it and ran consistency checks. So the DB to be indexed was sound and minimized.

That said, my gut feeling is that the bigger the DB to be indexed, the bigger the resulting index DB will be:
Code:
@thiago.eec -> 5.04 GB, with index DB 1.18 GB
@DrChiper   -> 6.05 GB, with index DB 4.4 GB
@Sirtel     -> 7.60 GB, with index DB 3.7 GB
What is not yet considered is the kind of content in the ebooks involved, which surely has an impact. Mine hardly contain pictures, but do have many words (250+ pages). So: YMMV.
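A quick way to check that gut feeling is to turn the three reported data points into library-to-index ratios. This minimal Python sketch uses only the figures in the table above; `REPORTS` and `ratios` are illustrative names, not part of calibre.

```python
# Sizes in GB exactly as reported in the table above: (user, library, index DB)
REPORTS = [
    ("thiago.eec", 5.04, 1.18),
    ("DrChiper", 6.05, 4.4),
    ("Sirtel", 7.60, 3.7),
]


def ratios(rows):
    """Return (user, library-to-index ratio) pairs, sorted by library size."""
    return [(user, lib / idx) for user, lib, idx in sorted(rows, key=lambda r: r[1])]


if __name__ == "__main__":
    for user, r in ratios(REPORTS):
        print(f"{user:<11} {r:.2f}:1")
```

For these three points the ratio runs from roughly 4.3:1 down to roughly 1.4:1 and does not track library size, which is consistent with content mattering more than raw size.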
07-30-2022, 01:54 PM   #11
Wiggo
Leftutti
Posts: 549
Karma: 1717097
Join Date: Feb 2019
Location: Bavaria
Device: iPad Pro, Kobo Libra 2
If anyone is interested:

Library 105 GB and index DB 21 GB.
07-30-2022, 03:57 PM   #12
phossler
Wizard
Posts: 1,087
Karma: 447222
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Does FTI ignore 'low value' words, like 'the' and 'and' and 'to'?
07-30-2022, 04:21 PM   #13
ownedbycats
Custom User Title
Posts: 10,974
Karma: 75337983
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
Quote:
Originally Posted by phossler
Does FTI ignore 'low value' words, like 'the' and 'and' and 'to'?
I wouldn't think so, as otherwise "the exact match" would fail.
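That reasoning can be illustrated with a toy positional index. This Python sketch does not describe what calibre actually does; the `STOP_WORDS` set and both functions are hypothetical. It shows the trade-off: dropping stop words shrinks the index, but any phrase that contains one can no longer be matched.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "and", "to", "a", "of"}


def positional_index(text, drop_stop_words):
    """Map token -> positions; optionally leave stop words out entirely."""
    index = {}
    for pos, tok in enumerate(re.findall(r"\w+", text.lower())):
        if drop_stop_words and tok in STOP_WORDS:
            continue  # smaller index, but the word can never be matched
        index.setdefault(tok, []).append(pos)
    return index


def phrase_match(index, phrase):
    """True if the phrase's tokens appear at consecutive positions."""
    tokens = re.findall(r"\w+", phrase.lower())
    if not tokens:
        return False
    for start in index.get(tokens[0], []):
        if all(start + i in index.get(tok, []) for i, tok in enumerate(tokens)):
            return True
    return False


if __name__ == "__main__":
    text = "He finally found the exact match in chapter three"
    full = positional_index(text, drop_stop_words=False)
    stripped = positional_index(text, drop_stop_words=True)
    print(phrase_match(full, "the exact match"))      # True
    print(phrase_match(stripped, "the exact match"))  # False
    print(phrase_match(stripped, "exact match"))      # True
```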
07-31-2022, 12:07 AM   #14
Comfy.n
want to learn what I want
Posts: 1,611
Karma: 7891011
Join Date: Sep 2020
Device: none
Library folder: 333 GB
FTS db: 70 GB (located in default folder)
Hence, the library files themselves currently take up 263 GB.

I'm using a pretty fast Kingston 2TB NVMe SSD, and I make annual backups to a 4TB external HDD. By the way, earlier this year I lost some data stored on a faulty Corsair NVMe SSD, so I'd tell everyone to stay away from those!

Last edited by Comfy.n; 07-31-2022 at 12:19 AM.
All times are GMT -4.