#1
Still reading
Posts: 14,012
Karma: 105092227
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
FTS: Book count etc.
I have 6337 "books" in Calibre. Some might have no files. Some might be image-based PDFs (which may or may not have an OCR layer).
I started the indexing for FTS, but it's indexing 14249 books! Where a book has multiple formats, can it be set to index only the epub, which would always exist when there is more than one format? What does it do with PDFs that have either native text or an OCR text layer, or with PDFs at all? Can I exclude rows from being indexed altogether?
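The gap between 6337 and 14249 is what you would expect if the indexer counts one item per format file rather than one per book. Below is a minimal sketch of how to check that, assuming the library database has a `books` table and a `data` table with one row per (book, format) file. That schema is an assumption here, and the demo builds a tiny in-memory database rather than opening a real library.

```python
import sqlite3


def count_books_and_formats(conn):
    """Return (books, format files, books with no files)."""
    books = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
    formats = conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
    no_files = conn.execute(
        "SELECT COUNT(*) FROM books WHERE id NOT IN (SELECT book FROM data)"
    ).fetchone()[0]
    return books, formats, no_files


def make_demo_library():
    """Build a tiny in-memory stand-in using the ASSUMED schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE data (id INTEGER PRIMARY KEY, book INTEGER, format TEXT)"
    )
    conn.executemany(
        "INSERT INTO books (id, title) VALUES (?, ?)",
        [(1, "A"), (2, "B"), (3, "C")],
    )
    # Book 1 has three formats, book 2 has one, book 3 has no files at all.
    conn.executemany(
        "INSERT INTO data (book, format) VALUES (?, ?)",
        [(1, "EPUB"), (1, "AZW3"), (1, "MOBI"), (2, "PDF")],
    )
    return conn


if __name__ == "__main__":
    print(count_books_and_formats(make_demo_library()))
```

Against a real library, work on a copy of the database with Calibre closed, not on the live file.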
#2
Still reading
Now at 26% and ~2 hours 18 minutes estimate. Set to "Fast".
#3
want to learn what I want
Posts: 1,611
Karma: 7891011
Join Date: Sep 2020
Device: none
Indeed, this would be a great way to make the FTS database smaller and avoid redundant results. I'm not sure it's possible... I think this question was raised when the feature was implemented :\
#5
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
No, it will index all formats, and you cannot exclude books. As for PDFs, it extracts the text using the pdftotext tool.
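Since the text comes from pdftotext, a scanned PDF with no OCR layer would presumably contribute little or nothing to the index. Here is a minimal sketch for checking a given PDF yourself, assuming the `pdftotext` command is on your PATH. How Calibre itself invokes the tool is not shown in this thread, and `has_usable_text` with its `min_chars` threshold is a hypothetical helper, not part of any tool.

```python
import subprocess


def extract_pdf_text(pdf_path):
    """Run pdftotext on one file; the '-' argument sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def has_usable_text(text, min_chars=50):
    """Heuristic with an arbitrary threshold: is there real text in the output?"""
    # Count only non-whitespace characters, so output made up of page
    # separators (form feeds) and blank lines does not count as text.
    return sum(1 for ch in text if not ch.isspace()) >= min_chars
```

Usage would be something like `has_usable_text(extract_pdf_text("scan.pdf"))`, where `scan.pdf` is a placeholder path.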
#7
Still reading
I'm impressed with the FTS and its flexibility. The estimated (one-off) index build time was close to reality.
#8
null operator (he/him)
Posts: 21,718
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Assuming FTS doesn't index book 'data' folders, you could move the converted PDFs, LITs, PRCs etc. into it, perhaps into a '4 Posterity' subfolder.
IIRC you can reindex individual books… but I can't remember how. Found it: it's in the FTS search results.
I don't use FTS, except on my Test library.
BR
Last edited by BetterRed; 08-18-2023 at 06:34 PM.
#9
Still reading
I don't have "converted" PDFs, just four sorts:
1. Tech info and service info (mostly not imported).
2. Manuals, instruction books etc. that work OK on the Sage.
3. Scans of old books and magazines (mostly books; hundreds of magazines not imported).
(The above three exist in no other format.)
4. And finally, files for POD exported directly from LO Writer, for proofing layout/format only (never content) on the Elipsa. Not many, but more than one size per title. These have a related epub for ebook publishing, plus azw3 and dual mobi purely for testing. The epub is from a docx, which is then deleted (the original still exists in the same place as odt). The azw3 and dual mobi are from the epub. The PDF is not made by Calibre but is a direct LO Writer export.

The only other "waste" in the indexing is where the epub is from a PD mobi or a converted epub, or a bought azw3 converted to epub. Hence 6337 vs 14249. There are also some titles with no formats.

Adding new books is no issue. It seems instant to index.

Free space Storage 2.9 TB
full-text-search-db = 10.6 GB

The entire library as an export is about 23 GByte, so the FTS DB seems a bit large? Though the export is compressed.

Thanks for the suggestions @BetterRed. I've no LITs, PDBs, LRFs etc. The Palm Z22 was just a test.

The FTS is very useful to me; I'm not sure why I took so long to set it up. Eventually a priority list of formats (like the one for transfer) would be an idea, so that you could decide to index only one format per title.

(And rsync backed it up to my server already tonight. I never run the sync with Calibre or any editor/content-creating program open.)

Last edited by Quoth; 08-18-2023 at 07:33 PM.
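The "priority list of formats" idea could look something like this. It is only a sketch of the suggestion, not an existing Calibre option (per the reply above, all formats are indexed today). The function name `pick_format_to_index` and the default priority order are hypothetical.

```python
def pick_format_to_index(available, priority=("EPUB", "AZW3", "MOBI", "PDF")):
    """Return the one format that would be indexed for a title, or None."""
    normalized = {fmt.upper() for fmt in available}
    # Take the first format on the priority list that the title actually has.
    for fmt in priority:
        if fmt in normalized:
            return fmt
    # Nothing on the list matched: fall back to the alphabetically first
    # available format so the choice is deterministic, or None if no files.
    return min(normalized) if normalized else None


if __name__ == "__main__":
    print(pick_format_to_index(["pdf", "epub", "azw3"]))
    print(pick_format_to_index(["PDF"]))
    print(pick_format_to_index([]))
```

With one format chosen per title, the number of indexed items would drop from the format count towards the book count.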