MobileRead Forums - View Single Post - [GUI Plugin] Power (Full-text) Search

thiago.eec · 07-27-2020, 03:17 PM

Thanks for the plugin! Awesome idea.

I did not install pdftotext, but I only have 18 PDFs in my library.
I'm testing it on a library with 1130 books (many with multiple formats: EPUB, AZW3/KFX) and about 3GB.

Info about the initial indexing: It took only 16 minutes to go from 0 to 99%. But now it has been stuck at 99% for about 3h45min. My system is an i7 7700HQ (16GB of RAM). My processor has 4 cores (8 threads). The plugin chose 8 max parallel processes. Now, the strange part: according to Task Manager (Windows), only 20% of my CPU's total capacity is in use.

While writing this post, it finished, after 4h05min. Now it searches instantly! Nice!

------ My first impressions and questions ------

1) Question: When you have multiple formats for one book, does it look up all the formats or just one?

2) Question: In caps.json, it only shows EPUB, MOBI, PDF and TXT files. According to this, and other tests I have done, it does not index AZW3/KFX files. Is this correct?

3) Question: How does the index work for new additions? Are the new files automatically indexed when I run ElasticSearch?

4) Suggestion: It would be really important to have more options for search. Right now, it searches word by word, so I can't look for phrases or compound words (e.g. coffee table: it will search for books with "coffee" OR "table"). Also, accented characters are distinguished from non-accented ones.

5) Info: According to the ElasticSearch Reference, to have more options for search, you would need to change your query from "match" to "query_string". This would allow operators, wildcards and regular expressions. P.S.: "match" queries can use operators too, but you would have to code that.

6) Info: The ZIP file attached to the first post has another ZIP inside (with the full plugin).
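
To make points 4 and 5 concrete, here is a minimal sketch of the request bodies involved, written as plain Python dicts. It is not taken from the plugin's code: the field name "content", the analyzer name "folded" and the helper function names are all assumptions for illustration. The "match" (with "operator"), "match_phrase" and "query_string" query types, and the "asciifolding" token filter, are standard ElasticSearch features.

```python
import json

# Hypothetical name of the field holding the book text (an assumption,
# the plugin may use a different one).
FIELD = "content"


def match_all_terms(text, field=FIELD):
    """'match' query that requires every word (AND instead of the default OR)."""
    return {"query": {"match": {field: {"query": text, "operator": "and"}}}}


def match_phrase(text, field=FIELD):
    """'match_phrase' query: the words must appear together, in order."""
    return {"query": {"match_phrase": {field: text}}}


def query_string(expr, field=FIELD):
    """'query_string' query: the user can type operators, wildcards, regex."""
    return {"query": {"query_string": {"query": expr, "default_field": field}}}


# Index settings sketch: an analyzer that lowercases and strips accents,
# so that accented and non-accented spellings match each other.
# The analyzer name "folded" is hypothetical.
FOLDING_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folded": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    }
}

if __name__ == "__main__":
    for body in (
        match_all_terms("coffee table"),
        match_phrase("coffee table"),
        query_string('"coffee table" OR caf*'),
    ):
        print(json.dumps(body))
```

With "match_phrase" (or a quoted phrase inside "query_string"), "coffee table" would only hit books where the two words are adjacent. The accent folding only takes effect if the text field is mapped to that analyzer, and existing documents would have to be reindexed.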
