Thanks for the plugin! Awesome idea.
I did not install pdftotext, but I only have 18 PDFs in my library.
I'm testing it on a library with 1130 books (many with multiple formats: EPUB, AZW3/KFX) and about 3GB.
Info about the initial indexing: It took only 16 minutes to go from 0 to 99%. But now it has been stuck at 99% for about 3h45min. My system is an i7 7700HQ (16GB of RAM). My processor has 4 cores (8 threads). The plugin chose 8 max parallel processes. Now, the strange part: according to Task Manager (Windows), my CPU is only using 20% of its total capacity.
While writing this post, it finished, after 4h05min. Now it searches instantly! Nice!
------ My first impressions and questions ------
1)
Question: When a book has multiple formats, does it look up all the formats or just one?
2)
Question: caps.json only lists EPUB, MOBI, PDF and TXT files. Based on this, and on other tests I have done, it does not index AZW3/KFX files. Is this correct?
3)
Question: How does indexing work for new additions? Are new files automatically indexed when I run ElasticSearch?
4)
Suggestion: It would be really important to have more search options. Right now, it searches word by word, so I can't look for phrases or compound words. For example, searching for "coffee table" returns books containing "coffee" OR "table". Also, accented characters are treated as different from their non-accented counterparts.
5)
Info: According to the ElasticSearch Reference, to have more search options you would need to change your query from "match" to "query_string". This would allow operators, wildcards and regular expressions. P.S.: "match" queries can use operators too, but you would have to code that.
6)
Info: The ZIP file attached to the first post has another ZIP inside (with the full plugin).
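
To make points 4 and 5 more concrete, here is a rough sketch of the request bodies I have in mind. I have not looked at how the plugin builds its queries, so this is only an illustration: the field name "content" is my assumption (the plugin's real mapping may use another name), and the helper functions are made up for this post. The query types themselves ("match" with an "operator", "match_phrase", "query_string") and the "asciifolding" token filter come from the ElasticSearch Reference.

```python
import json

# Hypothetical field name; the plugin's real mapping may use something else.
FIELD = "content"


def match_query(text, operator="or"):
    """'match' query. With operator='and', every word must be present."""
    return {"query": {"match": {FIELD: {"query": text, "operator": operator}}}}


def phrase_query(text):
    """'match_phrase' query: the words must appear together, in order."""
    return {"query": {"match_phrase": {FIELD: text}}}


def query_string_query(expression):
    """'query_string' query: the user can type operators, quoted phrases
    and wildcards directly in the search box."""
    return {"query": {"query_string": {"query": expression, "default_field": FIELD}}}


# Index settings for an analyzer that folds accents (e.g. "café" -> "cafe").
# The analyzer name "folded" is arbitrary. The field would have to be mapped
# to this analyzer and the library reindexed for it to take effect.
FOLDING_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folded": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    }
}

if __name__ == "__main__":
    examples = [
        match_query("coffee table"),            # "coffee" OR "table"
        match_query("coffee table", "and"),     # "coffee" AND "table"
        phrase_query("coffee table"),           # exact phrase
        query_string_query('"coffee table" OR caf*'),
    ]
    for body in examples:
        print(json.dumps(body))
```

The first example is, as far as I can tell from the results, what the plugin does today. Switching to "query_string" would hand the syntax to the user, while "match" with operator "and" or "match_phrase" would need an extra option in the search dialog.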