Hi, I've been searching in all kinds of places for a while and decided to ask here about the following problem.
I have a collection of ebooks in many different formats, from txt and pdf to epub, mobi etc., all inside a calibre directory. Unfortunately, calibre can display ebooks but has no full-text search function.
From my point of view, the minimum requirements for the search functionality would be:
- regular expressions in full text search
- search inside all common ebook formats
- unicode support
Nice to have, but not vital:
- pre indexed file content
- search inside archived files
- on the fly indexing
Here is what I have found so far:
- The program Beagle for Linux seems to have met my needs but is no longer maintained. I haven't tried to install the latest version; has anybody done so lately?
- Google Desktop is discontinued, and it probably offered only boolean operators rather than regex anyway.
- Copernic Desktop Search seems interesting, but the developer's site says nothing about regular expressions. I haven't tried it. Has anyone?
- Agent Ransack, which I just tried, seems interesting but probably can't search inside epub (and other formats), though it does fast full-text regex search in pdf, archived and other plain-text-like files. (If I decided to use that program, it would, somewhat oddly, mean converting all non-pdf files to pdf...) Agent Ransack does no indexing ahead of the search.
- I wouldn't hesitate to use command-line tools. Basically grep can do all I need for now. The question then would be how to extract a corpus, with the necessary file information, from the library, including the pdf and epub formats.
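To make the grep idea concrete, here is a rough sketch of the epub and txt part in Python, using only the standard library. It relies on the fact that an epub is a ZIP container holding (X)HTML files; everything else is an assumption for illustration: the function names `epub_chapters` and `search_library` are made up, the books are assumed to be DRM-free and UTF-8 encoded, and pdf and mobi are skipped because they would need an external converter.

```python
import re
import zipfile
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an (X)HTML document and drops the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def epub_chapters(epub_path):
    """Yield (member_name, plain_text) for each HTML member of an epub."""
    with zipfile.ZipFile(epub_path) as book:
        for name in book.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                # Assumption: members are UTF-8; undecodable bytes are replaced.
                raw = book.read(name).decode("utf-8", errors="replace")
                parser = _TextExtractor()
                parser.feed(raw)
                parser.close()
                yield name, "".join(parser.parts)


def search_library(root, pattern):
    """Yield (file, member, line) for every text line matching the regex."""
    regex = re.compile(pattern)  # str patterns are Unicode-aware in Python 3
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix == ".epub":
            try:
                sources = list(epub_chapters(path))
            except zipfile.BadZipFile:
                continue  # not a readable ZIP container, skip it
        elif suffix == ".txt":
            text = path.read_text(encoding="utf-8", errors="replace")
            sources = [("", text)]
        else:
            continue  # pdf, mobi etc. are not handled in this sketch
        for member, text in sources:
            for line in text.splitlines():
                if regex.search(line):
                    yield str(path), member, line.strip()
```

Calling `search_library(library_dir, r"some\s+regex")` on the calibre directory would then give grep-like hits, each with the book file and the chapter file inside it. Matching is line by line, so a phrase that wraps across a line break in the source markup would be missed, and nothing is indexed ahead of time.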
Any other suggestions?
Thanks in advance!