MobileRead Forums - View Single Post

fufu42 · 06-28-2012, 09:00 AM

Hi, I've been searching in all kinds of places for a while and decided to ask here about the following problem.

I have a collection of ebooks in many different formats, from txt and pdf to epub, mobi, etc., all inside a calibre directory. Unfortunately, calibre has an ebook viewer but no full-text search function.

From my point of view, the minimum requirements for the search functionality would be:
- regular expressions in full text search
- search inside all common ebook formats
- unicode support

Optional, but not vital:
- pre-indexed file content
- search inside archived files
- on-the-fly indexing

Here is what I've found so far:

- The program Beagle for Linux seems to have met my needs, but it isn't maintained any longer. I haven't tried to install the latest version. Has anybody done so lately?

- Google Desktop is discontinued, and it probably supported only boolean operators, not regex.

- Copernic Desktop Search seems interesting, but the developer's site says nothing about regular expressions. I haven't tried it. Has anyone?

- Agent Ransack, which I just tried, seems interesting. It does fast full-text regex search in pdf, archived, and other plain-text-like files, but it probably can't search inside epub (and other formats). If I decided to use it, that would, somewhat oddly, mean converting all non-pdf files to pdf. Agent Ransack does no indexing ahead of the search.

- I wouldn't hesitate to use command line tools. Basically, grep can do all I need for now. The question then is how to extract a searchable corpus, with the necessary file information, from the library, including the pdf and epub formats.
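
To make the grep idea concrete, here is a minimal Python sketch of what such a corpus search could look like. It is only an illustration under stated assumptions, not a finished tool: it assumes the epub files are DRM-free (an epub is a ZIP container of XHTML files), it reads txt files as UTF-8, and it only handles pdf if poppler's `pdftotext` command happens to be installed. The names `search_library`, `epub_text`, `pdf_text`, and `EXTRACTORS` are made up for this example, and mobi is not covered.

```python
import re
import shutil
import subprocess
import zipfile
from html.parser import HTMLParser
from pathlib import Path


class _TextOnly(HTMLParser):
    """Collects the text nodes of an (X)HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def epub_text(path):
    """Return the text of every (X)HTML member of an EPUB (a ZIP container)."""
    chunks = []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextOnly()
                parser.feed(zf.read(name).decode("utf-8", errors="replace"))
                parser.close()
                chunks.append("".join(parser.parts))
    return "\n".join(chunks)


def pdf_text(path):
    """Use poppler's pdftotext if it is installed; otherwise return ''."""
    if shutil.which("pdftotext") is None:
        return ""
    # "-" as the output file makes pdftotext write the text to stdout.
    result = subprocess.run(
        ["pdftotext", str(path), "-"], capture_output=True, check=False
    )
    return result.stdout.decode("utf-8", errors="replace")


# Hypothetical mapping from file extension to a text extractor.
EXTRACTORS = {
    ".txt": lambda p: Path(p).read_text(encoding="utf-8", errors="replace"),
    ".epub": epub_text,
    ".pdf": pdf_text,
}


def search_library(root, pattern):
    """Yield (file, line number, line) for every line matching the regex."""
    regex = re.compile(pattern)  # str patterns are Unicode-aware in Python 3
    for path in sorted(Path(root).rglob("*")):
        extract = EXTRACTORS.get(path.suffix.lower())
        if extract is None or not path.is_file():
            continue
        for lineno, line in enumerate(extract(path).splitlines(), start=1):
            if regex.search(line):
                yield path, lineno, line.strip()
```

Looping over `search_library("/path/to/library", r"some\s+regex")` and printing each hit gives grep-like output with the file name attached. Nothing is indexed ahead of time, so every search re-reads the whole library. The line numbers for epub and pdf refer to the extracted text, not to pages in the book.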

Any other suggestions?

Thanks in advance!

Thread: Best way to search inside ebook library? · Post #1 · Last edited by fufu42; 06-28-2012 at 10:57 AM.