|07-12-2013, 10:10 PM||#1|
Join Date: Jan 2012
Indexing - please consider this
I have Calibre. It's a lovely thing. I have Dropout because Calibre doesn't search inside documents. I am NOT a programmer. I have been known to hire programmers (in fact I have two working for me now on other projects) and if I had the money I'd plunk it down instantement (as they say in Quebec) for someone to do the following:
Plug the Lucene search engine into Calibre as a core feature.
Imagine the sheer power of that for every student and scholar on the freakin' planet. Not only would they be able to assemble books by metadata (in Calibre) but they would be able to search INSIDE the documents themselves, enabling a query-based research practice that would have mind-blowing implications for the humanities.
Seriously. Dropout is free (as in beer) and is based on the Apache Lucene Search engine. You can DL it here:
What's super cool is that not only does it index the contents of your books, it's TRANSPORTABLE, so you put all your books on a portable drive, index it and then carry it with you - you then have your own Personal Portable Research Library that you can use wherever you go.
And given how title-centric Calibre is, imagine if you could call up a title, search inside that specific title itself and find what you need...
1. Lucene works on Linux and Windows. I don't know if it works on Mac, much less iOS or Android. (So it would have to be ported or run in a virtual machine - testing that would suck. A lot. yadda yadda)
2. It would require a significant rewrite of Calibre and Calibre's UI or some kind of a branch of Calibre. Branching software is a precarious journey, so it would be better if Calibre itself absorbed this functionality.
3. Portability - I don't know if Calibre is portable, even within a platform. (Yeah, I should know that, but Julian gave me another rum and Coke...)
THINK ABOUT IT for more than 5 seconds. *Mind. Blown.* There would be no reason to use any other ereading / indexing app ever, for anyone, anywhere. It would be a complete ereading solution. If I had the money, I'd pay someone tomorrow to build it. Seriously.
Possible Plan B:
1. A Lucene plug-in? That could be ugly as homemade sin, but it *could* work. It would not be as elegant or as useful as being built into Calibre itself.
Seriously, folks: imagine if Calibre indexed the contents of your books. If you don't care, I can assure you there are MILLIONS of students and scholars who would pee their pants (well, they would be "overjoyed") at such a thing.
I look forward to this conversation.
warm regards to you wonderful people!
|07-15-2013, 06:03 PM||#3|
Join Date: Mar 2012
Location: NSW Australia
Lucene's inability to search the major ebook formats, and its dependence on the effectively Windows-only .NET framework, probably mean there's very little chance that the authors of Calibre would provide any support for it in the core product.
BTW Windows Search uses IFilters (that's why MS 'invented' them) - I have 11 installed. But sadly none for the popular ebook formats. If you know where one can get IFilters for EPUB, MOBI etc, then I too would really like to know where.
If Lucene can produce a list of the files that meet the search criteria, then you could push that list into the Import List PI and get it to create a Reading List.
And have a look at the Recoll Full Text Search Plugin - maybe you could use it as a model for developing your own plugin for Lucene - Recoll is a Search Tool that runs on Linux and OS X; it also indexes and searches EPUB files amongst the usual suspects.
On what basis do you make the judgement that using a plugin would be as "ugly as home made sin"?
Last edited by BetterRed; 07-15-2013 at 09:46 PM. Reason: add para about IFilters & .NET
|07-15-2013, 09:20 PM||#4|
Join Date: Jan 2013
Device: Too many random androids to list
Calibre is PyQt, isn't it? There are Python ports of Lucene, have been for years: http://lucene.apache.org/pylucene/ and Qt itself uses CLucene internally (or used to, back when I last used it, which admittedly is 5-6 years and two major versions ago) so it was like a freebie anyway. There's no .NET dependency specific to Lucene itself; it was originally written in Java and it's been ported to practically everything (both in terms of operating system and programming language).
Ebook formats are also nothing special: most of them are a bundle of XML files wrapped in compression, and there's mountains of code already in Calibre for taking apart, parsing and repackaging them, so that part isn't really a big deal, I wouldn't think. There's a lot of things Calibre can do that MS doesn't.
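To make the "bundle of XML files wrapped in compression" point concrete, here's a rough sketch of pulling plain text out of an EPUB with nothing but the Python standard library. This is not how Calibre does it; the names `TextExtractor` and `epub_text` are made up for illustration, and it assumes the book is an unencrypted EPUB whose content files end in .xhtml/.html/.htm.

```python
import zipfile
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an (X)HTML file, skipping script/style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def epub_text(source):
    """Return {member_name: plain_text} for each HTML/XHTML file in an EPUB.

    `source` is a path or a file-like object; an EPUB is a ZIP archive.
    """
    texts = {}
    with zipfile.ZipFile(source) as book:
        for name in book.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = TextExtractor()
                parser.feed(book.read(name).decode("utf-8", errors="replace"))
                parser.close()
                texts[name] = " ".join(parser.parts)
    return texts
```

Something like `epub_text("mybook.epub")` (hypothetical file name) would give you one chunk of text per chapter file, which is the raw material any indexer needs.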
Indexing PDFs is probably harder, but there are utilities for that too. This is built into some bibliographic software like Zotero, but most of them simply run a command-line pdftotext and index the results. They aren't fancy at all, but they do the job.
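For the "run pdftotext and index the results" approach, a bare-bones sketch might look like the following. It assumes the `pdftotext` command (from Poppler or Xpdf) is installed and on the PATH; the function names are hypothetical, and the "index" is just a toy in-memory inverted index, nowhere near what Lucene or Xapian give you (no ranking, stemming or persistence).

```python
import re
import subprocess
from collections import defaultdict


def pdf_to_text(pdf_path):
    """Run the pdftotext CLI and return the extracted text.

    Assumes `pdftotext` is on PATH; the "-" argument sends output to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)
```

In use you would build `documents` as something like `{path: pdf_to_text(path) for path in pdf_paths}`, call `build_index(documents)` once, and then run `search(index, "white whale")` as often as you like.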
|07-15-2013, 11:43 PM||#5|
creator of calibre
Join Date: Oct 2006
Location: Mumbai, India
That's not a Python port, that's a Python wrapper that uses JNI, which would need the JVM to be distributed with calibre, which is absolutely never going to happen. If you want to use Lucene, use CLucene or Lucy. However, I would suggest using Xapian instead.
And doing a full text search is on my TODO list.
|07-17-2013, 08:36 PM||#6|
Join Date: Jan 2013
Device: Too many random androids to list
Ah, didn't realise it was just a wrapper, thanks for the correction.
Still, good to hear it's on the TODO list, however it's done.
|calibre, feature request, index, lucene, plug-in|