#1
Member
Posts: 15
Karma: 3088
Join Date: Apr 2013
Device: none
Make a simple Plugin for Full Text Search using Recoll
Hello, I have decided to make a very simple but useful plugin for calibre.
The idea is simple: I use Recoll's -t (command-line) option to search the "Calibre Library" directory for documents that contain the phrase you want to find. The plugin will have a simple structure:

1. A button to update the Recoll index.
2. A field to enter the text you want to search for (possibly also with AND and OR).
3. A button to run the search.

It is not a precise method, but I am not a programmer, so I just want to combine these two programs in a way that lets me use them best. I look for the (id) tag in the results Recoll prints to stdout and want to update the GUI table so it shows the books with those ids.

Now I have some questions:

1) When do these numbers change? Do they change after a book is deleted or added, or in some other case? (This is not so important, because you can simply update the Recoll database every time you make changes.)
2) I want to check the current filter (the books actually shown in calibre) and compare those ids with my Recoll search. Then I can display only the results matching both filters, so you can combine a calibre filter with Recoll. For this I want to know where to find documentation on how to use the methods of, for example, the LibraryDatabase2 class. I can't even find which methods exist. Something like this code is what I need:

Code:
```python
# matched_ids: the calibre book ids to show (e.g. the ids found by Recoll)
# Mark the records with the matching ids
self.db.set_marked_ids(matched_ids)
# Tell the GUI to search for all marked records
self.gui.search.setEditText('marked:true')
self.gui.search.do_search()
```

Well, if someone could help, it would be very nice.
PS: For someone who is a better programmer, Recoll has a Python API too. Maybe combining these two programs that way could finally provide the missing full-text search feature.
Thanks
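A minimal sketch of steps 1 to 3 and of question 2, assuming `recoll` and `recollindex` are on the PATH, that each result line printed by `recoll -t` contains the document's full path, and that calibre's book folders follow the usual `Title (id)` naming. The exact output format of `recoll -t` and how it takes the query argument are assumptions to check with `recoll --help` on your system; `update_index`, `run_recoll`, `ids_from_recoll_output` and `keep_shown` are hypothetical helper names, not calibre or Recoll API.

```python
import re
import subprocess

# Assumption: calibre stores each book in a folder named "Title (id)", so a
# "(digits)" group directly followed by a path separator marks the book id.
BOOK_DIR_ID = re.compile(r"\((\d+)\)[/\\]")


def update_index():
    """Step 1: refresh the Recoll index using its default configuration."""
    subprocess.run(["recollindex"], check=True)


def run_recoll(query):
    """Steps 2-3: run a command-line Recoll search and return its stdout.

    check=False because the exit status for "no results" is not assumed here.
    """
    result = subprocess.run(["recoll", "-t", query],
                            capture_output=True, text=True, check=False)
    return result.stdout


def ids_from_recoll_output(text):
    """Collect calibre book ids from every result line, in first-seen order."""
    ids, seen = [], set()
    for line in text.splitlines():
        for match in BOOK_DIR_ID.finditer(line):
            book_id = int(match.group(1))
            if book_id not in seen:  # several files of one book -> one id
                seen.add(book_id)
                ids.append(book_id)
    return ids


def keep_shown(recoll_ids, shown_ids):
    """Question 2: keep only the Recoll hits that the current calibre view also shows."""
    shown = set(shown_ids)
    return [book_id for book_id in recoll_ids if book_id in shown]
```

Usage would be `keep_shown(ids_from_recoll_output(run_recoll("some phrase")), shown_ids)`, where `shown_ids` comes from whatever the plugin reads out of calibre's current view; the result could then be passed as `matched_ids` to the marking code quoted above. Deduplicating matters because a book with several formats yields several result lines.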
#2
Addict
Posts: 254
Karma: 69786
Join Date: May 2006
Location: Oslo, Norway
Device: Kobo Aura, Sony PRS-650
The plugin Quality Check surprisingly has a very handy full-text search for epubs only. It doesn't maintain an index, but it does mark all entries with hits and restricts the window to these. You could have a look at the source and see if it has any useful hints. It might also be useful to you as it is, if your documents are epubs or it is feasible to convert them.
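For comparison, an index-less full-text search like the one described can be sketched with the standard library alone: open each EPUB as a zip archive, strip the markup from its (X)HTML files and look for the phrase. This is not Quality Check's actual code; `epub_contains` and `books_with_phrase` are hypothetical names, and the regex-based tag stripping is deliberately crude (it assumes UTF-8 content and ignores many edge cases).

```python
import html
import re
import zipfile

TAG = re.compile(r"<[^>]+>")
SPACE = re.compile(r"\s+")


def epub_contains(epub_path, phrase):
    """Return True if any (X)HTML file in the EPUB contains the phrase (case-insensitive)."""
    needle = SPACE.sub(" ", phrase).strip().lower()
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            raw = zf.read(name).decode("utf-8", errors="replace")
            # Drop tags, decode entities and collapse whitespace so a phrase
            # split across lines or inline tags still matches.
            text = SPACE.sub(" ", html.unescape(TAG.sub(" ", raw))).lower()
            if needle in text:
                return True
    return False


def books_with_phrase(paths_by_id, phrase):
    """Given {book_id: epub_path}, return the ids whose EPUB contains the phrase."""
    return [book_id for book_id, path in paths_by_id.items()
            if epub_contains(path, phrase)]
```

Because every search rereads every file, the cost grows with the size of the library; that is the trade-off an index such as Recoll's avoids.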
#3
Member
Posts: 15
Karma: 3088
Join Date: Apr 2013
Device: none
Thanks for the answer. Unfortunately all my books are PDFs (Springer ebooks from the university), so the plugin itself cannot help me. But maybe I will find the right things in it that I can use in my plugin.
#4
null operator (he/him)
Posts: 21,487
Karma: 29308976
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@Satas - I am interested in this because content searching is something I use extensively. I run Calibre on Windows, but I have colleagues on OS/X.
My understanding is that Recoll is only available for Linux, and according to Calibre's usage stats 98% of Calibre installations are on Windows or OS/X. Windows (since Vista) and OS/X have usable built-in content search facilities, i.e. Windows Search (WS) and Spotlight. Have you considered DocFetcher? It runs on Windows, OS/X and Linux, and it's open source. It's written in Java, so it doesn't have the same code-level affinity with Calibre that Recoll would have.
Recoll and Spotlight DO index common ebook file types such as ePUB and Mobi; DocFetcher and WS DO NOT index those file types. I overcome this deficiency by creating an RTF or TXT format via conversion. However, most of the documents I (and my colleagues) want to search originate as PDF, MS Office or OpenOffice files. The only source I use regularly that provides anything else is McKinsey; they've recently started publishing some reports as ePubs alongside their PDFs. This is what I am now doing in Windows.
BR
Last edited by BetterRed; 04-17-2013 at 06:29 PM.
#5
Member
Posts: 15
Karma: 3088
Join Date: Apr 2013
Device: none
Hello,
well, maybe 98 percent is a big number, but for me it is of no importance. I cannot write for Windows; it would be like asking someone to hurt himself. That is only my personal opinion. I have now found a way to integrate Recoll very well: I add a new column with calibre's id (which is also stored in the name of each book's directory). So now, after I run a search with Recoll, I can get a string like:

Code:
```
#cid:=34 or =56 or =76
```

I will now try to make a plugin for that. DocFetcher or similar programs are not an alternative for me as external tools, because they cannot be used from a terminal (the command line on Unix), so you have to start them with a GUI. The other thing is that I have had bad experiences with Java, so I do not like programs written in it.

I can tell you how to make your own column with the ids. If you use something else, you could create a string containing the ids and copy/paste it into calibre:

Preferences -> Add your own columns
- Lookup name: cid
- Column heading: CID
- Column type: Column built from other columns
- Template: {id}

Hope that helps.
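Instead of assembling that string by hand, the plugin can build it from the ids Recoll returns. A minimal sketch, assuming the `cid` column configured as described; `cid_search` is a hypothetical helper. By default it repeats the lookup name in every term (`#cid:=34 or #cid:=56`), the conservative form of calibre's field search; `repeat_field=False` reproduces the shorter string quoted in the post.

```python
def cid_search(ids, lookup="#cid", repeat_field=True):
    """Build a calibre search string that matches the given book ids.

    Returns "" for no ids; calibre shows every book for an empty search,
    so the caller should report "no results" instead of running it.
    """
    unique = list(dict.fromkeys(ids))  # drop duplicates, keep order
    if not unique:
        return ""
    if repeat_field:
        # e.g. "#cid:=34 or #cid:=56 or #cid:=76"
        return " or ".join(f"{lookup}:={book_id}" for book_id in unique)
    # e.g. "#cid:=34 or =56 or =76" (the form quoted in the post)
    return f"{lookup}:" + " or ".join(f"={book_id}" for book_id in unique)
```

Inside a plugin, the result could be handed to `self.gui.search.setEditText(...)` followed by `self.gui.search.do_search()`, the same calls quoted in the first post.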
#6
Member
Posts: 15
Karma: 3088
Join Date: Apr 2013
Device: none
I have run into a small difficulty and have started a new thread: https://www.mobileread.com/forums/sho...57#post2485757
I would be glad to have some help.
#7
null operator (he/him)
Posts: 21,487
Karma: 29308976
Join Date: Mar 2012
Location: Sydney Australia
Device: none
I'm with you on Java; I won't let it onto my system.
I like your idea of doing the search in Recoll and then using its results to retrieve the books in Calibre. Integration at the back end is usually easier and less troublesome than at the front end; I call it 'loose coupling'. Why couldn't your plugin use the ids it gets from Recoll (i.e. #cid:=34 or =56 or =76) to access metadata.db directly? My understanding is that the id (which Calibre puts at the end of the book directory names) is the primary key of the books table - see attachment. Good luck - BR
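Following that suggestion, a minimal read-only sketch with Python's built-in sqlite3 module. It assumes the library's metadata.db has a `books` table whose `id`, `title` and `path` columns hold the book id, the title and the book folder relative to the library (check this against your calibre version); `open_library_readonly` and `books_by_id` are hypothetical names. Opening the file with `mode=ro` keeps the sketch from ever writing to the library; inside a running calibre plugin, calibre's own database API remains the safer route.

```python
import sqlite3
from pathlib import Path


def open_library_readonly(db_path):
    """Open metadata.db read-only via an SQLite URI (handles spaces in the path)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def books_by_id(conn, ids):
    """Return {id: (title, path)} for the given book ids from the books table."""
    ids = [int(book_id) for book_id in ids]  # ids may arrive as strings from Recoll
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)  # parameters, never string-formatted values
    rows = conn.execute(
        f"SELECT id, title, path FROM books WHERE id IN ({placeholders})", ids)
    return {row[0]: (row[1], row[2]) for row in rows}
```

A call such as `books_by_id(open_library_readonly(".../Calibre Library/metadata.db"), [34, 56, 76])` would return the titles and folders of the Recoll hits without going through the GUI filter.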
#8
Member
Posts: 15
Karma: 3088
Join Date: Apr 2013
Device: none
Well, I could if I were a good programmer and had a lot of time. But right now I am only looking for the easiest way to solve my own problem, with the thought in mind that someone could later take my little work and improve it. So I will first do it with the normal filter and get it working for me; then we will see. And yes, I know the id comes from the database, but databases are a topic I have never had to deal with.
#9
Member
Posts: 15
Karma: 3088
Join Date: Apr 2013
Device: none
The plugin is ready to use: https://www.mobileread.com/forums/sho...d.php?t=211137
#10
Junior Member
Posts: 3
Karma: 3088
Join Date: Jan 2012
Device: none
How about Lucene from Apache? It's used in Dropout. I'd love to see Dropout rolled into Calibre, but in lieu of that, I think a plugin that uses Lucene would rock. Here's Lucene:
http://lucene.apache.org/core/
Tags
database, developement, plugin, search |
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Full Text Search? | silentguy | Calibre | 4 | 02-22-2012 03:03 PM |
| Feature request: full text search | Laisvunas | EPUBReader | 3 | 04-03-2011 11:47 AM |
| Full Text Search Engine | Fat Abe | General Discussions | 1 | 09-21-2010 05:30 PM |
| Ebook management software with full text search | SadE | Reading and Management | 4 | 05-23-2010 06:02 AM |
| Google Book Search to search full-text books online | Bob Russell | Deals and Resources (No Self-Promotion or Affiliate Links) | 1 | 08-19-2006 12:13 PM |