Old 03-22-2013, 12:13 PM   #1
mst
Connoisseur
Posts: 73
Karma: 10
Join Date: Dec 2010
Device: Kobo Clara HD
Searching multiple pdfs in library

Hi there,

I have loved Calibre since 2011, but I repeatedly run into the same problem:

With hundreds of academic papers in calibre, I often need to search several PDFs and other files (epub etc.) for specific terms. So far I have been doing this with Foxit, but it only covers PDFs and often takes forever, since Foxit does not index the files stored in a specific folder.

For calibre, this would be easy to do: Index all ebooks and files in the library once, and then just search the index. Perhaps this is available already as a plugin, if so, please let me know. If not, I think this would be a great productivity tool!
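A minimal sketch of the "index once, then search the index" idea, assuming the plain text has already been extracted from each file (extraction from PDF or EPUB is not shown). The function names and the sample documents are hypothetical and are not part of calibre:

```python
import re
from collections import defaultdict

# Words are runs of lowercase letters and digits; punctuation is ignored.
TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Split text into lowercase terms."""
    return TOKEN.findall(text.lower())


def build_index(docs):
    """Map each term to the set of document names that contain it.

    `docs` is a dict of {document name: extracted plain text}.
    """
    index = defaultdict(set)
    for name, text in docs.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def search(index, query):
    """Return the documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)


# Hypothetical sample data standing in for text extracted from a library.
docs = {
    "paper_a.txt": "Round-trip banking and capital flows",
    "paper_b.txt": "Banking regulation after the crisis",
}
index = build_index(docs)
print(sorted(search(index, "banking")))             # ['paper_a.txt', 'paper_b.txt']
print(sorted(search(index, "round-trip banking")))  # ['paper_a.txt']
```

The index is built once and each later search is a few set lookups, which is why indexed search is so much faster than opening every file. Note that this matches documents containing all the query terms anywhere; it is not an exact-phrase search.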

All best,
F

Last edited by mst; 03-22-2013 at 12:22 PM.
Old 03-22-2013, 07:45 PM   #2
BetterRed
null operator (he/him)
Posts: 22,005
Karma: 30277294
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Upgrade to Windows 8 and use Windows Desktop Search (WDS); it's light years ahead of the one that runs on XP, which from memory was on the wrong side of useless.

My main library is also academic and similar papers - WDS works like a dream. It finds things in my calibre folder faster than calibre, but so it should; its database is proprietary and optimised for Windows NTFS and search.

On XP you could try using the Google or Copernic tools, if they still exist. IMO the best searcher on XP was the one from Yahoo. But they discontinued it; maybe MS snaffled it and folded it into their current WDS.

It would not be 'very hard' to integrate Calibre with Windows Search on Vista/7/8. If Apple expose an API for Spotlight then it may not be 'very hard' to integrate Calibre with it. I'm not sure what one would do regarding Linux - Recoll is a possibility, but it's not vanilla Linux and it may not run on all variants.

But doing all three may be 'very hard'. It would be considerable work, as one would probably want to rationalise them into some common shape - almost inevitably that will mean trade-offs and lowest-common-denominator compromises. Then comes the problem of supporting different versions and features even within the same product - some vendors pay more attention to backward compatibility than others do.

AFAIK there's no WDS iFilter for ePub, so I convert them to RTF and Windows indexes those. There are several iFilters for PDFs. Spotlight will index ePubs and other eBook formats via an add-on, as will Recoll, also via an add-on.

A major advantage of OS indexing and search is that it will get other things, like emails, presentations, correspondence, spreadsheets, blog posts etc. For me that's invaluable - more or less a must have.

Another is that it avoids the problem of over-tagging. Example: 'round-trip banking' has become one of the 'in phrases' since Krugman used it in the NYT last week, but I don't have it as a tag in Calibre. In less than a couple of seconds, WDS turned up 19 relevant documents (12 in my Calibre library, 3 emails, 2 spreadsheets & 2 blog/media comments).

BTW - I access WDS most often via xplorer2 Ultimate Edition, which is my normal file manager; this demonstrates that WDS can be integrated with 3rd party products.

But asking Calibre to integrate itself with these OS level tools is a very big ask. It's hard enough keeping up with the vagaries of different file systems on different platforms - throw platform dependent search indexers into the mix and you're into OMG territory.

BR

Just watch, it'll turn up in 0.9.30 and I'll have egg on my face - again

Last edited by BetterRed; 03-22-2013 at 09:57 PM.
Old 03-22-2013, 10:36 PM   #3
kovidgoyal
creator of calibre
Posts: 45,598
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Full text indexing is on my TODO list; however, it is a large feature, so don't expect it any time soon. As BR says, there are many tools that do full text indexing already.
Old 03-23-2013, 06:14 AM   #4
mst
Connoisseur
Posts: 73
Karma: 10
Join Date: Dec 2010
Device: Kobo Clara HD
Thanks guys.

I'm also using Mendeley for Papers, it's fast and useful, but I prefer Calibre's interface. If the search/indexing function of Mendeley was merged into Calibre... a match made in heaven!
Old 03-23-2013, 07:24 AM   #5
BetterRed
null operator (he/him)
Posts: 22,005
Karma: 30277294
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by mst View Post
Thanks guys.

I'm also using Mendeley for Papers, it's fast and useful, but I prefer Calibre's interface. If the search/indexing function of Mendeley was merged into Calibre... a match made in heaven!
mst - there is a Mendeley plugin here https://www.mobileread.com/forums/sho...light=mendeley

Maybe heaven's just a click away

BR
Old 03-23-2013, 09:42 AM   #6
mst
Connoisseur
Posts: 73
Karma: 10
Join Date: Dec 2010
Device: Kobo Clara HD
Thanks!
Old 03-25-2013, 07:34 AM   #7
mst
Connoisseur
Posts: 73
Karma: 10
Join Date: Dec 2010
Device: Kobo Clara HD
I've looked into the in-document search tools that BetterRed recommended, but haven't quite found what I was looking for. I'm basically trying to replicate the search from Mendeley with Calibre.

For Windows 7, does anyone know of a lightweight search tool, other than Windows Search, that can index just the calibre folder and quickly search the documents in it for specific words?

The other tools listed either need a whole hard disk index (which I disabled) or will create their own, but I only want to index and search documents in the calibre folder.

Last edited by mst; 03-25-2013 at 08:01 AM.
Old 03-25-2013, 10:02 AM   #8
BetterRed
null operator (he/him)
Posts: 22,005
Karma: 30277294
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Windows Search can be configured to index only certain directories and certain file types:

Control Panel->Indexing Options->Modify to set the locations you want indexed
Control Panel->Indexing Options->Advanced to set the file types you want indexed and how they are to be indexed

These are tick the box dialogues, just untick the things you don't want - simple.

I index everything on 3x2TB drives (about 3.9TB). I suffer a negligible performance hit and don't even know it's running, and retrieval is damn quick. On XP, WDS was not fit for purpose (on the wrong side of useless), but it's a totally different experience on W7/8.

If I only want to search my Calibre Libraries directory, I just navigate there in Windows Explorer, enter the search term, press Enter and, bingo, the list of matching files in my Calibre Libraries pops up more or less instantly.

Google have dropped their GDS product, as did Yahoo with theirs (which was the best 10 years ago). Copernic is still available. Another one you might like to try is DocFetcher.

But I suggest you configure WDS and try it first.

One downside of WDS is that there's no iFilter (the gadgets that pull the 'text' out of files for the indexer to process) for ePubs, Mobis etc. I currently work around it by creating an RTF if I don't have a PDF or other suitable format (doc, odt etc). But I don't think Copernic and DocFetcher index e-book formats either. Last time I looked at Copernic it used the same iFilter technology as WDS; that may have changed, because that was quite a while back. Google's desktop search also used iFilters.
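The RTF workaround could be scripted roughly as follows. This is only a sketch under stated assumptions: it relies on calibre's `ebook-convert` command-line tool being on the PATH, the function names are hypothetical, and it writes the RTF copies to a separate folder, because files dropped straight into the library folder are not registered in calibre's database.

```python
import subprocess
from pathlib import Path

# Formats the post treats as already indexable by WDS.
INDEXABLE = {".pdf", ".rtf", ".doc", ".odt"}


def epubs_needing_rtf(library_root):
    """Yield EPUB files whose book folder holds no indexable format yet."""
    for epub in sorted(Path(library_root).rglob("*.epub")):
        siblings = {p.suffix.lower() for p in epub.parent.iterdir() if p.is_file()}
        if not siblings & INDEXABLE:
            yield epub


def build_command(epub, out_dir):
    """Build the ebook-convert call that writes an RTF copy into out_dir."""
    target = Path(out_dir) / (epub.stem + ".rtf")
    return ["ebook-convert", str(epub), str(target)]


def convert_all(library_root, out_dir, dry_run=True):
    """Convert every EPUB that needs it; with dry_run, only collect the commands."""
    commands = [build_command(e, out_dir) for e in epubs_needing_rtf(library_root)]
    if not dry_run:
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises if a conversion fails
    return commands
```

Calling `convert_all(library, out_dir)` with the default `dry_run=True` returns the commands without running them, which makes it easy to review what would be converted. The output folder then needs to be added to the indexed locations.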

Spotlight on OS X and Recoll on Linux will index everything, including e-book formats via add-ons, which may even be iFilters under another name :lol:

BR

Last edited by BetterRed; 03-25-2013 at 10:04 AM.
Old 03-25-2013, 11:04 AM   #9
mst
Connoisseur
Posts: 73
Karma: 10
Join Date: Dec 2010
Device: Kobo Clara HD
Thanks BetterRed, DocFetcher is exactly what I was looking for. I generally don't like services indexing in the background, and that can easily be disabled in DocFetcher.

Now if DocFetcher were directly available in Calibre with a right-click on bookshelves, Mendeley would be history...
Old 03-25-2013, 07:46 PM   #10
BetterRed
null operator (he/him)
Posts: 22,005
Karma: 30277294
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by mst View Post
Thanks BetterRed, Docfetcher is exactly what I was looking for. I generally don't like services indexing in the background, and that can easily be disabled in Docfetcher.

Now if Docfetcher was directly available in Calibre with a right-click on bookshelves, Mendeley would be history...
If/when this happens you might get some interest in doing some sort of Calibre/DocFetcher integration - perhaps via a plug-in

BR