Spotlight importer for Mac OS X


chrisridd
03-12-2012, 04:27 PM
I posted a brief announcement in another thread, but in case you didn't notice that, a new Spotlight plugin for ePub files has been released together with an updated Quicklook plugin.

I'd be interested in any feedback here (I helped write the Spotlight importer and hacked on the core code a bit), but if you want to raise an "Issue" yourself on GitHub, please do so.

https://github.com/jaketmp/ePub-quicklook

Edit: the download DMG had some bad links; please try again!

It indexes quite a few bits of ePub 2 metadata, as well as the textual content of unprotected books. Indexed metadata includes:


Title
Authors
Contributors
Editors
Translators
Illustrators
ISBN
Publisher
Language
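To check what actually got indexed for a book, `mdls book.epub` lists every attribute Spotlight stored for it, and `mdfind` can query those attributes directly. A minimal sketch, assuming the importer stores authors in the standard `kMDItemAuthors` attribute (confirm with `mdls` first); `author_query` is a hypothetical helper name, and the `cd` suffix asks Spotlight for a case- and diacritic-insensitive match:

```shell
#!/usr/bin/env bash
set -o errexit
set -o nounset

# author_query NAME: build a Spotlight query string matching .epub files whose
# kMDItemAuthors attribute contains NAME ("cd" = ignore case and diacritics).
# Assumption: the ePub importer fills kMDItemAuthors.
author_query() {
    printf 'kMDItemFSName == "*.epub" && kMDItemAuthors == "*%s*"cd' "$1"
}

# When run directly with a name on a Mac, search the current directory.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]] && command -v mdfind >/dev/null; then
    mdfind -onlyin . "$(author_query "$1")"
fi
```

Saved as, say, `author_query.sh` (a hypothetical name), `bash author_query.sh Melville` would search the current directory for matching books; the same query string can also be typed after `mdfind` by hand.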


In addition, it tries to detect which DRM scheme is in use (Apple, Adobe, Kobo, Barnes & Noble) and, for expiring (e.g. library) books, their expiry date.
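The importer's actual detection code isn't reproduced here, so what follows is only a rough sketch of the idea, not its logic. It assumes the commonly cited marker files: `META-INF/sinf.xml` for Apple FairPlay and `META-INF/rights.xml` for Adobe ADEPT (which several stores build on). `META-INF/encryption.xml` on its own proves little, since ePubs also use it for font obfuscation, and the sketch makes no attempt to tell Kobo or Barnes & Noble apart. `drm_guess` is a hypothetical name; it reads the archive's entry names, as printed by `unzip -Z1`:

```shell
#!/usr/bin/env bash
set -o errexit
set -o nounset

# drm_guess: read ePub entry names (one per line) on stdin and print a rough
# guess at the DRM scheme. The marker files are assumptions based on common
# practice, not the Spotlight importer's actual logic.
drm_guess() {
    local entries
    entries=$(cat)
    if grep -qx 'META-INF/sinf.xml' <<<"$entries"; then
        echo "Apple FairPlay (probably)"
    elif grep -qx 'META-INF/rights.xml' <<<"$entries"; then
        echo "Adobe ADEPT (probably)"
    elif grep -qx 'META-INF/encryption.xml' <<<"$entries"; then
        echo "unknown: encrypted resources or just font obfuscation"
    else
        echo "no DRM markers found"
    fi
}

# On a real book: list the zip entries with zipinfo mode, then classify.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
    unzip -Z1 "$1" | drm_guess
fi
```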

paulhar
08-31-2012, 06:20 AM
I've installed this, but when running queries I don't seem to get the results I'd expect.
If I search for a string of known text, the epub appears, but when I hover over it I see Adobe's reader showing a cover preview and no text, so I can't tell from the myriad of epub hits what the context was.

Is there a better epub reader that I should be using in conjunction with this so that the search results are meaningful?

chrisridd
09-02-2012, 04:31 AM

Hi,

It sounds like Spotlight is successfully indexing your books. When you hover over a result in the Spotlight menu, it uses the Quicklook plugin to show you a brief overview of the book.

So both plugins are working, but maybe Quicklook isn't showing you as much as you want. In particular, the Quicklook plugin doesn't show any of the book's contents at all, just the metadata gathered from the one part of the epub file (the OPF package document) that describes the book title, author, publisher, ISBN, cover, that sort of thing.
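To see roughly what that metadata looks like, you can pull it out of an unprotected book by hand. In the ePub container format, `META-INF/container.xml` names the package (OPF) file in a `full-path` attribute, and the OPF describes the book with Dublin Core elements such as `dc:title` and `dc:creator`. A minimal sketch, assuming each element sits on a single line (sed is no substitute for a real XML parser); `opf_path` and `dc_field` are hypothetical helper names:

```shell
#!/usr/bin/env bash
set -o errexit
set -o nounset

# opf_path: read META-INF/container.xml on stdin and print the path of the
# first package (OPF) file named in a full-path="..." attribute.
opf_path() {
    sed -n 's/.*full-path="\([^"]*\)".*/\1/p' | head -n 1
}

# dc_field NAME: read an OPF file on stdin and print the text of each
# <dc:NAME ...>text</dc:NAME> element. Assumes one element per line.
dc_field() {
    sed -n "s/.*<dc:$1[^>]*>\([^<]*\)<\/dc:$1>.*/\1/p"
}

# When run directly on a book: print its title(s) and creator(s).
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
    opf=$(unzip -p "$1" META-INF/container.xml | opf_path)
    unzip -p "$1" "$opf" | dc_field title
    unzip -p "$1" "$opf" | dc_field creator
fi
```

From Terminal, `qlmanage -p book.epub` shows the same preview you get when hovering in Spotlight, and `qlmanage -m plugins` lists the installed Quick Look generators, which may help show whether another application's generator is handling .epub files.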

I'm not sure what you mean by "shows Adobe's reader". Can you share a screenshot?

What else would you like the plugin to show?

Thanks,

Chris

paulhar
09-07-2012, 06:26 AM
I found a solution to my problem by brute force.
Searching in Spotlight for [server data web] returns 3 files. Hovering over them in Spotlight just shows me empty metadata, which isn't useful: I can't tell which one is the one I need, so I end up having to open them all, search, close, repeat.

So, a bit of bashing later. Ideally I'd have used iconv or similar, but anyway...

#!/usr/bin/env bash
set -o errexit
set -o nounset

# Print the text Spotlight extracted from one epub, one sentence per line.
# mdimport -d2 dumps the importer's attributes with non-ASCII characters
# escaped as \Uxxxx; the last sed maps the common ones to ASCII look-alikes.
function exportFile {
    mdimport -d2 "$1" 2>&1 |
        sed -e 's/\\n//g' |
        tr '.' '\n' |
        sed -e 's/\\U2013/-/g' -e 's/\\U2212/-/g' -e 's/\\U2026/.../g' \
            -e 's/\\U00a9/(C)/g' -e 's/\\U00d7/*/g' -e 's/\\U00a0/ /g' \
            -e 's/\\U201[cd]/"/g' -e "s/\\\\U201[89]/'/g"
}

mkdir -p txt
shopt -s nullglob   # with no .epub files, skip the loop instead of processing a literal "*.epub"
for FILENAME in *.epub; do
    echo "********** ${FILENAME}"
    exportFile "${FILENAME}" > "txt/${FILENAME}.txt"
done


That dumps the raw text from each epub into a separate directory, txt.
Now I can search for the text, e.g.

Core:txt paulhargreaves$ grep -i server * | grep -i data | grep -i web
TU100_b1_p3.epub.txt: This means they avoid having to download the same information multiple times, which saves on the amount of data transferred from the web server and allows the browser to render pages more quickly
TU100_b1_p4.epub.txt:This activity is unavailableShow descriptionAs you have just seen, the server sends the web page data in a sequence of packets
TU100_b3_p2.epub.txt:Scene 11 - servers, websites and databases appear next to the headquarters and disappear into it

Now I have context, so I know which of the epubs to open: in this case, the file b1_p3.
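The chained `grep -i` calls generalise to any number of words. A minimal sketch, assuming the text files sit in `txt/` as the script above leaves them; `grep_all` and the `GREP_DIR` override are hypothetical names. Unlike `grep`, it treats each word as a plain substring rather than a regular expression, and it checks only the line text, so a word that happens to appear in a file name can't cause a false match:

```shell
#!/usr/bin/env bash
set -o errexit
set -o nounset

# grep_all WORD...: print "file:line" for every line in $GREP_DIR/*.txt
# (default directory: txt) that contains all the given words, ignoring case.
grep_all() {
    if [[ $# -eq 0 ]]; then
        echo "usage: grep_all WORD..." >&2
        return 2
    fi
    local dir=${GREP_DIR:-txt}
    awk -v words="$*" '
        BEGIN { n = split(tolower(words), w, " ") }
        {
            line = tolower($0)
            for (i = 1; i <= n; i++)
                if (index(line, w[i]) == 0) next
            print FILENAME ":" $0
        }' "$dir"/*.txt
}

if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
    grep_all "$@"
fi
```

For example, `grep_all server data web` should list roughly the same lines as the three-grep pipeline above, each prefixed with its file name.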

So, thank you for a great plugin; it's useful!