09-07-2012, 06:26 AM   #4
paulhar
I found a solution to my problem by some brute forcing.
Searching in Spotlight for [server data web] returns 3 files. Hovering over them in Spotlight just shows empty metadata, which isn't useful. I can't tell which one I need, so I end up having to open them all, search, close, repeat.

So, a bit of bashing later. Ideally I'd just have used iconv or similar but anyway...

Code:
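#!/usr/bin/env bash
set -o errexit
set -o nounset

# Dump the text that mdimport extracts from one file, roughly one sentence per line.
function exportFile {
  # Turn literal "\n" sequences and full stops into real newlines, then map
  # the \Uxxxx escapes in the debug output to plain ASCII stand-ins.
  # The two quote rules use double quotes so the replacement really is an
  # apostrophe ('s/.../''/g' collapses to an empty replacement in the shell).
  mdimport -d2 "$1" 2>&1 | sed -e 's/\\n/±/g' | tr '±' '\n' | tr '.' '\n' |
    sed -e "s/\\\\U2019/'/g" -e 's/\\U2013/-/g'   -e 's/\\U2212/-/g' |
    sed -e 's/\\U2026/.../g' -e 's/\\U00a9/(C)/g' -e 's/\\U201c/"/g' |
    sed -e 's/\\U201d/"/g'   -e 's/\\U00a0//g'    -e 's/\\U00d7/*/g' |
    sed -e "s/\\\\U2018/'/g"
}

mkdir -p txt
# Loop over the glob directly rather than parsing ls output.
for FILENAME in *.epub; do
  [ -e "${FILENAME}" ] || continue   # no epubs here: the glob stays literal
  echo "********** ${FILENAME}"
  exportFile "${FILENAME}" > "txt/${FILENAME}.txt"
done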

That dumps the raw text from the epubs into a separate directory txt.
Now I can search for the text, e.g.

Code:
Core:txt paulhargreaves$ grep -i server * | grep -i data | grep -i web
TU100_b1_p3.epub.txt: This means they avoid having to download the same information multiple times, which saves on the amount of data transferred from the web server and allows the browser to render pages more quickly
TU100_b1_p4.epub.txt:This activity is unavailableShow descriptionAs you have just seen, the server sends the web page data in a sequence of packets
TU100_b3_p2.epub.txt:Scene 11 - servers, websites and databases appear next to the headquarters and disappear into it
Now I have context, so I know which of the epubs to open: in this case, the file b1_p3.
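
The chained greps above could be wrapped up so any number of terms can be passed in one go. A rough sketch, assuming it is run from inside the txt directory and that the dumps end in .txt as produced by the script; find_terms is a name I made up for this:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines (prefixed with their file name) that
# contain ALL of the given terms, case-insensitively.
# It mirrors: grep -i A * | grep -i B | grep -i C
find_terms() {
  local first=$1
  shift
  local results term
  # -H always prints the file name, even when only one file matches the glob.
  results=$(grep -i -H -e "$first" -- *.txt) || return 1
  for term in "$@"; do
    # Narrow the previous matches down by the next term.
    results=$(printf '%s\n' "$results" | grep -i -e "$term") || return 1
  done
  printf '%s\n' "$results"
}
```

Usage would be something like `find_terms server data web`. As with the original pipeline, the second and later terms are matched against the whole output line, file name included, so a term that appears in a file name will match every line from that file.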

So, thank you for a great plugin, it's useful!