MobileRead Forums - View Single Post

pakoe · 11-25-2013, 01:44 PM

This script extracts all texts (highlighted passages, annotation texts, notes added to bookmarks) from the annotation XML files in the folder /mnt/onboard/Digital Editions/Annotations. It is meant to run on the Kobo itself. The resulting file, /mnt/onboard/pakoe/_annot.html, is sorted and formatted. If you add a certain string (the value of hideIfOlderThanThis in the script) to an annotation on your Kobo, all texts older than that annotation are hidden, which is useful if you are working through your annotations chronologically. _annot.html must already exist before the first run (just create an empty file).
The script can be started using Kobo Tweaks. I start it by simply opening _annot.html (opening it again refreshes it): I used fmon from KoboLauncher to make _annot.html an "alias" for the script file _annot.sh, with the following line in /mnt/onboard/.kobo/on_start.sh:

Code:

/mnt/onboard/pakoe/fmon /mnt/onboard/pakoe/_annot.html /mnt/onboard/pakoe/_annot.sh 2>&1 &

If you want to use neither Kobo Tweaks nor fmon, call _annot.sh from /mnt/onboard/.kobo/on_start.sh. In that case _annot.html is refreshed only when the Kobo restarts.
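
The setup steps mentioned so far (an existing empty _annot.html, and a call from on_start.sh) could look roughly like the sketch below. The helper name prepare_annot_dir and the throwaway demo directory are made up for illustration, and the commented-out on_start.sh line at the end is only my assumption of what a direct call might look like.

```shell
#!/bin/sh
# Hypothetical helper: make sure the folder and an empty _annot.html exist.
prepare_annot_dir()
{
	mkdir -p "$1" || return 1
	# Create the file only if it is missing; an existing one is left alone.
	[ -e "$1/_annot.html" ] || : > "$1/_annot.html"
}

# Demo in a throwaway directory
# (on the Kobo you would call: prepare_annot_dir /mnt/onboard/pakoe).
demoDir=$(mktemp -d)
prepare_annot_dir "$demoDir/pakoe"
ls -l "$demoDir/pakoe"

# Assumed line for /mnt/onboard/.kobo/on_start.sh when not using fmon:
# sh /mnt/onboard/pakoe/_annot.sh &
```

Running the helper a second time does not touch an _annot.html that is already there.
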
Of course, _annot.html can't be used to edit or delete annotations, or to open a book and jump to an annotation. It just shows ALL annotations (or all newer than the one you selected) from ALL books, formatted and sorted, and you can view and refresh it on the Kobo without connecting to a computer.
If you have sqlite3 on your Kobo (it is included, for example, in "Kobo Start Menu", see https://www.mobileread.com/forums/sho...d.php?t=233259), a much simpler solution is possible: reading the annotations directly from the file .kobo/KoboReader.sqlite on the Kobo.
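
A rough sketch of what that sqlite3 route might look like follows. The table name Bookmark and the columns VolumeID, Text, Annotation and DateCreated are assumptions about the database layout, not something taken from this post, so check your own file first. The function name listAnnotations and the demo database are made up for illustration.

```shell
#!/bin/sh
# Assumption: the annotations live in a table "Bookmark" with the columns
# VolumeID, Text, Annotation and DateCreated. Check your own database first,
# e.g. with: sqlite3 KoboReader.sqlite ".schema Bookmark"
listAnnotations()
{
	sqlite3 -separator ' | ' "$1" \
		"SELECT VolumeID, DateCreated, Text, Annotation
		   FROM Bookmark
		  WHERE Text IS NOT NULL OR Annotation IS NOT NULL
		  ORDER BY VolumeID, DateCreated;"
}

# Demo against a throwaway database with the assumed layout
# (on the Kobo: listAnnotations /mnt/onboard/.kobo/KoboReader.sqlite).
if command -v sqlite3 > /dev/null 2>&1; then
	demoDb=$(mktemp -d)/demo.sqlite
	sqlite3 "$demoDb" "CREATE TABLE Bookmark (VolumeID TEXT, Text TEXT, Annotation TEXT, DateCreated TEXT);
		INSERT INTO Bookmark VALUES ('b.epub', 'newer highlight', 'my note', '2013-11-25T13:44:00');
		INSERT INTO Bookmark VALUES ('a.epub', 'older highlight', NULL, '2012-01-02T08:00:00');"
	result=$(listAnnotations "$demoDb")
	printf '%s\n' "$result"
else
	echo "sqlite3 not found, skipping demo"
fi
```

Sorting and filtering are done by the query here, which is why this route needs far fewer text-processing steps than the sed-based script.
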
So here is _annot.sh (it could probably still be simplified):

Code:

filePath=/mnt/onboard/pakoe/
fileName=$filePath"_annot.html"
tmpFile=$filePath"tempAnnot"
annotLog=$filePath"_annot.log"
hideIfOlderThanThis="kkqqq"
count=1
date > $annotLog

mySed()
{
	echo "$1" >> $annotLog
	sed -r "$1" $fileName >> $tmpFile 2>> $annotLog
	#fileName=$filePath"tmp"$count # Further commands may still be applied to the file $fileName before the next mySed; only then would fileName be redefined. So the file $fileName would not be in the state immediately AFTER a mySed, but in the state immediately BEFORE a mySed (or at the end of the script)!
	count=`expr $count + 1`
	mv $tmpFile $fileName
}
mySort() # assumes that all \n have been replaced by #, but turns the last # before each annotation back into \n (so each line contains exactly 1 annotation; the first line could in theory contain none at all)
{
	mySed 's!#([^#]*<annotation>)!\n\1!g'
	# Copy something to the beginning of each line to determine sorting order, mark its ending by =:
	mySed "$1"
	sort -f $fileName -o $fileName
	# Remove again what was copied to determine sorting order:
	mySed 's!^[^=]+=!!g'
}

mv $fileName $filePath"_annot.bak.html"
# "^ *[^ <]" picks up the continuation lines of multi-line texts.
# "^    <dc:date>" depends on the number of leading spaces: it does not match the <dc:date> tag inside the <content> tag of an <annotation> (change date), only the <dc:date> tag directly inside the <annotation> tag (creation date).
# So if you are working through your annotations chronologically, you can hide older ones by adding the value of hideIfOlderThanThis to a (new or existing) annotation:
grep -E '<annotation>|<text>|<content>|^ *[^ <]|^    <dc:date>' -r "/mnt/onboard/Digital Editions/Annotations" > $fileName 2>> $annotLog
# _<<1, _<<2 can safely be used as markers, as any < IN A TEXT has been replaced by &lt; in the .annot file (which of course contains many <, but no <<):
mySed 's!#!_<<1!g'
mySed 's!=!_<<2!g'
tr '\n' '#' < $fileName >> $tmpFile 2>> $annotLog
mv $tmpFile $fileName
mySort 's!^.*<dc:date>20(..-..-.....:..:..)!\1=\0!g'
# Remove annotations without text (bookmarks):
mySed 's!^!=!g'
mySed 's!^=(.*<text>)!\1!g'
mySed 's!^=.*$!!g' # If a line matches, it becomes an empty line, i.e. the line itself is not deleted
tr '\n' '#' < $fileName >> $tmpFile 2>> $annotLog
mv $tmpFile $fileName
# Delete everything, greedily from the start, up to the beginning of the last line that contains $hideIfOlderThanThis:
mySed 's!.*#(.*?<annotation>.*?'$hideIfOlderThanThis')!\1!g'
mySort 's!^([^:]*)/([^:/]+):.*<dc:date>20(..-..-.....:..:..)!\2#\1#\3=\0!g'
mySed 's!#!\n!g'
mySed 's!^[^:]*/([^:/]+\.annot:)!=\1j=\1!g'
tr '\n' '#' < $fileName >> $tmpFile 2>> $annotLog
mv $tmpFile $fileName
# Now # is a line break, = and j= mark the beginning of an annotation file name.
# Empty lines were only represented by # while sorting, so they were NOT cleared away and have to be taken into account here:
mySed 's!j(=[^:]+\.annot:)([^#]*#+)\1!\2!g'
mySed 's!#!\n!g'
mySed 's!j=[^:]+\.annot:!!g'
mySed 's!=([^:]+)\.annot:!<br /><b><i><u>\1</u></i></b>!g'
mySed 's!<text>!!g'
mySed 's#</text>##g'
mySed 's!<content>!<b>#</b>!g'
mySed 's#<annotation>#<br />#g'
mySed 's!'$hideIfOlderThanThis'!<b><i>\0</i></b>!g'
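# Shorten the date to day.month.year,hour:minute; e.g. <dc:date>2013-11-25T13:44:00Z</dc:date> becomes <b>25.11</b>.13,13:44 (assuming dates of that form):
mySed 's#<dc:date>.*(..)[^0-9]0?([1-9][0-9]?)[^0-9]0?([1-9][0-9]?)[^0-9]+(..:..):.+</dc:date>#<b>\3.\2</b>.\1,\4#g'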
mySed 's!_<<1!#!g'
mySed 's!_<<2!=!g'
echo '<html><head>' > $tmpFile
echo '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />' >> $tmpFile
echo '</head><body>' >> $tmpFile
date +"%-d.%-m.%y %-H:%M" >> $tmpFile
echo ' ("<b><i>'$hideIfOlderThanThis'</i></b>" in an annotation hides all older ones!)' >> $tmpFile
cat $fileName >> $tmpFile
echo '<br />-------------------END-------------------' >> $tmpFile
echo '</body></html>' >> $tmpFile
mv $tmpFile $fileName
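
To make the core trick of the script easier to follow, here is a scaled-down sketch of its join / split / sort steps as a single pipeline. The sample lines only imitate the "file:line" shape of grep -r output and are an assumption, not copied from a real .annot file. The LC_ALL=C setting is my addition, and like the original script the sketch relies on a sed that accepts \n in the replacement (GNU sed does).

```shell
#!/bin/sh
# Scaled-down demo of the join / split / sort steps of _annot.sh.
# The sample lines imitate "grep -r" output ("<file>:<line>"); their exact
# shape is an assumption, not copied from a real .annot file.
LC_ALL=C; export LC_ALL   # my addition: keeps the sort order predictable
work=$(mktemp -d)
cat > "$work/in.txt" <<'EOF'
b.annot:  <annotation>
b.annot:    <dc:date>2013-11-25T13:44:00Z</dc:date>
b.annot:      <text>second = newer</text>
a.annot:  <annotation>
a.annot:    <dc:date>2012-01-02T08:00:00Z</dc:date>
a.annot:      <text>first # older</text>
EOF

result=$(
	# 1. Protect literal # and =, which are needed as markers below.
	sed -e 's!#!_<<1!g' -e 's!=!_<<2!g' "$work/in.txt" |
	# 2. Join everything into one line, with # standing in for the line break.
	tr '\n' '#' |
	# 3. Break the line again, but only in front of each <annotation>.
	sed -E 's!#([^#]*<annotation>)!\n\1!g' |
	# 4. Copy the creation date to the front as sort key, terminated by =.
	sed -E 's!^.*<dc:date>20(..-..-.....:..:..)!\1=&!' |
	sort -f |
	# 5. Drop the sort key, restore line breaks, then the protected characters.
	sed -E -e 's!^[^=]+=!!' -e 's!#!\n!g' -e 's!_<<1!#!g' -e 's!_<<2!=!g'
)
printf '%s\n' "$result"
```

With this input the older annotation from a.annot comes out first, and the literal # and = inside the texts survive the round trip. Because each annotation is a single line while it is being sorted, multi-line texts stay attached to their annotation.
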