05-31-2012, 08:27 AM   #30
mhr
Device: Sony PRS T1
Quote:
Originally Posted by Yoths
I wrote a small tool for exporting annotations from the T1 to HTML.
...
Inspired by your work, and after looking into the source code, I wanted something similar for plain '*.txt' files: a way to mark errors in unformatted text files on the Sony PRS T1 and to correct them later on the PC. I looked into the SQLite database and worked out how Sony stores the start and end of each mark for '*.txt' files (it is a binary encoding). Then I wrote a very basic Python script (no error handling) that augments the plain .txt files with marker symbols which can easily be searched for in a standard text editor (I chose [[[ ]]] as the markers in the script).

To use the script, copy the 'books.db' file from the Sony reader (after the highlighting annotations have been made on the reader, of course) into your working directory, create two subfolders called joined and marked, copy all text files from the reader that should be augmented with the highlighting annotations into the subfolder joined, and run the script. The augmented text files are then created under the same file names in the subfolder marked and can be edited with your favorite text editor on the PC. Here is the script:

Code:
#!/usr/bin/python
# insert_text_markers
# Reads 'books.db' in the current directory and converts each TXT file
# in subdir 'joined' into a TXT file in subdir 'marked', with the text
# highlighted according to 'books.db' enclosed in '[[[' and ']]]'.

import os
import sqlite3

book2ind   = {}  # Text file names (without path) to content id.
annotation = []  # List of annotations as tuples (cid,txt,ca,cb).

# Calculates byte offset from position coding in annotation table.
# Bytes 12..15 of the mark hold the offset, least significant byte first.
def pos2ind(pos):
  if pos is None:  return None
  data = bytearray(pos)  # Gives integer byte values for the binary mark.
  if len(data) >= 16 and data[0:4] == b"TXTk":
    ind = 0
    for k in range(15,11,-1):  ind = 256*ind + data[k]
    return ind
  else:  return None

# Reads book names and annotation ranges:
def readdb(name):
  conn = sqlite3.connect(name)
  c    = conn.cursor()
  for row in c.execute("SELECT * FROM books"):  book2ind[row[13]] = row[0]
  for row in c.execute("SELECT * FROM annotation"):
    cid = int(row[1])      # The content id of the corresponding textfile.
    txt = row[6]           # The marked text.
    ca  = pos2ind(row[7])  # Start index of marked text in text file.
    cb  = pos2ind(row[8])  # End   index of marked text in text file.
    if ca is not None and cb is not None:
      annotation.append((cid,txt,ca,cb))
  conn.close()

# Inserts the markers around every annotated range that belongs to the
# text file with content id cur_cid. Expects 'annotation' to be sorted.
def add_marks(all_txt,cur_cid):
  k   = 0
  arr = []
  for (cid,txt,ca,cb) in annotation:
    if cid != cur_cid:  continue  # Annotation of another text file.
    if k <= ca:                   # Skip ranges overlapping the previous one.
      arr.append(all_txt[k:ca])
      arr.append(b"[[[")
      arr.append(all_txt[ca:cb])
      arr.append(b"]]]")
      k = cb
  arr.append(all_txt[k:])
  return b"".join(arr)

readdb("books.db")
annotation.sort(key=lambda a: (a[2],a[3]))  # By start, then end offset.

for name in os.listdir("joined"):
  if name in book2ind:
    print("Add marks to '%s' ..." % name)
    # Files are handled as raw bytes, because the offsets are byte offsets.
    with open(os.path.join("joined",name),"rb") as fh:
      content = fh.read()
    content = add_marks(content,book2ind[name])
    with open(os.path.join("marked",name),"wb") as fh:
      fh.write(content)
The code runs fine under Linux with python and python-sqlite3 installed.
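For the preparation steps described above (copy 'books.db', create the subfolders joined and marked, copy the text files into joined), a small helper could set up the working directory before the script is called. This is only a sketch: the function name prepare_workdir and its arguments are made up for illustration, and where 'books.db' and the text files live on the reader is not stated here, so the caller has to supply those paths.

```python
import os
import shutil

def prepare_workdir(db_path, txt_paths, workdir="."):
    """Build the layout the script expects: books.db, joined/ and marked/.

    db_path and txt_paths are hypothetical arguments; they must point to
    the files copied from (or still located on) the reader.
    """
    joined = os.path.join(workdir, "joined")
    marked = os.path.join(workdir, "marked")
    os.makedirs(joined, exist_ok=True)  # Input text files go here.
    os.makedirs(marked, exist_ok=True)  # The script writes its output here.
    shutil.copy(db_path, os.path.join(workdir, "books.db"))
    for path in txt_paths:
        shutil.copy(path, joined)       # Keeps the original file name.
    return joined, marked
```

Keeping the file names unchanged matters, because the script matches the files in joined against the file names stored in 'books.db'.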
The script should probably also run under Windows. I don't need a more sophisticated setup, and changes for your needs should be manageable.
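A note on the position coding, for anyone adapting the script: the loop in pos2ind combines bytes 15, 14, 13 and 12 of the mark with byte 15 as the most significant one, which is the same as reading a little-endian unsigned 32-bit number at byte 12. A minimal sketch of that decoding with the struct module follows. The helper name decode_mark and the sample value are made up; only the "TXTk" prefix and the byte positions are taken from the script, and the zero bytes at positions 4 to 11 are just filler, since their meaning is not known here.

```python
import struct

def decode_mark(mark):
    """Return the byte offset stored in a mark, or None for other values."""
    if mark is None or len(mark) < 16 or bytes(mark[0:4]) != b"TXTk":
        return None
    # "<I" reads a little-endian unsigned 32-bit integer, here at byte 12.
    return struct.unpack_from("<I", mark, 12)[0]

# Synthetic mark value (not taken from a real books.db):
# prefix, 8 filler bytes, then the offset 1234 in little-endian order.
sample = b"TXTk" + b"\x00" * 8 + struct.pack("<I", 1234)
print(decode_mark(sample))  # prints 1234
```

A value with a different prefix returns None, just as pos2ind skips annotations that are not in this format.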