@JSWolf: Seems it was Python; it works in Python 2.7. I've probably done horrible things to the Python language, but here it is, with no guarantees about anything:
Code:
import argparse, codecs

parser = argparse.ArgumentParser(description='''This script will accept utf-8 text files and write a list of unique characters to stdout or an output file''')
parser.add_argument("file", nargs='+', help="input (utf-8) file(s) for character counting")
parser.add_argument("-o", "--outfile", help="output file")
args = parser.parse_args()

# Characters to leave out of the result (empty by default)
disallowed = set(u'')
s = set()
for f in args.file:
    # Add every character in this file that is not disallowed
    s = s | set(char for line in codecs.open(f, encoding="UTF-8") for char in line
                if char not in disallowed)

if args.outfile:
    print 'Writing to file: ' + args.outfile
    # The with block closes the file, so no explicit close() is needed
    with codecs.open(args.outfile, "w", "utf-8") as f:
        f.write(u''.join(s))
else:
    print u''.join(s).encode('utf-8')
usage: uniquechars.py [-h] [-o OUTFILE] file [file ...]
Use the output file option for Unicode files, as many glyphs won't show in a console.
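For anyone who would rather call this from other code, or who is on Python 3, here is a rough sketch of the same idea as a function. The name unique_chars and the disallowed parameter are made up for this example and are not part of the script above; it uses io.open, which exists in both Python 2.7 and 3, and I haven't tested it beyond small files.

```python
import io


def unique_chars(paths, disallowed=frozenset()):
    """Collect the distinct characters found in the given UTF-8 text files.

    `unique_chars` and `disallowed` are illustrative names, not part of
    the original script.
    """
    seen = set()
    for path in paths:
        # io.open decodes to unicode text on both Python 2.7 and Python 3
        with io.open(path, encoding="utf-8") as handle:
            for line in handle:
                seen.update(ch for ch in line if ch not in disallowed)
    return seen
```

Like the script, this keeps line breaks in the result unless you pass them in disallowed. A set has no order, so sort the characters before joining if you want a stable listing, for example u"".join(sorted(unique_chars(["book.txt"]))) with a hypothetical book.txt.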
If you have questions about the code, I can try to answer, but I heard there are some guys in the calibre forum who probably have a bit more experience with Python.