10-08-2012, 04:29 PM   #4
Man Eating Duck
Addict
Man Eating Duck juggles neatly with hedgehogs.
Posts: 254
Karma: 69786
Join Date: May 2006
Location: Oslo, Norway
Device: Kobo Aura, Sony PRS-650
@JSWolf: Seems it was Python; it works in Python 2.7. I've probably done horrible things to the Python language, but here it is, no guarantees about anything:
Code:
import argparse
import codecs

parser = argparse.ArgumentParser(description='''This script will accept utf-8 text files and write a list of unique characters to stdout or an output file''')
parser.add_argument("file", nargs='+', help="input (utf-8) file(s) for character counting")
parser.add_argument("-o", "--outfile", help="output file")
args = parser.parse_args()

# Characters to leave out of the result (empty by default).
disallowed = set()

# Collect every distinct character across all input files.
s = set()
for path in args.file:
    with codecs.open(path, encoding="UTF-8") as infile:
        for line in infile:
            s.update(char for char in line if char not in disallowed)

if args.outfile:
    print 'Writing to file: ' + args.outfile
    # The with block closes the file, so no explicit close() is needed.
    with codecs.open(args.outfile, "w", "utf-8") as f:
        f.write(u''.join(s))
else:
    # Encode explicitly so a Python 2 console does not choke on non-ASCII text.
    print u''.join(s).encode('utf-8')
usage: uniquechars.py [-h] [-o OUTFILE] file [file ...]

Use the output file option for Unicode files, as many glyphs won't show in a console.
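
For anyone on Python 3, the same idea could be sketched roughly like this. It is not the script above, just a rewrite under a few assumptions: the function names `unique_chars` and `write_chars` are made up for illustration, the characters are sorted before writing (the script above writes them in arbitrary set order), and the demo at the bottom creates its own throwaway input file instead of taking command-line arguments.

```python
import io
import os
import tempfile


def unique_chars(paths, disallowed=frozenset()):
    """Return the set of distinct characters in the given UTF-8 files.

    Hypothetical helper: `disallowed` is any iterable of characters
    that should be left out of the result.
    """
    seen = set()
    for path in paths:
        with io.open(path, encoding="utf-8") as handle:
            for line in handle:
                seen.update(line)
    return seen - set(disallowed)


def write_chars(chars, outfile):
    """Write the characters to a UTF-8 file, sorted by code point."""
    with io.open(outfile, "w", encoding="utf-8") as handle:
        handle.write(u"".join(sorted(chars)))


if __name__ == "__main__":
    # Build a small sample input so the sketch runs on its own.
    workdir = tempfile.mkdtemp()
    sample = os.path.join(workdir, "sample.txt")
    with io.open(sample, "w", encoding="utf-8") as handle:
        handle.write(u"bl\u00e5b\u00e6r\nb\u00e6r\n")

    # Leave the newline out of the result, then write to a file
    # rather than the console, for the reason given above.
    chars = unique_chars([sample], disallowed=u"\n")
    write_chars(chars, os.path.join(workdir, "chars.txt"))
```

Passing something like `disallowed=u"\n\r\t "` would be one way to keep whitespace out of the list.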

If you have questions about the code I can try to answer, but I heard there are some guys in the calibre forum who probably have a bit more experience with Python.