@Jellby: *Sigh* there's always something, isn't there... Thanks for pointing it out. I'll try again:
Code:
xml2asc < file.xhtml | sed -e 's/<[^>]\+>//g' -e 's/./&\n/g' | sort -u | tr "\n" " "
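For anyone who wants the one-liner spelled out, here is a rough sketch of the same idea as a small shell function. The name `list_chars` is made up for illustration. Two things differ from the line above, so treat it as a variant rather than the same command: the xml2asc step is left out so it runs with standard tools only, and `grep -o .` stands in for the sed newline trick to put one character on each line. It assumes a grep that supports `-o` (GNU and BSD grep do). `LC_ALL=C` on sort is also my addition, just to keep the order predictable.

```shell
#!/bin/sh
# list_chars: hypothetical helper, not part of the original one-liner.
# Reads markup on stdin and prints each distinct character of the text
# content once, separated by spaces.
list_chars() {
    sed -e 's/<[^>]*>//g' |   # drop anything that looks like a tag
    grep -o . |               # one character per line (needs grep -o)
    LC_ALL=C sort -u |        # keep each character once, byte order
    tr '\n' ' '               # join everything back onto one line
}

# Small demo on an inline snippet:
# prints the characters a, b and c, separated by spaces
printf '<p>abba<b>cab</b></p>\n' | list_chars
```

On a real file it would be called as `list_chars < file.xhtml`, with xml2asc put back in front of it if that conversion is wanted.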
I wonder how many lines you'd need to do this properly, i.e. find which tags use special fonts, extract only their content, etc.?