Here's how you can list the distinct characters used in an XHTML file (tags are excluded) in a Unix bash shell. It relies on GNU sed for \+ and for \n in the replacement:
Code:
sed -e 's/<[^>]\+>//g' -e 's/./&\n/g' file.xhtml | sort -u | tr "\n" " "
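To see what the pipeline does, here is a minimal sketch you can run on a throwaway sample. The function name chars_used and the sample content are made up for illustration, and it assumes GNU sed and tags that do not span several lines:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any tool): print the distinct characters
# found in the text of an XHTML file, separated by spaces.
# Assumes GNU sed: \+ in the pattern and \n in the replacement are GNU extensions.
chars_used() {
    # 1st expression: drop anything that looks like <...> on a single line
    # 2nd expression: put every remaining character on its own line
    sed -e 's/<[^>]\+>//g' -e 's/./&\n/g' "$1" | sort -u | tr "\n" " "
}

# Made-up sample file, only for trying the function out.
sample=$(mktemp)
printf '<h1>abba</h1>\n<p>cab</p>\n' > "$sample"

result=$(chars_used "$sample")
echo "$result"

rm -f "$sample"
```

On this sample it should print the letters a, b and c separated by spaces. The leading blank comes from the empty line sed leaves at the end of each input line.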
If you only want the characters used in headings (h1 to h4), you can try:
Code:
grep "<h[1-4]" OEBPS/vol1/12.xhtml | sed -e 's/<[^>]\+>//g' -e 's/./&\n/g' | sort -u | tr "\n" " "
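The heading variant can be tried the same way. This is only a sketch: heading_chars and the sample file are hypothetical, and it assumes GNU sed and that each heading sits on a single line, since grep filters line by line:

```shell
#!/usr/bin/env bash
# Hypothetical helper: distinct characters in lines that contain an h1-h4 tag.
# Assumes GNU sed and one heading per line.
heading_chars() {
    grep "<h[1-4]" "$1" | sed -e 's/<[^>]\+>//g' -e 's/./&\n/g' | sort -u | tr "\n" " "
}

# Made-up sample: one heading line and one paragraph line.
demo=$(mktemp)
printf '<h2>dad</h2>\n<p>xyz</p>\n' > "$demo"

heading_result=$(heading_chars "$demo")
echo "$heading_result"

rm -f "$demo"
```

Only the heading line gets past grep, so the letters x, y and z from the paragraph should not show up in the output.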