Caveat: I'm working on fiction, with nothing special for formatting.
My HTML is created from RTF files using OpenOffice. Then I run 'tidy' on it with a short config file:
Code:
add-xml-decl: yes
clean: yes
doctype: strict
drop-font-tags: yes
logical-emphasis: yes
output-xhtml: yes
char-encoding: utf8
That creates classes and a <style> section in the header, which I promptly delete and replace with a link to my stylesheet.css file. Then in vim (or sed) I run
Code:
1,$s/<p class="[^"]*">/<p>/g
to get rid of the class references.
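If you'd rather script those two cleanup steps than do them by hand, here's a rough Python sketch of the same idea. It isn't what I actually run. The function name clean_tidy_output is made up for illustration, and it assumes tidy wrote a single <style>...</style> block and paragraphs of the form <p class="c1"> with class as the only attribute.
```python
import re

# Assumption: tidy emits paragraphs like <p class="c1"> with no other attributes.
P_CLASS = re.compile(r'<p class="[^"]*">')
# Assumption: one <style>...</style> block in the header, possibly spanning lines.
STYLE_BLOCK = re.compile(r'<style\b.*?</style>', re.DOTALL)
# The href matches the stylesheet.css file mentioned above.
LINK = '<link rel="stylesheet" type="text/css" href="stylesheet.css" />'


def clean_tidy_output(html: str) -> str:
    """Swap the generated <style> block for a stylesheet link and
    strip the class attribute from every <p> tag."""
    html = STYLE_BLOCK.sub(LINK, html, count=1)
    return P_CLASS.sub('<p>', html)
```
You'd read the tidied file into a string, pass it through clean_tidy_output, and write the result back out. The [^"]* part stops the match at the closing quote of the class value, so a line with more than one tag on it doesn't get eaten.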
Then I run it through quoter to make curly quotes where required.
Then I massage with vim to fix any other typos I've missed. The rest of my process is pretty much like yours.