Ok, I have posted a first (test) version in the first post of this thread.
If anyone could read the information and test it, I would be grateful for feedback on any issues you find and any ideas you have.
Something to clean up bad markup and CSS, and to remove embedded fonts, would be a terrific feature. Check those epubs generated by Word's export to HTML and see how many font definitions they have on every single page.
I am still not terribly familiar with Word's HTML output, or with HTML in general. If you could elaborate on what embedded fonts and font definitions are, and how they should be handled during cleanup, I can easily add it to my parser.
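To make the question concrete, here is a rough sketch of what I imagine such a cleanup could look like. It rests on assumptions that I would like confirmed: that "embedded fonts" means `@font-face` rules in the CSS, and that "font definitions" means `font-family` declarations. The function name `strip_fonts` is made up for this post and is not part of the posted parser.

```python
import re

# Assumption: embedded fonts are declared through @font-face rules.
FONT_FACE_RE = re.compile(r"@font-face\s*\{[^{}]*\}", re.IGNORECASE)

# Assumption: "font definitions" are font-family declarations.
# The lookbehind skips longer property names that merely end in
# "font-family", such as Word's mso-*-font-family properties.
FONT_FAMILY_RE = re.compile(r"(?<![\w-])font-family\s*:[^;{}]*;?", re.IGNORECASE)


def strip_fonts(css: str) -> str:
    """Remove @font-face rules and font-family declarations from CSS text."""
    css = FONT_FACE_RE.sub("", css)
    css = FONT_FAMILY_RE.sub("", css)
    return css


sample = '@font-face { font-family: "Foo"; src: url(foo.ttf); } p { font-family: Arial; color: red; }'
print(strip_fonts(sample))
```

This only touches the CSS text. The font files themselves and their entries in the epub manifest would need separate handling, which I have left out here.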
I looked into mapping font-weight numbers to friendly names; it appears that many browsers render 100-400 identically, likewise 500-600, etc. Could you point me to a source stating that 400 must equal regular, and so on?
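If the conventional naming does hold, the lookup would be something like the sketch below. The names in the table follow the commonly used weight naming convention and are an assumption on my part until I have a source for them. `friendly_weight` is a hypothetical helper, not something already in the posted version.

```python
# Assumed mapping from numeric font-weight to a friendly name.
WEIGHT_NAMES = {
    100: "Thin",
    200: "Extra Light",
    300: "Light",
    400: "Regular",
    500: "Medium",
    600: "Semi Bold",
    700: "Bold",
    800: "Extra Bold",
    900: "Black",
}


def friendly_weight(value: str) -> str:
    """Return a friendly name for a CSS font-weight value where one is known."""
    value = value.strip().lower()
    # The keywords "normal" and "bold" correspond to 400 and 700.
    if value == "normal":
        return WEIGHT_NAMES[400]
    if value == "bold":
        return WEIGHT_NAMES[700]
    try:
        number = int(value)
    except ValueError:
        # Relative keywords such as "bolder" or "lighter" are left unchanged.
        return value
    # Numbers outside the table are also left unchanged.
    return WEIGHT_NAMES.get(number, value)


print(friendly_weight("400"))
print(friendly_weight("bolder"))
```

Values that are not in the table are passed through untouched rather than rounded, so nothing is silently changed if the mapping turns out to be wrong.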