I had a chance to check out your script and have had good luck with it. I've found certain scenarios, though, where the quotes in the DOCTYPE are changed to entities. I think it's related to Pretty-Print.
If the DOCTYPE is all on one line, your script leaves it alone.
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
But if it's split over two lines, your script seems to think it's fair game.
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
And of course, I'll need to change most of those entities back into their character equivalents to suit my preferences.
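For what it's worth, here is a rough sketch of one way a script could leave the DOCTYPE alone even when it wraps onto a second line. This is not your script, just an illustration in Python. The names `quotes_to_entities` and `convert_preserving_doctype` are made up for the example, and it assumes the DOCTYPE has no internal subset (so the first `>` ends the declaration, which holds for the XHTML 1.1 DOCTYPE above).

```python
import re

# Match a DOCTYPE declaration even when it spans several lines.
# [^>]* also matches newline characters, so no DOTALL flag is needed.
# Assumption: no internal subset, so the first ">" closes the declaration.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)


def quotes_to_entities(text):
    """Hypothetical stand-in for the script's quote-to-entity step."""
    return text.replace('"', "&quot;")


def convert_preserving_doctype(source):
    """Convert quotes to entities everywhere except inside the DOCTYPE."""
    match = DOCTYPE_RE.search(source)
    if match is None:
        # No DOCTYPE found: convert the whole document.
        return quotes_to_entities(source)
    before = source[:match.start()]
    after = source[match.end():]
    # Re-insert the DOCTYPE text exactly as it was written.
    return quotes_to_entities(before) + match.group(0) + quotes_to_entities(after)
```

The idea is to match the declaration against the whole document rather than one line at a time, so a line break inside the DOCTYPE makes no difference to whether it is skipped.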