Quote:
Originally Posted by kyzcreig
Well I'll be. This did the trick. Now the main obstacle is how I parse this jumble of HTML. It seems like some tags are cut off so I'll need a solution for that as well. So far this is turning out very nicely though!
If you're after the rendered text, you'll probably need some sort of parser that can handle malformed HTML. You'll also need to determine the character encoding (usually UTF-8, but quite often cp1252 as well). Your best bet is to convert any HTML entities to their character equivalents and then parse to get the text.
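Here's a rough sketch of that approach using only Python's standard library, in case it helps. The names decode_bytes, TextExtractor and html_to_text are made up for illustration, and trying UTF-8 first with a cp1252 fallback is just an assumption about your data, not a general rule. html.parser doesn't raise on unclosed tags, and it converts entities to characters for you when convert_charrefs is on.

```python
from html.parser import HTMLParser


def decode_bytes(raw: bytes) -> str:
    """Guess the encoding: try UTF-8 first, then fall back to cp1252 (assumed order)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # cp1252 leaves a few byte values undefined, so replace those rather than fail
        return raw.decode("cp1252", errors="replace")


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(raw: bytes) -> str:
    parser = TextExtractor()
    parser.feed(decode_bytes(raw))
    parser.close()
    # Join the text nodes with spaces and collapse runs of whitespace
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    # cp1252-encoded bytes with an entity and unclosed tags
    print(html_to_text(b"<p>Caf\xe9 &amp; bar<b>bold"))
```

Joining the pieces with a space means text on either side of an inline tag gets separated ("foo<b>bar</b>" comes out as "foo bar"), so adjust that if it matters for your pages. If the markup is badly mangled, a dedicated lenient parser may cope better than this.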