Quote:
Originally Posted by ilovejedd
Just a comment. If you're using Tidy, it seems to work much better if you save as HTML (in all its tag-soup glory) rather than as filtered HTML. I was doing something similar last night on OCR'ed text (using the built-in function in Notepad++) and noticed Tidy cleaned up the "messy" HTML much better than the filtered HTML.
Thanks, I'll give that a try and see if the HTML winds up being cleaner... I don't think it'll solve this problem though...