Thanks, Kovid, that's what I thought.
One more check before I surrender: I used preprocess_regexps to trim the HTML down to just the content <div> section. Does this mean the faulty character is inside that section? In other words, is there a way to strip the document of all the risky stuff before any XML parsing starts? I thought preprocess_regexps would do that job, but apparently it doesn't.
BTW, what kinds of faulty characters should I be looking for?
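For reference, this is roughly the kind of filter I had in mind. It's only a sketch, and it assumes the culprits are the characters that XML 1.0 forbids outright: C0 control characters other than tab, newline and carriage return, lone surrogates, and U+FFFE/U+FFFF. The names INVALID_XML_CHARS, PREPROCESS_REGEXPS and strip_invalid_xml_chars are my own. The list just mimics the (compiled regex, function taking a match) pairs that a recipe's preprocess_regexps uses. It isn't calibre code.

```python
import re

# Characters XML 1.0 does not allow anywhere in a document (assumption: these
# are the "faulty" ones): C0 controls except \t, \n and \r, lone UTF-16
# surrogates, and the noncharacters U+FFFE and U+FFFF.
INVALID_XML_CHARS = re.compile(
    r'[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]'
)

# Hypothetical list in the same shape as a recipe's preprocess_regexps:
# each entry is (compiled_regex, callable(match) -> replacement string).
PREPROCESS_REGEXPS = [
    (INVALID_XML_CHARS, lambda match: ''),  # delete each offending character
]


def strip_invalid_xml_chars(html):
    """Run every (pattern, func) pair over the raw HTML, in order."""
    for pattern, func in PREPROCESS_REGEXPS:
        html = pattern.sub(func, html)
    return html


if __name__ == '__main__':
    sample = '<div>ok\x0bbad\x00text\tkept</div>'
    print(repr(strip_invalid_xml_chars(sample)))
```

Tab, newline and carriage return are left alone on purpose because XML permits them. Everything else in the C0 range gets dropped before any parser sees it.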