07-29-2011, 03:30 AM   #10
jmaciek
Junior Member
Posts: 2
Karma: 10
Join Date: Jul 2011
Location: Warsaw, Poland
Device: Kindle 3
Thanks, Kovid, that's what I thought.

One more check before I surrender: I used preprocess_regexps to trim the HTML down to just the content <div> section. Does that mean the faulty character must be inside that section? In other words, is there a way to strip all the risky stuff out of the document before any XML parsing starts? I thought preprocess_regexps would do the job, but apparently it doesn't.
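For reference, here is a minimal standalone sketch of the kind of trimming described above. It only mimics the shape of a recipe's preprocess_regexps (a list of (compiled regex, function) pairs applied in order to the raw HTML string) and does not use calibre itself. The <div id="content"> id and the helper name apply_preprocess_regexps are hypothetical. The point it illustrates: these rules are plain string substitutions, so whatever characters remain inside the kept <div> pass through untouched.

```python
import re

# Hypothetical stand-in for a recipe's preprocess_regexps:
# a list of (compiled pattern, replacement function) pairs.
PREPROCESS_REGEXPS = [
    # Keep only the <div id="content"> block (id is an assumption).
    # Non-greedy .*?</div> stops at the FIRST </div>, so this assumes
    # the content div contains no nested <div> elements.
    (re.compile(r'^.*?(<div id="content">.*?</div>).*$', re.DOTALL),
     lambda m: '<html><body>' + m.group(1) + '</body></html>'),
]

def apply_preprocess_regexps(raw_html, rules=PREPROCESS_REGEXPS):
    """Apply each (pattern, func) rule to the raw HTML string, in order.

    Only text outside the kept section is removed; any bad characters
    inside the <div> survive unchanged.
    """
    for pattern, func in rules:
        raw_html = pattern.sub(func, raw_html)
    return raw_html

page = ('<html><head><title>x</title></head><body><p>nav</p>'
        '<div id="content"><p>Hi</p></div><p>footer</p></body></html>')
print(apply_preprocess_regexps(page))
```

If the pattern does not match (e.g. the id differs on some pages), the HTML is returned unchanged, which is one way a trimming rule can silently fail to remove anything.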

BTW, what kinds of faulty characters should I be looking for?
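One class of "faulty character" that commonly breaks XML parsers is a code point outside the XML 1.0 Char production, for example control characters such as U+000B or U+0000 that sometimes slip into scraped pages. A minimal sketch, assuming the problem is of this kind (it does not catch encoding mismatches or unescaped markup), that locates and strips such characters. The names INVALID_XML_CHARS, find_invalid_xml_chars and strip_invalid_xml_chars are hypothetical helpers, not calibre API.

```python
import re

# Matches any code point NOT allowed by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# This also catches lone surrogates (U+D800-U+DFFF) and U+FFFE/U+FFFF.
INVALID_XML_CHARS = re.compile(
    r'[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]'
)

def find_invalid_xml_chars(text):
    """Return (index, 'U+XXXX') for every forbidden code point in text."""
    return [(m.start(), 'U+%04X' % ord(m.group()))
            for m in INVALID_XML_CHARS.finditer(text)]

def strip_invalid_xml_chars(text):
    """Remove every code point that XML 1.0 does not allow."""
    return INVALID_XML_CHARS.sub('', text)

sample = 'ok\x0bbad\x00\tfine\n\ufffe'
print(find_invalid_xml_chars(sample))
print(repr(strip_invalid_xml_chars(sample)))
```

Since the pattern is a compiled regex, it could in principle also be added to a recipe as a rule like (INVALID_XML_CHARS, lambda m: ''), so the characters are removed from the raw HTML before parsing; that wiring is an untested assumption here.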