Old 11-21-2014, 05:29 AM   #3517
cryzed
Evangelist
Posts: 408
Karma: 1050547
Join Date: Mar 2011
Device: Kindle Oasis 2
Did you try explicitly specifying the parser for the BeautifulSoup instance?
Code:
BeautifulSoup(markup, 'html5lib')
And if I remember correctly, the error occurred in the BaseAdapter.utf8FromSoup method. Is the instance that is passed to it a BeautifulSoup 3 or a BeautifulSoup 4 instance? That should depend entirely on the site adapter calling it.
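
One quick way to answer that question is to look at the module the object's class was defined in. This is only a rough sketch: soup_flavor is a made-up helper name, and it assumes the BeautifulSoup 4 classes live in a package called bs4 and the BeautifulSoup 3 classes in a module called BeautifulSoup (possibly nested inside another package, as with a bundled copy).

```python
def soup_flavor(soup):
    """Guess which BeautifulSoup generation an object comes from.

    Hypothetical helper: it only inspects the dotted module path of the
    object's class, so a renamed or vendored copy may show up as 'unknown'.
    """
    parts = type(soup).__module__.split('.')
    if 'bs4' in parts:
        return 'bs4'   # e.g. 'bs4' or 'bs4.element'
    if 'BeautifulSoup' in parts:
        return 'bs3'   # e.g. 'BeautifulSoup' or 'some.package.BeautifulSoup'
    return 'unknown'
```

Logging soup_flavor(soup) at the top of utf8FromSoup (or just before the adapter calls it) should show which library actually built the instance.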

If all this seems correct, the only thing I can think of is narrowing the problem down to the element that causes the error and extracting it before trying to turn the soup into a string (possibly via the soup instance, if that doesn't already cause an error). I think you already tried something like that, though.
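
Roughly what I mean by narrowing it down, as an untested sketch: find_unserializable is a made-up name, and it assumes a BeautifulSoup 4 style object with find_all (BeautifulSoup 3 spells it findAll). On Python 2 you would probably want unicode(tag) instead of str(tag).

```python
def find_unserializable(soup):
    """Return (tag, exception) pairs for every tag that fails to serialize.

    Hypothetical helper: a failing tag makes all of its ancestors fail too,
    so the most deeply nested entry is usually the real culprit.
    """
    failures = []
    for tag in soup.find_all(True):  # True matches every tag in the tree
        try:
            str(tag)                 # serialize this tag and its children
        except Exception as exc:     # broad on purpose: we only want to locate it
            failures.append((tag, exc))
    return failures
```

Once the offending tag is known, calling tag.extract() on it removes it from the tree, and you can check whether the rest of the soup then converts to a string.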

If all this doesn't help I'm a bit stumped, since the html5lib library is supposed to act exactly like a real browser when parsing HTML. I checked the code and there doesn't seem to be anything to indicate that the BeautifulSoup instance is modified improperly (which can easily lead to such errors). Is it possible that the raw HTML modifications the adapter does in places shortly beforehand are at fault?

Last edited by cryzed; 11-21-2014 at 05:34 AM.