Quote:
Originally Posted by cryzed
Did you try explicitly specifying the parser for the BeautifulSoup instance?:
|
Yep.
Quote:
Originally Posted by cryzed
And if I remember correctly, the error occurred in the BaseAdapter.utf8FromSoup method. Is the BeautifulSoup instance that is passed to it really a BeautifulSoup 3 or BeautifulSoup 4 instance?
|
Yeah, I modified the adapter to use bs4 and BaseAdapter.utf8FromSoup to accept either.
The error is coming from the part of utf8FromSoup that does a findAll over all tags to strip off extra attributes. If I bypass that step it works, so a more forgiving way of iterating over the tags may do the trick. The improperly nested tags seem to be what confuses it.
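For anyone following along, here's a rough sketch of what I mean by a more forgiving pass. This is not the actual utf8FromSoup code: the function name strip_extra_attrs and the KEEP_ATTRS whitelist are made up for illustration, and it assumes bs4 only, with the parser named explicitly as cryzed suggested.

```python
from bs4 import BeautifulSoup

# Hypothetical whitelist; the attributes the real adapter keeps may differ.
KEEP_ATTRS = {"href", "src", "alt"}

def strip_extra_attrs(html, keep=KEEP_ATTRS):
    """Return html with every attribute not in `keep` removed (bs4 only)."""
    # Name the parser explicitly so the tree doesn't depend on which
    # optional parsers (lxml, html5lib) happen to be installed.
    soup = BeautifulSoup(html, "html.parser")
    # find_all(True) matches every tag; list() takes a snapshot first so
    # we are not mutating tags while still walking the tree.
    for tag in list(soup.find_all(True)):
        attrs = getattr(tag, "attrs", None)
        # Skip anything whose attrs isn't the dict bs4 normally provides,
        # instead of letting one odd node abort the whole pass.
        if not isinstance(attrs, dict):
            continue
        tag.attrs = {k: v for k, v in attrs.items() if k in keep}
    return str(soup)

if __name__ == "__main__":
    # Improperly nested tags, similar in spirit to the problem pages.
    print(strip_extra_attrs('<b class="c"><i id="n">text</b></i>'))
```

The point is just to skip nodes that don't look like normal tags rather than fail on them; whether that's enough for the badly nested pages in question is something I still need to test.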
Quote:
Originally Posted by cryzed
... is it possible that the raw HTML modifications the adapter does shortly beforehand in places are at fault?
|
That's a good question--I hadn't checked that. But no, skipping those didn't help.
BTW, I did consult your code from the package-magic branch and I'm using part of it, thanks for that.