Quote:
Originally Posted by kerrware
The only thing I can think of is to try to introduce some "remove_tags" type code to simplify the HTML so it can be converted. This could take some time (I'm not that familiar with HTML or Python code). Any suggestions as to what I can and can't remove?
Try this:
Code:
keep_only_tags = dict(name='div', attrs={'id':['ds-headline','viewarticle']})
You may find some other items you want to keep (use Firefox with Firebug to inspect the page and find their ids), but you're right: something in there is messing up the conversion.
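If it helps to see what that line is meant to do, here is a rough standalone sketch of the effect: everything outside the matching divs gets dropped. This is not how calibre does it internally; it's just an illustration using Python's standard library. The names KeepOnlyTags, keep_text and the sample page are made up for the example, and the only ids assumed are the two from the line above.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class KeepOnlyTags(HTMLParser):
    """Hypothetical helper: collect text found inside matching elements only."""

    def __init__(self, name, ids):
        super().__init__(convert_charrefs=True)
        self.name = name
        self.ids = set(ids)
        self.depth = 0   # greater than 0 while we are inside a kept element
        self.kept = []   # text fragments found inside kept elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            # Already inside a kept element: track nesting.
            self.depth += 1
        elif tag == self.name and dict(attrs).get("id") in self.ids:
            # Entering an element we want to keep.
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.kept.append(text)


# Made-up page: two divs we want, two we don't.
sample = """
<html><body>
  <div id="nav">Home | Sports | Weather</div>
  <div id="ds-headline"><h1>Headline text</h1></div>
  <div id="viewarticle"><p>First paragraph.<br>Second line.</p></div>
  <div id="footer">Copyright notice</div>
</body></html>
"""


def keep_text(html, name="div", ids=("ds-headline", "viewarticle")):
    parser = KeepOnlyTags(name, ids)
    parser.feed(html)
    parser.close()
    return parser.kept


print(keep_text(sample))
# ['Headline text', 'First paragraph.', 'Second line.']
```

The nav and footer text never make it into the result, which is the point: with only the headline and article divs left, there is much less markup that can trip up the conversion.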