08-26-2010, 11:38 AM   #2530
Starson17
Wizard
Quote:
Originally Posted by kerrware
The only thing I can think of is to try to introduce some "remove_tags" type code to simplify the HTML so it can be converted. This could take some time (I'm not that familiar with HTML or Python code). Any suggestions as to what I can and can't remove?
Try this:
Code:
    keep_only_tags = [dict(name='div', attrs={'id':['ds-headline','viewarticle']})]
You may find other items you want to keep (use Firefox with Firebug to find them), but you're right: something in that page is messing up the conversion.
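
If it helps to see what that setting is meant to do, here is a rough stand-alone sketch in plain Python (standard library only, no calibre needed). It is not calibre's actual code: the names `KeepOnlyDivs` and `keep_only` and the sample HTML are made up for illustration. It only mimics the idea of keeping the matching divs, with everything inside them, and throwing the rest of the page away.

```python
from html.parser import HTMLParser


class KeepOnlyDivs(HTMLParser):
    """Hypothetical helper: collect the markup of every <div> whose id is in keep_ids."""

    def __init__(self, keep_ids):
        super().__init__(convert_charrefs=False)
        self.keep_ids = set(keep_ids)
        self.div_depth = 0   # > 0 while we are inside a kept div
        self.kept = []       # pieces of markup we decided to keep

    def handle_starttag(self, tag, attrs):
        if self.div_depth:
            # Already inside a kept div: keep everything, track nested divs.
            if tag == "div":
                self.div_depth += 1
            self.kept.append(self.get_starttag_text())
        elif tag == "div" and dict(attrs).get("id") in self.keep_ids:
            # A div we were asked to keep: start collecting.
            self.div_depth = 1
            self.kept.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> do not change the div nesting.
        if self.div_depth:
            self.kept.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.div_depth:
            self.kept.append("</%s>" % tag)
            if tag == "div":
                self.div_depth -= 1

    def handle_data(self, data):
        if self.div_depth:
            self.kept.append(data)

    def handle_entityref(self, name):
        if self.div_depth:
            self.kept.append("&%s;" % name)

    def handle_charref(self, name):
        if self.div_depth:
            self.kept.append("&#%s;" % name)


def keep_only(html, keep_ids):
    """Return only the kept divs (and their contents) from html."""
    parser = KeepOnlyDivs(keep_ids)
    parser.feed(html)
    parser.close()
    return "".join(parser.kept)


# Made-up page: navigation and footer are the clutter we want gone.
sample = (
    '<html><body>'
    '<div id="nav"><a href="/">Home</a></div>'
    '<div id="ds-headline"><h1>Title</h1></div>'
    '<div id="viewarticle"><p>Body text<div class="inner">nested</div></div>'
    '<div id="footer">junk</div>'
    '</body></html>'
)

cleaned = keep_only(sample, ['ds-headline', 'viewarticle'])
print(cleaned)
```

The navigation and footer divs drop out and only the headline and article divs survive, which is roughly the simplification you were after with remove_tags, just approached from the other direction.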

Last edited by Starson17; 08-26-2010 at 11:57 AM.