Old 08-25-2010, 01:52 PM   #2521
Starson17
Quote:
Originally Posted by naisren
Thanks for your help and sorry for my confusing expression.
That's OK. I looked at your site and ran your recipe (thank you for using code tags - you may also want to add spoiler tags to reduce the length).

I now understand your problem. The site has bad HTML. As far as BeautifulSoup is concerned, the page you are trying to parse to get feeds is one giant NavigableString inside a single tag, with no other tags within it. I don't know exactly why. The page uses the " />" form to close div tags immediately and then closes them again with the normal </div>, so each div gets two closings (another bit of bad HTML), but I suspect that isn't the sole cause.

Whatever is going on confuses BeautifulSoup to the extent that it can't find anything except the first surrounding tag. It should still be possible to extract feeds, but it will take much trickier programming to get links out of the giant string, which is soup.contents[0].string. You will need to treat it as a plain string and extract the links from it directly, instead of trying to find tags within it (although with some trickery you may be able to get BS to convert it into a tag-based structure).
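
To give a rough idea of what I mean by treating it as a string, here is a minimal, untested-against-your-site sketch using only Python's re module. The function names (repair_self_closed_divs, extract_links) and the regex patterns are my own assumptions about what the markup looks like, not anything taken from your recipe or from calibre.

```python
import re

# Hypothetical patterns: assumptions about the site's markup, not verified.
# Matches a div written in the self-closing form, e.g. <div class="x" />
SELF_CLOSED_DIV = re.compile(r'<div\b([^>]*?)\s*/>', re.IGNORECASE)
# Matches <a ... href="URL" ...>inner</a> and captures URL and inner markup
ANCHOR = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*["\']([^"\']+)["\'][^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)
# Matches any tag, used to strip markup from link text
TAG = re.compile(r'<[^>]+>')


def repair_self_closed_divs(raw):
    """Rewrite '<div ... />' as '<div ...>' so the later '</div>' closes it once."""
    return SELF_CLOSED_DIV.sub(r'<div\1>', raw)


def extract_links(raw):
    """Scan the markup as plain text and return a list of (title, url) pairs."""
    links = []
    for url, inner in ANCHOR.findall(raw):
        # Drop nested tags and collapse whitespace in the link text
        title = ' '.join(TAG.sub('', inner).split())
        if title:
            links.append((title, url))
    return links


sample = '<div class="a" /><a href="http://example.com/1">First <b>story</b></a></div>'
print(repair_self_closed_divs(sample))
print(extract_links(sample))
```

In a recipe you would pass in the giant string (soup.contents[0].string, or the raw page source) instead of sample. extract_links is the pure string approach. repair_self_closed_divs is one possible form of the "trickery": if the double closing really is part of the cause, feeding the repaired string back to BeautifulSoup might give you a normal tag tree, but I haven't confirmed that.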

It's an interesting problem, and I regret that I don't have time now to attack it. If you solve it, post your solution.