Old 04-08-2009, 04:48 PM   #430
motdiem
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2009
Device: PRS-505
NYMag.com recipe help

So, I was looking to build a recipe for NY Magazine. I've tried building my own, but my Python/BeautifulSoup skills are not so great...

The TOC is here: http://nymag.com/includes/tableofcontents.htm but it gets redirected to a different page each week, with the format http://nymag.com/nymag/toc/YYYYMMDD/ (where YYYYMMDD is the date of the next Monday...)
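For what it's worth, the dated URL can probably be computed rather than relying on the redirect. This is only a rough sketch: `next_monday` and `toc_url` are names made up for illustration, and it assumes that on a Monday the issue is dated that same day, which I haven't checked against the site.

```python
from datetime import date, timedelta


def next_monday(today):
    """Return the first Monday on or after `today`."""
    # date.weekday(): Monday == 0 ... Sunday == 6
    days_ahead = (7 - today.weekday()) % 7
    # Assumption: if today is already Monday, use today's date.
    return today + timedelta(days=days_ahead)


def toc_url(today):
    """Build the weekly TOC URL in the YYYYMMDD format described above."""
    return "http://nymag.com/nymag/toc/%s/" % next_monday(today).strftime("%Y%m%d")


if __name__ == "__main__":
    print(toc_url(date.today()))
```

If the Monday assumption turns out to be wrong, only the `days_ahead` line should need changing.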

I've managed to strip the page down to what I need, but I don't understand how to fetch the articles, etc...

Basically:
Code:
remove_tags_before = dict(id='magazine-toc')
remove_tags_after  = dict(attrs={'class':['attention']})
remove_tags = [dict(attrs={'class':['cover']}),
                dict(name=['h2'])]
That leaves me with a page where the structure of each article is:
Code:
<h5><a href="link_to_article">article title</a></h5>
<p>article blurb</p>
(but with no enclosing div or anything), so I'm unsure how to associate each article's title with its key, link, etc.
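One way to pair things up without an enclosing div is to walk the page in order and treat the first <p> after each <h5> link as that article's blurb. Below is a minimal sketch of that idea. Note that it uses the standard library's html.parser instead of BeautifulSoup, purely so it runs standalone. `TocParser` and `parse_toc` are hypothetical names, and the dict keys (`title`, `url`, `description`) are chosen to match what calibre's `parse_index` expects for each article, as far as I understand it.

```python
from html.parser import HTMLParser


class TocParser(HTMLParser):
    """Collect articles from repeated <h5><a href=..>title</a></h5><p>blurb</p> runs."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self._in_h5 = False
        self._h5_has_link = False
        self._expect_blurb = False  # True right after an <h5> that held a link
        self._field = None          # which key incoming text is appended to

    def handle_starttag(self, tag, attrs):
        if tag == "h5":
            self._in_h5 = True
            self._h5_has_link = False
            self._expect_blurb = False
        elif tag == "a" and self._in_h5:
            href = dict(attrs).get("href") or ""
            self.articles.append({"title": "", "url": href, "description": ""})
            self._h5_has_link = True
            self._field = "title"
        elif tag == "p" and self._expect_blurb:
            # First <p> after the heading: treat it as the blurb.
            self._expect_blurb = False
            self._field = "description"

    def handle_endtag(self, tag):
        if tag == "h5":
            self._in_h5 = False
            self._expect_blurb = self._h5_has_link
            self._field = None
        elif tag == "a" and self._field == "title":
            self._field = None
        elif tag == "p" and self._field == "description":
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.articles[-1][self._field] += data


def parse_toc(html):
    """Return a list of dicts with 'title', 'url' and 'description' keys."""
    parser = TocParser()
    parser.feed(html)
    parser.close()
    for article in parser.articles:
        article["title"] = article["title"].strip()
        article["description"] = article["description"].strip()
    return parser.articles
```

In a recipe, the same pairing could presumably be done inside `parse_index`, returning the list wrapped as `[(section_title, articles)]`; the section title there would be whatever label you choose.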

... I then want to replace the article URL with the print version, which is basically:
Code:
http://www.printthis.clickability.com/pt/cpt?action=cpt&title=ARTICLE-TITLE&expire=&urlID=STRANGE-NUMBER&fb=Y&url=ARTICLE-URL
I can't figure out where the STRANGE-NUMBER is coming from in the article page either....
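Assuming the STRANGE-NUMBER can be found somehow, assembling the print URL itself is straightforward. The sketch below only shows the string-building part: `print_url` is a hypothetical helper, `url_id` stands in for the unknown number, and I'm assuming the service accepts normally URL-encoded parameters. In a calibre recipe this kind of rewrite usually goes in the `print_version` method.

```python
from urllib.parse import urlencode

PRINT_BASE = "http://www.printthis.clickability.com/pt/cpt"


def print_url(article_url, title, url_id):
    """Build the print-version URL from the pattern quoted above.

    `url_id` is the STRANGE-NUMBER; where it comes from is still unknown,
    so it has to be supplied by the caller.
    """
    # A list of pairs keeps the parameters in the same order as the pattern.
    params = [
        ("action", "cpt"),
        ("title", title),
        ("expire", ""),
        ("urlID", url_id),
        ("fb", "Y"),
        ("url", article_url),
    ]
    return PRINT_BASE + "?" + urlencode(params)
```

Using `urlencode` rather than string concatenation avoids breaking the link when the title contains spaces or ampersands.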


Hope this makes sense - Thanks for your help

Last edited by motdiem; 04-08-2009 at 04:56 PM.