So, I was looking to build a recipe for NY Magazine. I've tried building my own, but my Python/Beautiful Soup skills are not so great...
The TOC is here:
http://nymag.com/includes/tableofcontents.htm but it gets redirected to a different page each week, with the format:
http://nymag.com/nymag/toc/YYYYMMDD/ (where YYYYMMDD is the date of the next Monday...)
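Since the weekly URL only depends on the date, it could probably be computed instead of following the redirect. A rough sketch of that idea; the names `TOC_PATTERN`, `next_monday` and `toc_url` are made up for illustration, and it assumes that on a Monday the issue dated that same day is the one wanted:

```python
from datetime import date, timedelta

# URL format taken from the redirect target described above.
TOC_PATTERN = 'http://nymag.com/nymag/toc/%s/'


def next_monday(today):
    """Return the first Monday on or after `today`."""
    # date.weekday(): Monday is 0 ... Sunday is 6.
    # Assumption: if today is already a Monday, use today itself.
    days_ahead = (7 - today.weekday()) % 7
    return today + timedelta(days=days_ahead)


def toc_url(today=None):
    """Build the dated table-of-contents URL for the given day."""
    today = today or date.today()
    return TOC_PATTERN % next_monday(today).strftime('%Y%m%d')
```

Calling `toc_url()` with no argument uses today's date. If the site turns out to publish on a different schedule, only `next_monday` would need to change.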
I've managed to strip the page down to what I need, but I don't understand how to fetch the articles, etc...
Basically:
Code:
remove_tags_before = dict(id='magazine-toc')
remove_tags_after = dict(attrs={'class':['attention']})
remove_tags = [dict(attrs={'class':['cover']}),
               dict(name=['h2'])]
That leaves me with a page where the structure of each article is:
Code:
<h5><a href="link_to_article">article title</a></h5>
<p>article blurb</p>
(but no enclosing div or anything) - So I'm unsure how to link the article title to the key, link, etc
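One way to pair things up without an enclosing div is to walk the page in document order: each `<h5><a>` starts a new article, and the first `<p>` after it is taken as that article's blurb. The sketch below uses the standard library's `html.parser` rather than Beautiful Soup, just so it runs on its own; `TocParser` and `parse_toc` are hypothetical names. The resulting dicts use the `title`, `url` and `description` keys that a calibre recipe's `parse_index` expects for each article.

```python
from html.parser import HTMLParser


class TocParser(HTMLParser):
    """Pair each <h5><a> title with the <p> blurb that follows it."""

    def __init__(self):
        super().__init__()
        self.articles = []    # article dicts, in page order
        self._current = None  # article still waiting for its blurb
        self._in_h5 = False
        self._target = None   # dict key that incoming text belongs to

    def handle_starttag(self, tag, attrs):
        if tag == 'h5':
            self._in_h5 = True
        elif tag == 'a' and self._in_h5:
            # A link inside an <h5> starts a new article.
            href = dict(attrs).get('href', '')
            self._current = {'title': '', 'url': href, 'description': ''}
            self.articles.append(self._current)
            self._target = 'title'
        elif tag == 'p' and self._current is not None:
            # First <p> after the heading is treated as the blurb.
            self._target = 'description'

    def handle_endtag(self, tag):
        if tag == 'h5':
            self._in_h5 = False
        elif tag == 'a':
            self._target = None
        elif tag == 'p':
            if self._target == 'description':
                self._current = None  # blurb done; ignore later <p> tags
            self._target = None

    def handle_data(self, data):
        if self._target is not None and self._current is not None:
            self._current[self._target] += data


def parse_toc(html):
    """Return a list of {'title', 'url', 'description'} dicts."""
    parser = TocParser()
    parser.feed(html)
    parser.close()
    return [{k: v.strip() for k, v in a.items()} for a in parser.articles]
```

In a recipe, the same "heading, then next paragraph" walk could presumably be done on the parsed soup instead, and the list returned from `parse_index` as `[('Some section title', articles)]`.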
... I then want to replace the article URL with the print version, which is basically:
Code:
http://www.printthis.clickability.com/pt/cpt?action=cpt&title=ARTICLE-TITLE&expire=&urlID=STRANGE-NUMBER&fb=Y&url=ARTICLE-URL
I can't figure out where the STRANGE-NUMBER is coming from in the article page either....
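Assuming the STRANGE-NUMBER can be found somehow, putting the print URL together is the easy part. A minimal sketch with the standard library's `urlencode`; `print_url` and its `url_id` parameter are hypothetical, and it is only a guess that the print service accepts a percent-encoded `url` value:

```python
from urllib.parse import urlencode

# Base of the print-version URL shown above.
PRINT_BASE = 'http://www.printthis.clickability.com/pt/cpt'


def print_url(article_url, title, url_id):
    """Build the print-version URL for one article.

    url_id is the STRANGE-NUMBER; where it comes from is still unknown,
    so it has to be passed in by the caller.
    """
    params = [
        ('action', 'cpt'),
        ('title', title),
        ('expire', ''),
        ('urlID', url_id),
        ('fb', 'Y'),
        ('url', article_url),
    ]
    # urlencode escapes spaces, slashes, colons, etc. in each value.
    return PRINT_BASE + '?' + urlencode(params)
```

In a calibre recipe this would most likely be called from the `print_version(self, url)` hook, although that hook only receives the article URL, so the title and ID would have to be looked up separately.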
Hope this makes sense. Thanks for your help!