Here is one method. ("index" is just the URL of a page that has the cover in an img tag inside a span tag of class "cover"; the src of that img tag is the URL of the cover.)
Code:
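def get_cover_url(self):
    cover_url = None
    # Fetch and parse the page that holds the cover image
    soup = self.index_to_soup(self.index)
    # The cover is an <img> inside <span class="cover">
    cover_item = soup.find('span', attrs={'class':'cover'})
    if cover_item:
        cover_url = cover_item.img['src']
    return cover_url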
Is there any way to do this more simply? I was reading up on datetime, but couldn't figure out how to use it beyond adding the line "import datetime" at the top of the script.
An example cover page URL is the following:
http://assets.nydailynews.com/img/20...tpage_0814.jpg
It contains the year "2010", month "08", and day "14".
The image URL was obtained from the Daily News cover page archive here:
http://www.nydailynews.com/news/galleries/august_2010_daily_news_front_pages/august_2010_daily_news_front_pages.html
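
For what it's worth, here is a rough sketch of how datetime could fill in the date parts of such a URL. The example URL above is truncated, so the template below is a made-up placeholder (example.com and its path layout are assumptions, not the real Daily News layout), and build_cover_url is just a hypothetical helper name. Swap in the real pattern once you know it.

```python
import datetime

# Hypothetical pattern: the real path is elided in the example URL above,
# so replace this with the actual layout of the site being scraped.
COVER_TEMPLATE = 'http://example.com/covers/%Y/%m/%d/frontpage_%m%d.jpg'

def build_cover_url(day=None, template=COVER_TEMPLATE):
    """Fill the date fields of template for the given day (default: today)."""
    if day is None:
        day = datetime.date.today()
    # strftime expands %Y to the 4-digit year, %m to the 2-digit month
    # and %d to the 2-digit day, all zero-padded.
    return day.strftime(template)

print(build_cover_url(datetime.date(2010, 8, 14)))
```

If the pattern holds every day, get_cover_url in the recipe could probably just return build_cover_url() instead of parsing the index page, though that is untested against the real site.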