Quote:
Originally Posted by Selcal
Thanks, that sounds like a good possibility. Can you give a quick example of how you use those statements? Online I can't find anything that I can easily work into what I understand. Maybe you can give me a quick snippet of the comic code where you use this?
Thanks for your help!
I went back and the code is not quite as I remember it. You can look at it yourself in the built-in gocomics.com recipe, but here's a relevant piece:
Spoiler:
Code:
def make_links(self, url):
    # Walk backwards through the strips, following each page's 'prev' link.
    title = 'Temp'
    current_articles = []
    pages = range(1, self.num_comics_to_get + 1)
    for page in pages:
        page_soup = self.index_to_soup(url)
        if page_soup:
            try:
                strip_title = page_soup.find(name='div', attrs={'class': 'top'}).h1.a.string
            except Exception:
                strip_title = 'Error - no Title found'
            try:
                date_title = page_soup.find('ul', attrs={'class': 'feature-nav'}).li.string
                if not date_title:
                    # Sometimes the first read comes back empty; try the same find again.
                    date_title = page_soup.find('ul', attrs={'class': 'feature-nav'}).li.string
            except Exception:
                date_title = 'Error - no Date found'
            title = strip_title + ' - ' + date_title
            for i in range(2):
                try:
                    strip_url_date = page_soup.find(name='div', attrs={'class': 'top'}).h1.a['href']
                    break  # success - this is the normal exit
                except Exception:
                    strip_url_date = None
                    continue  # try to get strip_url_date again
            for i in range(2):
                try:
                    prev_strip_url_date = page_soup.find('a', attrs={'class': 'prev'})['href']
                    break  # success - this is the normal exit
                except Exception:
                    prev_strip_url_date = None
                    continue  # try to get prev_strip_url_date again
            if strip_url_date:
                page_url = 'http://www.gocomics.com' + strip_url_date
            else:
                continue  # no link to this strip; skip it
            if prev_strip_url_date:
                prev_page_url = 'http://www.gocomics.com' + prev_strip_url_date
            else:
                continue  # no 'prev' link; can't walk further back
            current_articles.append({'title': title, 'url': page_url, 'description': '', 'date': ''})
            url = prev_page_url  # next iteration fetches the previous strip
    current_articles.reverse()  # put the strips in oldest-first order
    return current_articles
I think you'll want to look carefully at your exact error. In my case, I had trouble understanding what was failing: I'd get an error that an element on the page wasn't found, the recipe would bomb, then I'd print the soup and find that element right there. The code above seemed to cure it. Note that the "for i in range(2):" parts only fetch the page once and then retry the href find; I vaguely recall puzzling over why I couldn't find content that seemed to be there, which is why I added those retries. In your case, you may need to do the page fetch itself multiple times, and it should be easy to add that if it's needed.
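For example, if the fetch itself is what's flaky, something along these lines might do it. This is just an untested sketch, not from the recipe; the helper name fetch_with_retries and the retry count of 3 are made up for illustration, but index_to_soup is the same calibre helper used above:
Code:
def fetch_with_retries(self, url, retries=3):
    # Hypothetical helper: retry the whole page fetch, instead of
    # only retrying the find() on a soup we already have.
    for attempt in range(retries):
        try:
            page_soup = self.index_to_soup(url)
            if page_soup:
                return page_soup
        except Exception:
            pass  # fetch failed outright; fall through and try again
    return None  # every attempt failed; the caller should skip this page
Then in make_links you'd call self.fetch_with_retries(url) where it currently calls self.index_to_soup(url), and skip the page (or bail out) when it returns None.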