Quote:
Originally Posted by Selcal
Thanks, that sounds like a good possibility. Can you give a quick example of how you use those statements? Online I can't find anything that I can easily work into what I understand. Maybe you can give me a quick snippet of the comic code where you use this?
Thanks for your help!
I went back and the code is not quite as I remember it. You can look at it yourself in the built-in gocomics.com recipe, but here's a relevant piece:
Spoiler:
Code:
def make_links(self, url):
    # Walk backwards through the strips, following each page's 'prev' link.
    title = 'Temp'
    current_articles = []
    pages = range(1, self.num_comics_to_get + 1)
    for page in pages:
        page_soup = self.index_to_soup(url)
        if page_soup:
            try:
                strip_title = page_soup.find(name='div', attrs={'class': 'top'}).h1.a.string
            except Exception:
                strip_title = 'Error - no Title found'
            try:
                date_title = page_soup.find('ul', attrs={'class': 'feature-nav'}).li.string
                if not date_title:
                    # Sometimes the first read comes back empty; try the same find again.
                    date_title = page_soup.find('ul', attrs={'class': 'feature-nav'}).li.string
            except Exception:
                date_title = 'Error - no Date found'
            title = strip_title + ' - ' + date_title
            for i in range(2):
                try:
                    strip_url_date = page_soup.find(name='div', attrs={'class': 'top'}).h1.a['href']
                    break  # success - this is the normal exit
                except Exception:
                    strip_url_date = None
                    continue  # try to get strip_url_date again
            for i in range(2):
                try:
                    prev_strip_url_date = page_soup.find('a', attrs={'class': 'prev'})['href']
                    break  # success - this is the normal exit
                except Exception:
                    prev_strip_url_date = None
                    continue  # try to get prev_strip_url_date again
            if strip_url_date:
                page_url = 'http://www.gocomics.com' + strip_url_date
            else:
                continue  # no link to this strip; skip it
            if prev_strip_url_date:
                prev_page_url = 'http://www.gocomics.com' + prev_strip_url_date
            else:
                continue  # no 'prev' link; can't walk further back
            current_articles.append({'title': title, 'url': page_url, 'description': '', 'date': ''})
            url = prev_page_url  # next iteration fetches the previous strip
    current_articles.reverse()  # put the strips in oldest-first order
    return current_articles
I think you'll want to look carefully at your exact error. In my case, I had trouble understanding what was failing: I'd get an error that an element on the page wasn't found, the recipe would bomb, then I'd print the soup and find that element right there. The code above seemed to cure it. Note that the "for i in range(2):" parts only fetch the page once and then retry the href find; I vaguely recall puzzling over why I couldn't find content that seemed to be there, which is why I added those retries. In your case, you may need to do the page fetch itself multiple times, and it should be easy to add that if it's needed.
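For example, if the fetch itself is what's flaky, something along these lines might do it. This is just an untested sketch, not from the recipe; the helper name fetch_with_retries and the retry count of 3 are made up for illustration, but index_to_soup is the same calibre helper used above:
Code:
def fetch_with_retries(self, url, retries=3):
    # Hypothetical helper: retry the whole page fetch, instead of
    # only retrying the find() on a soup we already have.
    for attempt in range(retries):
        try:
            page_soup = self.index_to_soup(url)
            if page_soup:
                return page_soup
        except Exception:
            pass  # fetch failed outright; fall through and try again
    return None  # every attempt failed; the caller should skip this page
Then in make_links you'd call self.fetch_with_retries(url) where it currently calls self.index_to_soup(url), and skip the page (or bail out) when it returns None.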