Starson17,
I went back to the GoComics recipe and tried to follow what you were doing, using what you said about printing the title, URL and so forth. My current code gets the soup, as shown in the output.txt file, but then it craps out saying the index is out of range. I thought that was why you put the number of pages to get in a range. I set mine to 7, as you can see in my code, but I still get index out of range....

I feel like the Little Engine That Could, or better yet the ant at the rubber tree plant. I've got high hopes, haha..
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class FIELDSTREAM(BasicNewsRecipe):
    title = 'FIELD AND STREAM BLOGS'
    __author__ = 'Tony Stegall'
    description = 'Hunting and Fishing and Gun Talk'
    INDEX = 'http://www.fieldandstream.com/blogs'
    language = 'en'
    #------------------------------------------------------
    #variables
    num_pages_to_get = 7
    #-------------------------------------------------------
    no_stylesheets = True

    def parse_index(self):
        feeds = []
        for title, url in [
            (u"Wild Chef", u"http://www.fieldandstream.com/blogs/wild-chef"),
        ]:
            articles = self.make_links(url)
            if articles:
                feeds.append((title, articles))
        return feeds

    def make_links(self, url):
        title = 'Temp'
        page_url = None # was never assigned before being printed and appended (NameError)
        current_articles = []
        page_soup = self.index_to_soup(url)
        print('The soup is: %s' % page_soup)
        pages = range(1, self.num_pages_to_get+1) # count from 1 up to 7, stepping by 1
        # note: page_soup is fetched once above, so every pass looks at the same page
        for page in pages:
            if page_soup:
                try:
                    strip_title = page_soup.h2.a.string # text of the first h2 link
                except Exception:
                    strip_title = 'Error - no page_soup.h2.a.string' # placeholder if it can't be found
                try:
                    date_title = page_soup.find('ul', attrs={'class': 'first even'}).li.string # date from the li tag text
                except Exception:
                    date_title = 'Error - no page_soup.h2.li.string'
                title = '%s - %s' % (strip_title, date_title) # piece the title together here
                try:
                    page_url = page_soup.h2.a['href'] # url from the h2 tag's <a>
                    break # got a link, stop looping
                except Exception:
                    continue # no link on this pass, try again
        print('the title is: %s' % title)
        print('the page_url is: %s' % page_url)
        if page_url:
            current_articles.append({'title': title, 'url': page_url, 'description':'', 'date':''}) # append all this
        return current_articles
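
For comparison, here is a rough sketch of what the loop seems to be aiming for: one article entry per h2 link on the index page, capped at a page count. It uses only Python 3's standard-library html.parser rather than calibre's soup, so it can be run outside a recipe. It is an assumption that each blog post title on the page is an h2 containing a link; H2LinkCollector, make_articles and the sample HTML are made up for illustration and are not part of calibre or the GoComics recipe.

```python
from html.parser import HTMLParser


class H2LinkCollector(HTMLParser):
    """Collect (title, href) for every <a href=...> found inside an <h2>."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.current_href = None
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'h2':
            self.in_h2 = True
        elif tag == 'a' and self.in_h2:
            # Links without an href are ignored (current_href stays None).
            self.current_href = dict(attrs).get('href')
            self.text_parts = []

    def handle_data(self, data):
        # Only keep text while inside a link that sits inside an <h2>.
        if self.current_href is not None:
            self.text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self.current_href is not None:
            title = ''.join(self.text_parts).strip()
            self.links.append((title, self.current_href))
            self.current_href = None
        elif tag == 'h2':
            self.in_h2 = False


def make_articles(html, limit):
    """Return at most `limit` article dicts, one per h2 link in `html`."""
    collector = H2LinkCollector()
    collector.feed(html)
    collector.close()
    return [
        {'title': title, 'url': href, 'description': '', 'date': ''}
        for title, href in collector.links[:limit]
    ]


if __name__ == '__main__':
    # Hypothetical markup standing in for the blog index page.
    sample = (
        '<h2><a href="/a">First</a></h2><p>x</p>'
        '<h2><a href="/b">Second</a></h2>'
        '<a href="/c">Not in h2</a>'
    )
    for article in make_articles(sample, 7):
        print(article['title'], article['url'])
```

The point of the sketch is that the cap is applied to the list of links that were found, so the loop can never run past the end of what is on the page.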
This is like playing Battleship: I keep firing and getting close, but never a direct hit.