Old 08-29-2010, 10:51 PM   #2561
TonytheBookworm
hmm

Alright, I looked at some samples and I also saw what you had done. I went with the second method that you mentioned, though, about making my own links. Well, it is obviously not working. Here is what I have come up with. If you have the time, could you look at this and shed some more light on it for me? Thanks.

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class FIELDSTREAM(BasicNewsRecipe):

    title      = 'FIELD AND STREAM BLOGS'
    __author__ = 'Tony Stegall'
    description = 'Hunting and Fishing and Gun Talk'
    INDEX = 'http://www.fieldandstream.com/blogs'
    language = 'en'

    
    no_stylesheets = True

    def parse_index(self):
        #load the blog index page set in INDEX above
        soup = self.index_to_soup(self.INDEX)
        feeds = []
        #array to hold the feeds
        for mainsec in soup.findAll('div',  attrs={'class':'item-list'}):
            #above findall instances where the div tag has the attribute of item-list
            section_title ='Wild Chef'
            #hard code the section title to be appended to the feed
            articles = []
            #array to hold the article content
            
            
            #-----------------------------------------------------------------------
            #trying to find all the h2 tags and parse the <a> for the title
            #not really understanding how this is done though
            #=-----------------------------------------------------------------------
            h = mainsec.find('h2')
            #find the h2 tag that has the title embedded inside it with an anchor tag
            
            a = (h if h is not None else mainsec).find('a', href=True)
            #look for the link inside the h2, or anywhere in the div if there is no h2
            if a is None:
                continue
            
            title = self.tag_to_string(a)
            
            myurl = a['href']
            if myurl.startswith('/'):
                myurl = 'http://www.fieldandstream.com' + myurl
               
            #--end of parse for title-----------------------------------------------------
            
            #-----------------------------------------------------------------------------------------------------------
            #face the same problem with the p tags.  I have a <p> tag then a <em> then in some cases another <p>
            #I want to get the content of the <p> within the <p> but not sure how :( 
            #example: 
            #   <p>
            #      <p> some blah blah blah </p>
            #   so basically all i want is all the text within the <div class=teaser> but not sure how :(
            for teaser in mainsec.findAll('div',  attrs={'class':'teaser'}):
                p = teaser.find('p')
                desc = None
                if p is not None:
                    desc = self.tag_to_string(p)
            
                articles.append({'title':title, 'url':myurl, 'description':desc,
                    'date':''}) 
            #--------------------end of description parse from teaser-----------------------------------------------
            
             
            feeds.append((section_title, articles))  
            #put all articles for the section inside the feeds 
            
        #return only after every item-list div has been processed
        return feeds
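
For the teaser question in the comments above, here is a minimal sketch of the idea outside calibre, using only the Python standard library: gather every piece of text inside a <div class="teaser">, however the <p> and <em> tags are nested. The class name TeaserText, the helper teaser_text and the sample HTML are made up for illustration and are not part of calibre or of the Field and Stream page. Inside the recipe, calling self.tag_to_string on the teaser div itself rather than on its first <p> should give a similar result, but I have not checked that against the live page.

```python
from html.parser import HTMLParser


class TeaserText(HTMLParser):
    """Hypothetical helper: collect all text nested inside <div class="teaser">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # greater than 0 while inside a teaser div
        self.chunks = []    # text pieces of the teaser currently being read
        self.teasers = []   # one finished string per teaser div

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # count nested divs so the matching </div> closes the teaser
            if tag == 'div':
                self.depth += 1
        elif tag == 'div' and 'teaser' in (dict(attrs).get('class') or '').split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == 'div':
            self.depth -= 1
            if self.depth == 0:
                # join the pieces with spaces and collapse runs of whitespace
                self.teasers.append(' '.join(' '.join(self.chunks).split()))
                self.chunks = []

    def handle_data(self, data):
        # keep text from any tag (p, em, nested p) while inside the teaser
        if self.depth:
            self.chunks.append(data)


def teaser_text(html):
    """Return a list with the full text of each teaser div found in html."""
    parser = TeaserText()
    parser.feed(html)
    parser.close()
    return parser.teasers


# made-up sample with an unclosed outer <p>, like the case described above
sample = '<div class="teaser"><p><em>Wild Chef:</em><p>some blah blah blah</p></div>'
print(teaser_text(sample))
```

The point of the sketch is that the text is taken from the whole div rather than from one particular <p>, so it does not matter how many levels of tags sit in between.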