Old 02-19-2011, 06:32 PM   #1
spedinfargo
Groupie
Posts: 155
Karma: 106422
Join Date: Nov 2010
Device: none
New: Roger Ebert Journal

A second Roger Ebert recipe: this one pulls his journal/blog. It is one of the best film blogs you'll find on the web (one of the best blogs, period), and it has great running conversations, with some of the smartest, most well-written comments you'll see. It generates a rather large file...

[Updated 2011-11-25 due to a minor change on the web page. Note: this is different from the other Roger Ebert feed I created. This one is his blog.]

Last edited by spedinfargo; 11-25-2011 at 02:50 PM.
Old 02-19-2011, 06:33 PM   #2
spedinfargo
Groupie
Code:

import re
from calibre.web.feeds.news import BasicNewsRecipe

class EbertJournal(BasicNewsRecipe):
    title                 = 'Roger Ebert Journal'
    __author__            = 'Shane Erstad'
    description           = 'Roger Ebert Journal'
    publisher             = 'Chicago Sun Times'
    category              = 'movies'
    oldest_article        = 8
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    encoding              = 'ISO-8859-1'
    masthead_url          = 'http://rogerebert.suntimes.com/graphics/global/roger.jpg'
    language              = 'en'
    remove_empty_feeds    = False
    PREFIX                = 'http://blogs.suntimes.com/ebert'

    remove_tags_before = dict(id='content')
    remove_tags_after = dict(id='comments-open')

    extra_css             = """
                                @font-face {font-family: "sans1";src:url(res:///opt/sony/ebook/FONT/tt0003m_.ttf)}
                                .article_description,body{font-family: Arial,Helvetica,sans1,sans-serif}
                                .color-2{display:block; margin-bottom: 10px; padding: 5px 10px;
                                border-left: 1px solid #D00000; color: #D00000}
                                img{margin-bottom: 0.8em} """


    conversion_options = {
                          'comment'          : description
                        , 'tags'             : category
                        , 'publisher'        : publisher
                        , 'language'         : language
                        , 'linearize_tables' : True
                        }


    feeds          = [
                        (u'Roger Ebert Journal'   , u'http://blogs.suntimes.com/ebert/' )
                     ]

    preprocess_regexps = [

        (re.compile(r'<span class="vcard author">Roger Ebert</span>', re.DOTALL|re.IGNORECASE),
            lambda m: 'Roger Ebert'),
        
        (re.compile(r'<span class="vcard author">', re.DOTALL|re.IGNORECASE),
            lambda m: '<hr width="80%"><span class="vcard author">'),
        
        (re.compile(r'<blockquote>', re.DOTALL|re.IGNORECASE),
            lambda m: ''),

        (re.compile(r'<a class="a2a_dd".*?</a>', re.DOTALL|re.IGNORECASE),
            lambda m: ''),

        (re.compile(r'<h2 class="comments-open-header">Leave a comment</h2>', re.DOTALL|re.IGNORECASE),
            lambda m: ''),

        (re.compile(r'<a title="Reply".*?</a>', re.DOTALL|re.IGNORECASE),
            lambda m: '')
    ]
    

    def parse_index(self):

        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed')+' %s...'%(feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            for item in soup.findAll(attrs={'class':['entry-asset asset hentry']}):

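                # Drop the floated lead image, if this entry has one
                leadimage = item.find(attrs={'class':['mt-image-left']})
                if leadimage is not None:
                    leadimage.extract()
                bodysection = item.find(attrs={'class':['asset-body']})
                datesection = item.find(attrs={'class':['published']})
                titlesection = item.find(attrs={'class':['asset-name entry-title']})

                # Skip entries without a linked title; there is no article URL to fetch
                link = titlesection.find('a') if titlesection is not None else None
                if link is None:
                    continue
                url         = link['href']
                title       = self.tag_to_string(link)
                self.log(url)
                self.log(title)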
                articles.append({
                                      'title'      :title
                                     ,'date'       :' [' + self.tag_to_string(datesection) + ']'
                                     ,'url'        :url
                                     ,'description':self.tag_to_string(bodysection)
                                    })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds
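
If you want to tweak the preprocess_regexps without running a full download each time, here is a rough standalone sketch (plain Python, no calibre needed). It copies three of the rules from the recipe and applies them in list order, which is how I understand calibre handles them. The apply_rules helper and the SAMPLE string are made up for illustration; they are not part of calibre or of the real page markup.

```python
import re

FLAGS = re.DOTALL | re.IGNORECASE

# Three of the (compiled pattern, replacement callable) pairs from the recipe.
RULES = [
    # Ebert's own byline becomes plain text
    (re.compile(r'<span class="vcard author">Roger Ebert</span>', FLAGS),
     lambda m: 'Roger Ebert'),
    # Every remaining author span gets a horizontal rule in front of it
    (re.compile(r'<span class="vcard author">', FLAGS),
     lambda m: '<hr width="80%"><span class="vcard author">'),
    # The share widget link is dropped
    (re.compile(r'<a class="a2a_dd".*?</a>', FLAGS),
     lambda m: ''),
]


def apply_rules(html, rules=RULES):
    """Hypothetical helper: run each substitution over the HTML, in list order."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


# Invented fragment that only imitates the class names the rules look for.
SAMPLE = (
    '<span class="vcard author">Roger Ebert</span> wrote: '
    '<span class="vcard author">A Reader</span> said hi.'
    '<a class="a2a_dd" href="#">Share</a>'
)

print(apply_rules(SAMPLE))
```

Order matters here: the Ebert rule has to come before the generic author rule, otherwise his own byline would pick up a horizontal rule too.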

Last edited by spedinfargo; 11-25-2011 at 02:51 PM.