07-24-2010, 10:36 PM   #2344
significance
Science Direct

Following up on my own message, I now have the article titles highlighted properly. I would still appreciate help with getting the versions of articles that include full-sized images, and with getting rid of the left margin if possible. My code at this point:
Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe

class ScienceDirect(BasicNewsRecipe):
    title          = u'Science Direct'
    __author__ = u'Barbara Robson'
    description = u'New journal articles from my favourite journals on Science Direct. Edit to choose your own favourites. Full text if you have an institutional login; abstracts otherwise.'
    oldest_article = 10
    max_articles_per_feed = 40
    no_stylesheets = True
    cover_url = 'http://rss.sciencedirect.com/images/logo_scid.gif'

    feeds = [
        (u'Environmental Modelling and Software', u'http://rss.sciencedirect.com/publication/science/6063'),
        (u'Ecological Modelling', u'http://rss.sciencedirect.com/publication/science/5934'),
        (u'Estuarine, Coastal and Shelf Science', u'http://rss.sciencedirect.com/publication/science/6776'),
        (u'Water Research', u'http://rss.sciencedirect.com/publication/science/5831'),
    ]
    
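    # calibre calls print_version() with each article URL and downloads the
    # URL it returns. Python strings have no append(), so concatenate instead.
    def print_version(self, url):
        sep = '&' if '?' in url else '?'
        return url + sep + 'artImgPref=F'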

    # Discard everything before the element with id="articleContent"
    remove_tags_before = dict(id='articleContent')
    # highlight article title
    preprocess_regexps = [
        (re.compile(r'(<div.class="articleTitle">)([^<]+)(<)'),
         lambda m: '%s<h2 class="h2">%s</h2>%s' % (m.group(1), m.group(2), m.group(3)))
    ]
    
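    # Drop everything after the first SDTxtSmallBold element,
    # then drop every SDTxtSmallBold element itself
    remove_tags_after = [dict(attrs={'class':'SDTxtSmallBold'})]
    remove_tags = [dict(attrs={'class':'SDTxtSmallBold'})]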
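The two string transformations the recipe relies on can be tried outside calibre. Below is a minimal standalone sketch, assuming (as the recipe does) that ScienceDirect serves full-sized figures when the article URL carries artImgPref=F; I have not verified that. full_image_url and highlight_title are hypothetical helper names, not part of calibre, and the title regex uses \s+ where the recipe uses "." so that only whitespace may separate "div" from "class".

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: ScienceDirect returns full-sized figures when the article
# URL carries this parameter (taken from the recipe, not verified).
IMAGE_PARAM = "artImgPref=F"

# Same pattern as the recipe's preprocess_regexps entry, except that \s+
# (any whitespace) replaces "." (any single character) after "div".
TITLE_RE = re.compile(r'(<div\s+class="articleTitle">)([^<]+)(<)')


def full_image_url(url: str) -> str:
    """Hypothetical helper: add IMAGE_PARAM to the query string of url."""
    parts = urlsplit(url)
    # Join with "&" if a query string already exists; urlunsplit adds the "?"
    # and keeps any "#fragment" after the query.
    query = (parts.query + "&" + IMAGE_PARAM) if parts.query else IMAGE_PARAM
    return urlunsplit(parts._replace(query=query))


def highlight_title(html: str) -> str:
    """Hypothetical helper: wrap the articleTitle text in <h2 class="h2">."""
    return TITLE_RE.sub(
        lambda m: '%s<h2 class="h2">%s</h2>%s' % (m.group(1), m.group(2), m.group(3)),
        html,
    )
```

Inside the recipe, calibre's print_version(self, url) hook is where such a URL rewrite belongs, since calibre downloads whatever URL that method returns. Using urlsplit rather than plain string concatenation keeps the parameter in front of any "#fragment".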

Last edited by significance; 07-25-2010 at 07:53 PM. Reason: Clarification