06-06-2012, 01:57 PM   #2
atordo
Here is an improved recipe with multipage support, better parsing and some CSS. Since Vice is a magazine rather than a daily newspaper, you should adjust oldest_article to match how often you download it (the default here is 14 days, i.e. two weeks).

Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe

class ViceESRecipe(BasicNewsRecipe):
    title = u'Vice Magazine España'
    description = u'La página web oficial de la revista Vice España'
    category = u'noticias, fotografía, blogs, moda, arte, cine, música, literatura, tecnología'
    cover_url = 'http://www.seeklogo.com/images/V/Vice-logo-668578AC94-seeklogo.com.gif'
    oldest_article = 14
    max_articles_per_feed = 100
    auto_cleanup = False
    no_stylesheets = True
    language = 'es'
    use_embedded_content = False
    remove_javascript = True
    publication_type = 'magazine'

    # Multipage support: follow "next page" links (pages 2 to 9) from each article
    recursions = 10
    match_regexps = [r'/read/.*\?Contentpage=[2-9]$']

    keep_only_tags = [
        dict(attrs={'class':['article_title','article_content','next']})
    ]
    remove_tags = [
        dict(attrs={'class':['social_buttons','search','tweet','like','inline_socials',
            'stumblebadge','plusone']})
    ]

    extra_css = '''
        .author{font-size:small}
        img{margin-bottom: 0.4em; display:block; margin-left:auto; margin-right: auto}
    '''

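    preprocess_regexps = [
        # Strip the whole scorecardresearch.com tracking-pixel <img> tag, not just its start
        (re.compile(r'<img[^>]*\bsrc="http://[^"]*\.scorecardresearch\.com/[^>]*>'), lambda m: '')
    ]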

    feeds = [('Vice', 'http://www.vice.com/es/rss')]
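
To see which links the multipage setting actually follows, here is a small standalone sketch that runs the same match_regexps pattern against a few URLs. The URLs are invented for illustration and are not taken from the site, and it assumes calibre tests each link with a regex search, which is my understanding of how match_regexps is applied.

```python
import re

# Same pattern as in match_regexps in the recipe above.
PAGINATION = re.compile(r'/read/.*\?Contentpage=[2-9]$')

# Hypothetical URLs, made up for illustration only.
candidates = [
    'http://www.vice.com/es/read/example-article?Contentpage=2',
    'http://www.vice.com/es/read/example-article?Contentpage=9',
    'http://www.vice.com/es/read/example-article?Contentpage=1',
    'http://www.vice.com/es/read/example-article?Contentpage=10',
    'http://www.vice.com/es/read/example-article',
]

# Links the pattern accepts, i.e. the ones that would be followed.
followed = [url for url in candidates if PAGINATION.search(url)]

for url in candidates:
    print('follow' if url in followed else 'skip  ', url)
```

Only the Contentpage=2 and Contentpage=9 links are followed. An article with ten or more pages would be cut off after page 9, so the character class would need widening if that ever turns up.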

Last edited by atordo; 06-07-2012 at 12:25 PM.