MobileRead Forums > E-Book Software > Calibre > Recipes

01-19-2011, 06:22 AM   #1
fms
Junior Member
Posts: 1
Karma: 10
Join Date: Dec 2006
FIX: La Vanguardia Recipe

A fix for the recipe for La Vanguardia, a Spanish newspaper:
http://www.lavanguardia.es/

Code:
#!/usr/bin/env  python
# -*- coding: utf-8 -*-

__license__   = 'GPL v3'
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>'
'''
www.lavanguardia.es
'''

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag

class LaVanguardia(BasicNewsRecipe):
    title                 = 'La Vanguardia Digital'
    __author__            = 'Darko Miletic'
    description           = u'Noticias desde España'
    publisher             = 'La Vanguardia'
    category              = 'news, politics, Spain'
    oldest_article        = 2
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    delay                 = 5
    # encoding              = 'cp1252'
    language              = 'es'
    direction             = 'ltr'

    # html2lrf_options and html2epub_options belong to calibre's old conversion
    # pipeline; current releases take metadata overrides from conversion_options
    conversion_options = {
                          'comment'   : description
                        , 'tags'      : category
                        , 'publisher' : publisher
                        , 'language'  : language
                         }

    feeds              = [
                            (u'Portada'              , u'http://feeds.feedburner.com/lavanguardia/home'         )
                           ,(u'Cultura'              , u'http://feeds.feedburner.com/lavanguardia/cultura'      )
                           ,(u'Deportes'             , u'http://feeds.feedburner.com/lavanguardia/deportes'     )
                           ,(u'Economia'             , u'http://feeds.feedburner.com/lavanguardia/economia'     )
                           ,(u'El lector opina'      , u'http://feeds.feedburner.com/lavanguardia/lectoropina'  )
                           ,(u'Gente y TV'           , u'http://feeds.feedburner.com/lavanguardia/gente'        )
                           ,(u'Internacional'        , u'http://feeds.feedburner.com/lavanguardia/internacional')
                           ,(u'Internet y tecnologia', u'http://feeds.feedburner.com/lavanguardia/internet'     )
                           ,(u'Motor'                , u'http://feeds.feedburner.com/lavanguardia/motor'        )
                           ,(u'Politica'             , u'http://feeds.feedburner.com/lavanguardia/politica'     )
                           ,(u'Sucesos'              , u'http://feeds.feedburner.com/lavanguardia/sucesos'      )
                         ]


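    # Keep only the article container
    keep_only_tags = [
                       dict(name='div', attrs={'class':'detalle  noticia'})
                    ]

    # Drop embedded objects, <link> and <script> tags, and the colC, peu and jstoolbar blocks
    remove_tags        = [
                             dict(name=['object','link','script'])
                            ,dict(name='div', attrs={'class':['colC','peu','jstoolbar']})
                         ]

    # Discard everything that follows the article text
    remove_tags_after = [dict(name='div', attrs={'class':'text'})]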

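    def preprocess_html(self, soup):
        # Mark the text direction on the root <html> element
        soup.html['dir'] = self.direction
        # Declare UTF-8 explicitly with a <meta> tag as the first child of <head>
        mcharset = Tag(soup, 'meta', [("http-equiv", "Content-Type"), ("content", "text/html; charset=utf-8")])
        soup.head.insert(0, mcharset)
        # Drop inline style attributes so the reader's own styling applies
        for item in soup.findAll(style=True):
            del item['style']
        return soup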
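To check the clean-up step outside calibre, here is a minimal standalone sketch in Python 3 that applies the same three changes as `preprocess_html` in the recipe: remove `style` attributes, set `dir` on `<html>`, and insert a UTF-8 `<meta>` at the top of `<head>`. It swaps BeautifulSoup's `Tag` API for the standard library's `html.parser`; the names `PreprocessParser`, `preprocess` and `META_UTF8` are hypothetical, and it assumes reasonably well-formed input.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical constant: the same meta tag the recipe builds with Tag(...)
META_UTF8 = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'


class PreprocessParser(HTMLParser):
    """Re-serialise HTML while applying the recipe's three clean-up steps."""

    def __init__(self, direction="ltr"):
        # convert_charrefs=False keeps entity references such as &amp; intact
        super().__init__(convert_charrefs=False)
        self.direction = direction
        self.out = []

    def _render_attrs(self, tag, attrs):
        # Step 1: drop every inline style attribute
        kept = [(k, v) for k, v in attrs if k != "style"]
        # Step 2: force the text direction on the root <html> element
        if tag == "html":
            kept = [(k, v) for k, v in kept if k != "dir"] + [("dir", self.direction)]
        rendered = ""
        for name, value in kept:
            if value is None:  # boolean attribute such as <input disabled>
                rendered += " " + name
            else:
                rendered += f' {name}="{escape(value, quote=True)}"'
        return rendered

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._render_attrs(tag, attrs)}>")
        if tag == "head":
            # Step 3: declare UTF-8 as the first child of <head>
            self.out.append(META_UTF8)

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._render_attrs(tag, attrs)}/>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def unknown_decl(self, data):
        self.out.append(f"<![{data}]>")


def preprocess(html, direction="ltr"):
    """Return html with styles stripped, dir set and a UTF-8 meta tag added."""
    parser = PreprocessParser(direction)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    sample = '<html><head></head><body><p style="color:red">Hola</p></body></html>'
    print(preprocess(sample))
```

Unlike BeautifulSoup, `html.parser` does not repair unclosed or misnested tags, so this is only a way to try the transformation on saved sample pages, not a replacement for the recipe's own `preprocess_html`.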

Similar Threads
Thread Thread Starter Forum Replies Last Post
Recipe works when mocked up as Python file, fails when converted to Recipe ode Recipes 7 09-04-2011 04:57 AM
FIX: New York Times Recipe bcollier Recipes 2 08-25-2011 11:31 AM
[Old Thread] epub-fix chicagofilms Calibre 15 05-07-2011 05:58 PM
PRS-950 They can't fix it so I can't keep it JakesFriend Sony Reader 43 03-03-2011 10:03 PM
Classic Looking for the quick fix jrh Barnes & Noble NOOK 2 10-03-2010 06:19 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.