MobileRead Forums - Custom recipes

fortunados · 11-27-2009, 02:26 AM

Well, I have tried the recipe and it works with all the sections. The formatting is not perfect and I don't know what most of the sections do, but it works. Even though it is a local newspaper, I am posting it here in case someone is interested.

Is it normal for the recipe to take so long to download?
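Much of the run time probably comes from how much the recipe asks calibre to fetch. It has 24 feeds with oldest_article = 7 and max_articles_per_feed = 100, and since use_embedded_content = False, every selected article is downloaded from its own web page (plus, typically, its images). The sketch below is a simplified model of how those two settings cap the work, not calibre's actual code; the feed data is hypothetical.

```python
from datetime import datetime, timedelta

# Values taken from the Faro de Vigo recipe.
OLDEST_ARTICLE_DAYS = 7      # oldest_article
MAX_ARTICLES_PER_FEED = 100  # max_articles_per_feed
NUM_FEEDS = 24               # number of entries in feeds


def select_articles(articles, now, oldest_days=OLDEST_ARTICLE_DAYS,
                    max_per_feed=MAX_ARTICLES_PER_FEED):
    """Simplified model: drop items older than oldest_days, then keep
    at most max_per_feed of the rest, in feed order."""
    cutoff = now - timedelta(days=oldest_days)
    recent = [a for a in articles if a["date"] >= cutoff]
    return recent[:max_per_feed]


# Worst case: every feed has enough recent items to reach the cap.
max_downloads = NUM_FEEDS * MAX_ARTICLES_PER_FEED
print("Upper bound on article pages fetched:", max_downloads)  # 2400

# Hypothetical feed: one item every 12 hours, newest first, 20 items.
now = datetime(2009, 11, 27)
feed = [{"title": f"item {i}", "date": now - timedelta(hours=12 * i)}
        for i in range(20)]
print(len(select_articles(feed, now)))  # 15: items up to exactly 7 days old
```

Lowering oldest_article or max_articles_per_feed in the recipe would shrink that upper bound proportionally.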

Is there any way to get the recipe cooked in EPUB even though calibre's default output format is LRF?
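One option that does not depend on the GUI default is calibre's ebook-convert command-line tool: it accepts a .recipe file as input and picks the output format from the output file's extension, so naming the output .epub should give EPUB. Its --test option fetches only a few articles from a few feeds, which is handy while developing a slow recipe. Changing the preferred output format in calibre's preferences should also work. The helper below only builds and runs that command line; the file names faro_de_vigo.recipe and faro_de_vigo.epub are hypothetical, and actually running it requires calibre to be installed.

```python
import shlex
import shutil
import subprocess


def build_convert_command(recipe_path, output_path, test_mode=False):
    """Build an ebook-convert call that cooks a recipe straight into the
    format implied by output_path's extension (e.g. .epub)."""
    cmd = ["ebook-convert", recipe_path, output_path]
    if test_mode:
        # Download only a few articles from a few feeds while testing.
        cmd.append("--test")
    return cmd


def convert(recipe_path, output_path, test_mode=False):
    """Run the conversion if calibre's ebook-convert is on PATH."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("calibre's ebook-convert was not found on PATH")
    subprocess.run(build_convert_command(recipe_path, output_path, test_mode),
                   check=True)


# Hypothetical file names; adjust to wherever the recipe is saved.
cmd = build_convert_command("faro_de_vigo.recipe", "faro_de_vigo.epub",
                            test_mode=True)
print(shlex.join(cmd))
```

Printing the command first makes it easy to paste into a terminal before wiring it into a script.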

I don't know what the last section, def preprocess_html, does, but I noticed that many of the recipes have it, so I used it too.
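For reference, calibre calls preprocess_html once for each downloaded article page, passing the page parsed as a BeautifulSoup tree, and continues with whatever the method returns. The loop in this recipe finds every tag that has a style attribute and deletes that attribute, so the website's inline CSS does not override the e-book's own formatting. Since BeautifulSoup and calibre may not be at hand, the stand-alone sketch below reproduces the same effect on an HTML string using only the standard library's html.parser; it is an illustration, not calibre's code, and it silently drops comments and doctype declarations.

```python
from html import escape
from html.parser import HTMLParser


class StyleStripper(HTMLParser):
    """Re-emit HTML with every inline style="..." attribute removed."""

    def __init__(self):
        # Keep entities such as &eacute; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _render_tag(self, tag, attrs, self_closing=False):
        kept = "".join(
            f' {name}="{escape(value)}"' if value is not None else f" {name}"
            for name, value in attrs
            if name != "style"  # same effect as del item['style']
        )
        return f"<{tag}{kept}{' /' if self_closing else ''}>"

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._render_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self._render_tag(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_inline_styles(html_text):
    """Return html_text with all style attributes removed."""
    parser = StyleStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


print(strip_inline_styles(
    '<p style="color:red" class="x">Hola <b style="font-size:9px">Vigo</b></p>'))
# <p class="x">Hola <b>Vigo</b></p>
```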

Even though remove_tags is empty, I left it there in case something unwanted shows up in the output in the future.
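Both keep_only_tags and remove_tags take a list of dicts, each describing elements by tag name and attribute values. With keep_only_tags, only the matching elements of each article page (here the headline, subtitle, multimedia box and body-text divs) are kept; remove_tags then deletes matching elements from what is left, so an empty list simply removes nothing. The functions below are a simplified, hypothetical model of that rule using exact string comparison; calibre hands each dict to BeautifulSoup's findAll, whose matching is richer (it also accepts regular expressions, for example). The element examples and the publicidad class are made up for illustration.

```python
# Simplified model of how one keep_only_tags / remove_tags entry selects
# elements: the tag name must match and every listed attribute must have
# exactly the given value.

KEEP_ONLY_TAGS = [
    dict(name='div', attrs={'class': 'noticia_titular'}),
    dict(name='div', attrs={'class': 'subtitulo'}),
    dict(name='div', attrs={'class': 'cuadro_multimedia'}),
    dict(name='div', attrs={'id': 'noticia_texto', 'class': 'noticia_texto'}),
]
REMOVE_TAGS = []  # empty, as in the recipe


def matches(spec, tag_name, tag_attrs):
    """Return True if an element (name + attribute dict) fits the spec."""
    if 'name' in spec and spec['name'] != tag_name:
        return False
    return all(tag_attrs.get(key) == value
               for key, value in spec.get('attrs', {}).items())


def matches_any(specs, tag_name, tag_attrs):
    return any(matches(spec, tag_name, tag_attrs) for spec in specs)


# Hypothetical elements from an article page.
body = ('div', {'id': 'noticia_texto', 'class': 'noticia_texto'})
banner = ('div', {'class': 'publicidad'})

print(matches_any(KEEP_ONLY_TAGS, *body))    # True: kept
print(matches_any(KEEP_ONLY_TAGS, *banner))  # False: dropped
print(matches_any(REMOVE_TAGS, *body))       # False: empty list removes nothing

# A hypothetical remove_tags entry for an unwanted banner inside kept content.
hypothetical_remove = [dict(name='div', attrs={'class': 'publicidad'})]
print(matches_any(hypothetical_remove, *banner))  # True: would be removed
```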

Thanks

Recipe

Code:

#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2009, Jos <nomedeslabrasa at gmail.com>'
'''
farodevigo.es
'''

from calibre.web.feeds.news import BasicNewsRecipe

class FarodeVigo(BasicNewsRecipe):
    title                 = 'Faro de Vigo'
    __author__            = 'Jos'
    description           = 'Noticias de Vigo'
    publisher             = 'Faro de Vigo'
    category              = 'Noticias'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    encoding              = 'latin1'
    cover_url             = 'http://www.farodevigo.es/elementosWeb/mediaweb/images/iconos/logo2.jpg'
    remove_javascript     = True

    # Metadata (description, category, publisher) passed on to the
    # LRF and EPUB output
    html2lrf_options = [
                          '--comment', description
                        , '--category', category
                        , '--publisher', publisher
                        ]

    html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'


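    # Keep only the headline, subtitle, multimedia box and body text of each article page
    keep_only_tags = [dict(name='div', attrs={'class' : 'noticia_titular'}),
                      dict(name='div', attrs={'class' : 'subtitulo'}),
                      dict(name='div', attrs={'class' : 'cuadro_multimedia'}),
                      dict(name='div', attrs={'id' : 'noticia_texto', 'class' : 'noticia_texto'})]
    # Tags to delete from the kept content; empty for now, so nothing is removed
    remove_tags = []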

    feeds = [
        (u'Vigo', u'http://www.farodevigo.es/elementosInt/rss/1'),
        (u'Gran Vigo', u'http://www.farodevigo.es/elementosInt/rss/2'),
        (u'Al minuto', u'http://www.farodevigo.es/elementosInt/rss/AlMinuto'),
        (u'Galicia', u'http://www.farodevigo.es/elementosInt/rss/4'),
        (u'Comarcas', u'http://www.farodevigo.es/elementosInt/rss/3'),
        (u'Pontevedra', u'http://www.farodevigo.es/elementosInt/rss/15'),
        (u'Ourense', u'http://www.farodevigo.es/elementosInt/rss/16'),
        (u'Arosa', u'http://www.farodevigo.es/elementosInt/rss/17'),
        (u'Morrazo', u'http://www.farodevigo.es/elementosInt/rss/18'),
        (u'Deza-Tabeirós-Montes', u'http://www.farodevigo.es/elementosInt/rss/19'),
        (u'España', u'http://www.farodevigo.es/elementosInt/rss/6'),
        (u'Mundo', u'http://www.farodevigo.es/elementosInt/rss/7'),
        (u'Opinión', u'http://www.farodevigo.es/elementosInt/rss/5'),
        (u'Economía', u'http://www.farodevigo.es/elementosInt/rss/10'),
        (u'Sociedad y Cultura', u'http://www.farodevigo.es/elementosInt/rss/8'),
        (u'Sucesos', u'http://www.farodevigo.es/elementosInt/rss/9'),
        (u'Deportes', u'http://www.farodevigo.es/elementosInt/rss/11'),
        (u'Agenda', u'http://www.farodevigo.es/elementosInt/rss/21'),
        (u'Gente', u'http://www.farodevigo.es/elementosInt/rss/24'),
        (u'Televisión', u'http://www.farodevigo.es/elementosInt/rss/25'),
        (u'Ciencia y tecnología', u'http://www.farodevigo.es/elementosInt/rss/26'),
        (u'Humor', u'http://www.farodevigo.es/elementosInt/rss/12'),
        (u'Última', u'http://www.farodevigo.es/elementosInt/rss/13'),
        (u'Cartas', u'http://www.farodevigo.es/elementosInt/rss/20'),
    ]

    def preprocess_html(self, soup):
        # Remove inline style attributes from every tag of each downloaded page
        for item in soup.findAll(style=True):
            del item['style']
        return soup

    language = 'es'
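Each entry in the feeds list above is a (section title, RSS URL) pair. Because use_embedded_content is False, calibre reads the article links from each feed and then downloads every linked page, which is where keep_only_tags and preprocess_html come into play. The sketch below illustrates only that first step with the standard library: it parses an RSS 2.0 document and lists (title, link) pairs. The sample XML and its example.com URLs are made up, and encoding it as ISO-8859-1 to mirror the recipe's encoding = 'latin1' is an assumption about the feeds; inside the recipe, calibre does all of this itself.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document standing in for one of the Faro de Vigo feeds.
SAMPLE_RSS = """<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0">
  <channel>
    <title>Vigo</title>
    <item>
      <title>Economía</title>
      <link>http://example.com/noticia/1</link>
    </item>
    <item>
      <title>Segunda noticia</title>
      <link>http://example.com/noticia/2</link>
    </item>
  </channel>
</rss>"""


def feed_items(rss_bytes):
    """Return (title, link) for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_bytes)
    return [(item.findtext('title', default=''),
             item.findtext('link', default=''))
            for item in root.iter('item')]


# Encode to match the XML declaration so the parser decodes accents correctly.
for title, link in feed_items(SAMPLE_RSS.encode('latin-1')):
    print(title, '->', link)
```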