Well, I have tried the recipe and it works OK with all the sections. The formatting is not perfect and I don't know what most of the sections do, but it works, and even though it is a local newspaper I am posting it here in case someone is interested.
Is it normal for fetching the recipe to take so long?
Is there any way to get the recipe output as EPUB even if calibre's default setting is LRF?
I don't know what the last section, def preprocess_html, does, but I noticed that many of the recipes have it, so I used it.
Even though remove_tags is empty, I left it there in case something unwanted appears in the output in the future.
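If something unwanted does show up later, I assume an entry would take the same dict form as the ones in keep_only_tags. A rough sketch, where the class name 'publicidad' is made up for illustration and not taken from the site:

```python
# Hypothetical entry: the class name 'publicidad' is invented for illustration,
# it is not a class I have seen on farodevigo.es.
remove_tags = [
    dict(name='div', attrs={'class': 'publicidad'}),
]
```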
Thanks
Recipe
Code:
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2009, Jos <nomedeslabrasa at gmail.com>'
'''
farodevigo.es
'''
from calibre.web.feeds.news import BasicNewsRecipe


class FarodeVigo(BasicNewsRecipe):
    title = 'Faro de Vigo'
    __author__ = 'Jos'
    description = 'Noticias de Vigo'
    publisher = 'Faro de Vigo'
    category = 'Noticias'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    encoding = 'latin1'
    cover_url = 'http://www.farodevigo.es/elementosWeb/mediaweb/images/iconos/logo2.jpg'
    remove_javascript = True

    html2lrf_options = [
        '--comment', description,
        '--category', category,
        '--publisher', publisher,
    ]

    html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'

    keep_only_tags = [
        dict(name='div', attrs={'class': 'noticia_titular'}),
        dict(name='div', attrs={'class': 'subtitulo'}),
        dict(name='div', attrs={'class': 'cuadro_multimedia'}),
        dict(name='div', attrs={'id': 'noticia_texto', 'class': 'noticia_texto'}),
    ]

    remove_tags = [
    ]

    feeds = [
        (u'Vigo', u'http://www.farodevigo.es/elementosInt/rss/1'),
        (u'Gran Vigo', u'http://www.farodevigo.es/elementosInt/rss/2'),
        (u'Al minuto', u'http://www.farodevigo.es/elementosInt/rss/AlMinuto'),
        (u'Galicia', u'http://www.farodevigo.es/elementosInt/rss/4'),
        (u'Comarcas', u'http://www.farodevigo.es/elementosInt/rss/3'),
        (u'Pontevedra', u'http://www.farodevigo.es/elementosInt/rss/15'),
        (u'Ourense', u'http://www.farodevigo.es/elementosInt/rss/16'),
        (u'Arosa', u'http://www.farodevigo.es/elementosInt/rss/17'),
        (u'Morrazo', u'http://www.farodevigo.es/elementosInt/rss/18'),
        (u'Deza-Tabeirós-Montes', u'http://www.farodevigo.es/elementosInt/rss/19'),
        (u'España', u'http://www.farodevigo.es/elementosInt/rss/6'),
        (u'Mundo', u'http://www.farodevigo.es/elementosInt/rss/7'),
        (u'Opinión', u'http://www.farodevigo.es/elementosInt/rss/5'),
        (u'Economía', u'http://www.farodevigo.es/elementosInt/rss/10'),
        (u'Sociedad y Cultura', u'http://www.farodevigo.es/elementosInt/rss/8'),
        (u'Sucesos', u'http://www.farodevigo.es/elementosInt/rss/9'),
        (u'Deportes', u'http://www.farodevigo.es/elementosInt/rss/11'),
        (u'Agenda', u'http://www.farodevigo.es/elementosInt/rss/21'),
        (u'Gente', u'http://www.farodevigo.es/elementosInt/rss/24'),
        (u'Televisión', u'http://www.farodevigo.es/elementosInt/rss/25'),
        (u'Ciencia y tecnología', u'http://www.farodevigo.es/elementosInt/rss/26'),
        (u'Humor', u'http://www.farodevigo.es/elementosInt/rss/12'),
        (u'Última', u'http://www.farodevigo.es/elementosInt/rss/13'),
        (u'Cartas', u'http://www.farodevigo.es/elementosInt/rss/20'),
    ]

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        return soup

    language = 'es'
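
For anyone else wondering about preprocess_html: from reading the code, it seems to just remove the inline style attribute from every tag that has one before the page is converted. Below is a minimal sketch of the same idea that runs outside calibre. It uses xml.etree from the Python standard library in place of the BeautifulSoup object that the recipe receives, and the function name strip_inline_styles and the sample markup are made up for illustration.

```python
import xml.etree.ElementTree as ET


def strip_inline_styles(root):
    """Remove the 'style' attribute from root and every descendant element."""
    for element in root.iter():
        # pop() with a default does nothing when the attribute is absent
        element.attrib.pop('style', None)
    return root


# Made-up sample markup, not taken from farodevigo.es
sample = (
    '<div style="color:red">'
    '<p style="font-size:9px">Text</p>'
    '<span>plain</span>'
    '</div>'
)

root = strip_inline_styles(ET.fromstring(sample))
cleaned = ET.tostring(root, encoding='unicode')
print(cleaned)
```

The tags and their text are kept; only the style attributes are dropped, which I guess is why so many recipes include this method.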