3 recipes in Spanish - MobileRead Forums

12-26-2010, 03:21 PM	#1
desUBIKado Member Posts: 22 Karma: 12 Join Date: Feb 2009 Location: Zaragoza, Spain Device: prs-505, iliad	3 recipes in Spanish

Hi there,

This is my first contribution to the Calibre community: 3 recipes in Spanish. Two are newspapers and the other is an entertainment site. All of them bring news from Aragon, an autonomous community in Spain.

Calibre already has a recipe for the newspaper "Heraldo de Aragon", but it does not work properly. Mine works well.

Regards,
desUBIKado

1. heraldo.es

Spoiler:

#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '04 December 2010, desUBIKado'
__author__ = 'desUBIKado'
__description__ = 'Daily newspaper from Aragon'
__version__ = 'v0.03'
__date__ = '11, December 2010'
'''
http://www.heraldo.es/
'''

import time

from calibre.web.feeds.news import BasicNewsRecipe


class heraldo(BasicNewsRecipe):
    author = 'desUBIKado'
    description = 'Daily newspaper from Aragon'
    title = u'Heraldo de Aragon'
    publisher = 'OJD Nielsen'
    category = 'News, politics, culture, economy, general interest'
    language = 'es'
    timefmt = '[%a, %d %b, %Y]'
    oldest_article = 1
    max_articles_per_feed = 100
    use_embedded_content = False
    remove_javascript = True
    no_stylesheets = True
    recursion = 10

    feeds = [
        (u'Portadas', u'http://www.heraldo.es/index.php/mod.portadas/mem.rss')
    ]

    # Keep only the article body ('dts') and comments ('com') containers
    keep_only_tags = [dict(name='div', attrs={'id':['dts','com']})]

    remove_tags = [dict(name='a', attrs={'class':['com flo-r','enl-if','enl-df']}),
                   dict(name='div', attrs={'class':['brb-b-s con marg-btt','cnt-rel con']}),
                   dict(name='form', attrs={'class':'form'})]

    remove_tags_before = dict(name='div', attrs={'id':'dts'})
    remove_tags_after = dict(name='div', attrs={'id':'com'})

    def get_cover_url(self):
        # Build today's front-page URL from the local date (YYYYMMDD)
        cover = None
        st = time.localtime()
        year = str(st.tm_year)
        month = "%.2d" % st.tm_mon
        day = "%.2d" % st.tm_mday
        #http://oldorigin-www.heraldo.es/2010...ada_aragon.pdf
        cover = 'http://oldorigin-www.heraldo.es/' + year + month + day + '/primeras/portada_aragon.pdf'
        # Call get_browser through the instance rather than the class
        br = self.get_browser()
        try:
            br.open(cover)
        except Exception:
            # Cover not available: fall back to the site logo
            self.log("\nPortada no disponible")
            cover = 'http://www.heraldo.es/MODULOS/global/publico/interfaces/img/logo-Heraldo.png'
        return cover

    extra_css = '''
        h2{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:xx-large;}
    '''

2. elperiodicodearagon.com

Spoiler:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
__license__ = 'GPL v3'
__copyright__ = '04 December 2010, desUBIKado'
__author__ = 'desUBIKado'
__description__ = 'Daily newspaper from Aragon'
__version__ = 'v0.05'
__date__ = '07, December 2010'
'''
elperiodicodearagon.com
'''

import re

from calibre.web.feeds.news import BasicNewsRecipe


class elperiodicodearagon(BasicNewsRecipe):
    title = u'El Periodico de Aragon'
    author = u'desUBIKado'
    description = u'Noticias desde Aragon'
    publisher = u'elperiodicodearagon.com'
    category = u'news, politics, Spain, Aragon'
    oldest_article = 2
    delay = 0
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    language = 'es'
    encoding = 'utf8'
    remove_empty_feeds = True
    remove_javascript = True

    conversion_options = {
        'comments'  : description
       ,'tags'      : category
       ,'language'  : language
       ,'publisher' : publisher
    }

    feeds = [(u'Arag\xf3n', u'http://elperiodicodearagon.com/RSS/2.xml'),
             (u'Internacional', u'http://elperiodicodearagon.com/RSS/4.xml'),
             (u'Espa\xf1a', u'http://elperiodicodearagon.com/RSS/3.xml'),
             (u'Econom\xeda', u'http://elperiodicodearagon.com/RSS/5.xml'),
             (u'Deportes', u'http://elperiodicodearagon.com/RSS/7.xml'),
             (u'Real Zaragoza', u'http://elperiodicodearagon.com/RSS/10.xml'),
             (u'Opini\xf3n', u'http://elperiodicodearagon.com/RSS/103.xml'),
             (u'Escenarios', u'http://elperiodicodearagon.com/RSS/105.xml'),
             (u'Sociedad', u'http://elperiodicodearagon.com/RSS/104.xml'),
             (u'Gente', u'http://elperiodicodearagon.com/RSS/330.xml')]

    extra_css = '''
        h3{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:xx-large;}
        h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
        dd{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
    '''

    remove_attributes = ['height','width']

    keep_only_tags = [dict(name='div', attrs={'id':'contenidos'})]

    # Remove all the clutter
    remove_tags = [dict(name='ul', attrs={'class':'herramientasDeNoticia'}),
                   dict(name='span', attrs={'class':'MasInformacion '}),
                   dict(name='span', attrs={'class':'MasInformacion'}),
                   dict(name='div', attrs={'class':'Middle'}),
                   dict(name='div', attrs={'class':'MenuCabeceraRZaragoza'}),
                   dict(name='div', attrs={'id':'MenuCabeceraRZaragoza'}),
                   dict(name='div', attrs={'class':'MenuEquipo'}),
                   dict(name='div', attrs={'class':'TemasRelacionados'}),
                   dict(name='div', attrs={'class':'GaleriaEnNoticia'}),
                   dict(name='div', attrs={'class':'Recorte'}),
                   dict(name='div', attrs={'id':'NoticiasenRecursos'}),
                   dict(name='div', attrs={'id':'NoticiaEnPapel'}),
                   dict(name='p', attrs={'class':'RecorteEnNoticias'}),
                   dict(name='div', attrs={'id':'Comparte'}),
                   dict(name='div', attrs={'id':'CajaComparte'}),
                   dict(name='a', attrs={'class':'EscribirComentario'}),
                   dict(name='a', attrs={'class':'AvisoComentario'}),
                   dict(name='div', attrs={'class':'CajaAvisoComentario'}),
                   dict(name='div', attrs={'class':'navegaNoticias'}),
                   dict(name='div', attrs={'id':'PaginadorDiCom'}),
                   dict(name='div', attrs={'id':'CajaAccesoCuentaUsuario'}),
                   dict(name='div', attrs={'id':'CintilloComentario'}),
                   dict(name='div', attrs={'id':'EscribeComentario'}),
                   dict(name='div', attrs={'id':'FormularioComentario'}),
                   dict(name='div', attrs={'id':'FormularioNormas'})]

    # Fetch the print-edition cover (the format=1 image has higher resolution)
    def get_cover_url(self):
        index = 'http://pdf.elperiodicodearagon.com/'
        soup = self.index_to_soup(index)
        for image in soup.findAll('img', src=True):
            src = image['src']
            if src.startswith('http://pdf.elperiodicodearagon.com/funciones/portada-preview.php?eid='):
                # Drop the exact 'format=2' suffix; str.rstrip would strip a
                # set of characters rather than the suffix itself
                if src.endswith('format=2'):
                    src = src[:-len('format=2')]
                return src + 'format=1'
        return None

    # Remove blank space between the article and the comments (lines 1 and 2)
    # The index did not point correctly to the start of the article (line 3)
    preprocess_regexps = [
        (re.compile(r'<p> </p>', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'<p> </p>', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'<p id="">', re.DOTALL|re.IGNORECASE), lambda match: '<p>')
    ]

3. redaragon.com

Spoiler:

#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '11 December 2010, desUBIKado'
__author__ = 'desUBIKado'
__description__ = 'Entertainment guide from Aragon'
__version__ = 'v0.01'
__date__ = '11, December 2010'
'''
http://www.redaragon.es/
'''

from calibre.web.feeds.news import BasicNewsRecipe


# Renamed from 'heraldo', which was carried over from the first recipe
class redaragon(BasicNewsRecipe):
    author = 'desUBIKado'
    description = u'Guia de ocio desde Aragon'
    title = u'RedAragon'
    publisher = 'Grupo Z'
    category = 'Concerts, Movies, Entertainment news'
    cover_url = 'http://www.redaragon.com/2008_img/logotipo.gif'
    language = 'es'
    timefmt = '[%a, %d %b, %Y]'
    oldest_article = 15
    max_articles_per_feed = 100
    encoding = 'iso-8859-1'
    use_embedded_content = False
    remove_javascript = True
    no_stylesheets = True

    feeds = [(u'Conciertos', u'http://redaragon.com/rss/agenda.asp?tid=1'),
             (u'Exposiciones', u'http://redaragon.com/rss/agenda.asp?tid=5'),
             (u'Teatro', u'http://redaragon.com/rss/agenda.asp?tid=10'),
             (u'Conferencias', u'http://redaragon.com/rss/agenda.asp?tid=2'),
             (u'Ferias', u'http://redaragon.com/rss/agenda.asp?tid=6'),
             (u'Filmotecas/Cineclubs', u'http://redaragon.com/rss/agenda.asp?tid=7'),
             (u'Presentaciones', u'http://redaragon.com/rss/agenda.asp?tid=9'),
             (u'Fiestas', u'http://redaragon.com/rss/agenda.asp?tid=11'),
             (u'Infantil', u'http://redaragon.com/rss/agenda.asp?tid=13'),
             (u'Otros', u'http://redaragon.com/rss/agenda.asp?tid=8')]

    keep_only_tags = [dict(name='div', attrs={'id':'FichaEventoAgenda'})]

    remove_tags = [dict(name='div', attrs={'class':['Comparte','CajaAgenda','Caja','Cintillo']})]

    remove_tags_before = dict(name='div', attrs={'id':'FichaEventoAgenda'})
    remove_tags_after = dict(name='div', attrs={'class':'Cintillo'})
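
The cover lookups in recipes 1 and 2 come down to two small string operations, and these can be checked outside calibre. The following is only a minimal sketch: the helper names `heraldo_cover_url` and `high_res_cover` are hypothetical (they are not part of the recipes or of calibre), and the sample `eid=1&format=2` query string is an invented example, since the post only shows the URL prefix up to `eid=`.

```python
import datetime

# Base URL and path copied from the heraldo.es recipe in this post.
HERALDO_BASE = 'http://oldorigin-www.heraldo.es/'
HERALDO_PATH = '/primeras/portada_aragon.pdf'


def heraldo_cover_url(day):
    """Hypothetical helper: build the dated (YYYYMMDD) cover URL for a date."""
    return HERALDO_BASE + day.strftime('%Y%m%d') + HERALDO_PATH


def high_res_cover(src, low='format=2', high='format=1'):
    """Hypothetical helper: swap a trailing low-res marker for the high-res one.

    str.rstrip(low) removes any trailing run of the characters in `low`,
    not the exact suffix, so the suffix is tested explicitly here.
    """
    if src.endswith(low):
        src = src[:-len(low)]
    return src + high


if __name__ == '__main__':
    print(heraldo_cover_url(datetime.date(2010, 12, 26)))
    # Invented sample URL; the real query string may differ.
    sample = ('http://pdf.elperiodicodearagon.com/funciones/'
              'portada-preview.php?eid=1&format=2')
    print(high_res_cover(sample))
```

Passing the date in as an argument, instead of reading `time.localtime()` inside the function, makes it easy to try the URL pattern against a known publication day before wiring it into `get_cover_url`.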

12-27-2010, 12:20 PM	#2
kovidgoyal creator of calibre Posts: 43,860 Karma: 22666666 Join Date: Oct 2006 Location: Mumbai, India Device: Various	Thanks, these will be in the next calibre release.
