12-26-2010, 04:21 PM   #1
desUBIKado
3 recipes in Spanish

Hi there,

This is my first contribution to the Calibre community: 3 recipes in Spanish. Two are for newspapers and the third is for an entertainment site.

All of them bring news from Aragon, an autonomous community in Spain.

Calibre already has a recipe for the newspaper "Heraldo de Aragon", but it is not working properly. Mine works well.

Regards,

desUBIKado


1. heraldo.es

Spoiler:

#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '04 December 2010, desUBIKado'
__author__ = 'desUBIKado'
__description__ = 'Daily newspaper from Aragon'
__version__ = 'v0.03'
__date__ = '11, December 2010'
'''
http://www.heraldo.es/
'''

import time
from calibre.web.feeds.news import BasicNewsRecipe

class heraldo(BasicNewsRecipe):
    author = 'desUBIKado'
    description = 'Daily newspaper from Aragon'
    title = u'Heraldo de Aragon'
    publisher = 'OJD Nielsen'
    category = 'News, politics, culture, economy, general interest'
    language = 'es'
    timefmt = '[%a, %d %b, %Y]'
    oldest_article = 1
    max_articles_per_feed = 100
    use_embedded_content = False
    remove_javascript = True
    no_stylesheets = True
    recursion = 10

    feeds = [
        (u'Portadas', u'http://www.heraldo.es/index.php/mod.portadas/mem.rss')
    ]

    keep_only_tags = [dict(name='div', attrs={'id':['dts','com']})]

    remove_tags = [dict(name='a', attrs={'class':['com flo-r','enl-if','enl-df']}),
                   dict(name='div', attrs={'class':['brb-b-s con marg-btt','cnt-rel con']}),
                   dict(name='form', attrs={'class':'form'})]

    remove_tags_before = dict(name='div' , attrs={'id':'dts'})
    remove_tags_after = dict(name='div' , attrs={'id':'com'})

    def get_cover_url(self):
        # Today's front page lives under a YYYYMMDD folder
        st = time.localtime()
        year = str(st.tm_year)
        month = "%.2d" % st.tm_mon
        day = "%.2d" % st.tm_mday
        #http://oldorigin-www.heraldo.es/2010...ada_aragon.pdf
        cover = 'http://oldorigin-www.heraldo.es/' + year + month + day + '/primeras/portada_aragon.pdf'
        # Call get_browser() on the instance rather than on the class
        br = self.get_browser()
        try:
            br.open(cover)
        except Exception:
            # "Portada no disponible" = "Front page not available": fall back to the site logo
            self.log("\nPortada no disponible")
            cover = 'http://www.heraldo.es/MODULOS/global/publico/interfaces/img/logo-Heraldo.png'
        return cover

    extra_css = '''
        h2{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:xx-large;}
    '''
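The date-based cover lookup can also be written with `time.strftime`, which keeps the zero padding in one place. This is only a sketch: the base URL, path and fallback logo are copied from the recipe, but the helper names (`dated_cover_url`, `pick_cover`) and the `url_is_reachable` callback are made up for illustration and are not part of calibre.

```python
import time

# URLs copied from the recipe above; everything else here is illustrative.
BASE = 'http://oldorigin-www.heraldo.es/'
FALLBACK = 'http://www.heraldo.es/MODULOS/global/publico/interfaces/img/logo-Heraldo.png'


def dated_cover_url(st=None):
    """Build the front-page PDF URL for a struct_time (default: today)."""
    if st is None:
        st = time.localtime()
    # %Y%m%d gives the zero-padded YYYYMMDD folder name
    return BASE + time.strftime('%Y%m%d', st) + '/primeras/portada_aragon.pdf'


def pick_cover(url_is_reachable, st=None):
    """Return the dated cover if the (hypothetical) check passes, else the logo."""
    url = dated_cover_url(st)
    return url if url_is_reachable(url) else FALLBACK


example_day = time.strptime('2010-12-26', '%Y-%m-%d')
print(dated_cover_url(example_day))
```

Inside a recipe, the reachability check would be the `br.open(cover)` call wrapped in `try`/`except`, as in the listing above.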


2. elperiodicodearagon.com

Spoiler:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

__license__ = 'GPL v3'
__copyright__ = '04 December 2010, desUBIKado'
__author__ = 'desUBIKado'
__description__ = 'Daily newspaper from Aragon'
__version__ = 'v0.05'
__date__ = '07, December 2010'
'''
elperiodicodearagon.com
'''
import re
from calibre.web.feeds.news import BasicNewsRecipe


class elperiodicodearagon(BasicNewsRecipe):
    title = u'El Periodico de Aragon'
    author = u'desUBIKado'
    description = u'Noticias desde Aragon'
    publisher = u'elperiodicodearagon.com'
    category = u'news, politics, Spain, Aragon'
    oldest_article = 2
    delay = 0
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    language = 'es'
    encoding = 'utf8'
    remove_empty_feeds = True
    remove_javascript = True

    conversion_options = {
        'comments' : description
        ,'tags' : category
        ,'language' : language
        ,'publisher' : publisher
    }

    feeds = [(u'Arag\xf3n', u'http://elperiodicodearagon.com/RSS/2.xml'),
             (u'Internacional', u'http://elperiodicodearagon.com/RSS/4.xml'),
             (u'Espa\xf1a', u'http://elperiodicodearagon.com/RSS/3.xml'),
             (u'Econom\xeda', u'http://elperiodicodearagon.com/RSS/5.xml'),
             (u'Deportes', u'http://elperiodicodearagon.com/RSS/7.xml'),
             (u'Real Zaragoza', u'http://elperiodicodearagon.com/RSS/10.xml'),
             (u'Opini\xf3n', u'http://elperiodicodearagon.com/RSS/103.xml'),
             (u'Escenarios', u'http://elperiodicodearagon.com/RSS/105.xml'),
             (u'Sociedad', u'http://elperiodicodearagon.com/RSS/104.xml'),
             (u'Gente', u'http://elperiodicodearagon.com/RSS/330.xml')]

    extra_css = '''
        h3{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:xx-large;}
        h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
        dd{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
    '''

    remove_attributes = ['height','width']

    keep_only_tags = [dict(name='div', attrs={'id':'contenidos'})]

    # Remove all the junk

    remove_tags = [dict(name='ul', attrs={'class':'herramientasDeNoticia'}),
                   dict(name='span', attrs={'class':'MasInformacion '}),
                   dict(name='span', attrs={'class':'MasInformacion'}),
                   dict(name='div', attrs={'class':'Middle'}),
                   dict(name='div', attrs={'class':'MenuCabeceraRZaragoza'}),
                   dict(name='div', attrs={'id':'MenuCabeceraRZaragoza'}),
                   dict(name='div', attrs={'class':'MenuEquipo'}),
                   dict(name='div', attrs={'class':'TemasRelacionados'}),
                   dict(name='div', attrs={'class':'GaleriaEnNoticia'}),
                   dict(name='div', attrs={'class':'Recorte'}),
                   dict(name='div', attrs={'id':'NoticiasenRecursos'}),
                   dict(name='div', attrs={'id':'NoticiaEnPapel'}),
                   dict(name='p', attrs={'class':'RecorteEnNoticias'}),
                   dict(name='div', attrs={'id':'Comparte'}),
                   dict(name='div', attrs={'id':'CajaComparte'}),
                   dict(name='a', attrs={'class':'EscribirComentario'}),
                   dict(name='a', attrs={'class':'AvisoComentario'}),
                   dict(name='div', attrs={'class':'CajaAvisoComentario'}),
                   dict(name='div', attrs={'class':'navegaNoticias'}),
                   dict(name='div', attrs={'id':'PaginadorDiCom'}),
                   dict(name='div', attrs={'id':'CajaAccesoCuentaUsuario'}),
                   dict(name='div', attrs={'id':'CintilloComentario'}),
                   dict(name='div', attrs={'id':'EscribeComentario'}),
                   dict(name='div', attrs={'id':'FormularioComentario'}),
                   dict(name='div', attrs={'id':'FormularioNormas'})]

    # Fetch the print edition front page (the format=1 image has higher resolution)

    def get_cover_url(self):
        index = 'http://pdf.elperiodicodearagon.com/'
        soup = self.index_to_soup(index)
        for image in soup.findAll('img', src=True):
            if image['src'].startswith('http://pdf.elperiodicodearagon.com/funciones/portada-preview.php?eid='):
                # str.rstrip() strips a set of characters, not a suffix,
                # so swap the parameter value explicitly instead
                return image['src'].replace('format=2', 'format=1')
        return None

    # Remove the blank space between the article and the comments (patterns 1 and 2)
    # The index did not point correctly to the start of the article (pattern 3)

    preprocess_regexps = [
        (re.compile(r'<p>&nbsp;</p>', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'<p> </p>', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'<p id="">', re.DOTALL|re.IGNORECASE), lambda match: '<p>')
    ]
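To see what the three `preprocess_regexps` rules do, they can be run by hand on a small piece of markup. The sketch below reuses the same patterns; the `apply_regexps` helper and the sample HTML are invented for the example, and calibre's own processing of these rules may differ in detail.

```python
import re

# Same three (pattern, replacement) pairs as in the recipe above
preprocess_regexps = [
    (re.compile(r'<p>&nbsp;</p>', re.DOTALL | re.IGNORECASE), lambda match: ''),
    (re.compile(r'<p> </p>', re.DOTALL | re.IGNORECASE), lambda match: ''),
    (re.compile(r'<p id="">', re.DOTALL | re.IGNORECASE), lambda match: '<p>'),
]


def apply_regexps(html, rules):
    """Apply each rule in order (illustrative helper, not calibre's code)."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


# Made-up sample markup, not taken from the real site
sample = '<p id="">Titular</p><P>&nbsp;</P><p> </p><p>Comentarios</p>'
cleaned = apply_regexps(sample, preprocess_regexps)
print(cleaned)  # <p>Titular</p><p>Comentarios</p>
```

The empty paragraphs disappear and the `<p id="">` opening tag is normalised to a plain `<p>`.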



3. redaragon.com

Spoiler:

#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '11 December 2010, desUBIKado'
__author__ = 'desUBIKado'
__description__ = 'Entertainment guide from Aragon'
__version__ = 'v0.01'
__date__ = '11, December 2010'
'''
http://www.redaragon.es/
'''

from calibre.web.feeds.news import BasicNewsRecipe

class redaragon(BasicNewsRecipe):
    author = 'desUBIKado'
    description = u'Guia de ocio desde Aragon'
    title = u'RedAragon'
    publisher = 'Grupo Z'
    category = 'Concerts, Movies, Entertainment news'
    cover_url = 'http://www.redaragon.com/2008_img/logotipo.gif'
    language = 'es'
    timefmt = '[%a, %d %b, %Y]'
    oldest_article = 15
    max_articles_per_feed = 100
    encoding = 'iso-8859-1'
    use_embedded_content = False
    remove_javascript = True
    no_stylesheets = True

    feeds = [(u'Conciertos', u'http://redaragon.com/rss/agenda.asp?tid=1'),
             (u'Exposiciones', u'http://redaragon.com/rss/agenda.asp?tid=5'),
             (u'Teatro', u'http://redaragon.com/rss/agenda.asp?tid=10'),
             (u'Conferencias', u'http://redaragon.com/rss/agenda.asp?tid=2'),
             (u'Ferias', u'http://redaragon.com/rss/agenda.asp?tid=6'),
             (u'Filmotecas/Cineclubs', u'http://redaragon.com/rss/agenda.asp?tid=7'),
             (u'Presentaciones', u'http://redaragon.com/rss/agenda.asp?tid=9'),
             (u'Fiestas', u'http://redaragon.com/rss/agenda.asp?tid=11'),
             (u'Infantil', u'http://redaragon.com/rss/agenda.asp?tid=13'),
             (u'Otros', u'http://redaragon.com/rss/agenda.asp?tid=8')]

    keep_only_tags = [dict(name='div', attrs={'id':'FichaEventoAgenda'})]

    remove_tags = [dict(name='div', attrs={'class':['Comparte','CajaAgenda','Caja','Cintillo']})]

    remove_tags_before = dict(name='div' , attrs={'id':'FichaEventoAgenda'})

    remove_tags_after = dict(name='div' , attrs={'class':'Cintillo'})
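
Since every redaragon feed uses the same `agenda.asp?tid=N` URL, the `feeds` list could also be generated from (title, tid) pairs. This is just an alternative sketch: the titles and ids are copied from the recipe above, while the `AGENDA_SECTIONS` and `FEED_TEMPLATE` names are made up here.

```python
# (title, tid) pairs copied from the feed list in the recipe above
AGENDA_SECTIONS = [
    (u'Conciertos', 1),
    (u'Exposiciones', 5),
    (u'Teatro', 10),
    (u'Conferencias', 2),
    (u'Ferias', 6),
    (u'Filmotecas/Cineclubs', 7),
    (u'Presentaciones', 9),
    (u'Fiestas', 11),
    (u'Infantil', 13),
    (u'Otros', 8),
]

# Hypothetical constant name; the URL pattern itself comes from the recipe
FEED_TEMPLATE = u'http://redaragon.com/rss/agenda.asp?tid=%d'

# Same structure calibre expects: a list of (title, url) tuples
feeds = [(title, FEED_TEMPLATE % tid) for title, tid in AGENDA_SECTIONS]

for title, url in feeds:
    print(title, url)
```

Adding or dropping a section then only means editing one pair, at the cost of the URLs no longer being visible literally in the recipe.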

12-27-2010, 01:20 PM   #2
kovidgoyal
creator of calibre
Thanks, these will be in the next calibre release.