I have renewed interest in a recipe for the Mexican newspaper La Jornada. My recipe below has two problems:
1. The photos go to the bottom of each article. Why?
2. I would like to include the front cover, but I don't know how. The source provides a front cover PDF; here's my bash script to download it:
Code:
Y=$(date +%Y) # 4-digit year
m=$(date +%m) # 2-digit month
d=$(date +%d) # 2-digit day
wget -q "http://www.jornada.unam.mx/$Y/$m/$d/portada.pdf" -O - | convert pdf:- png:-
Here's my basic recipe:
Code:
__license__ = 'GPL v3'
__copyright__ = '2009, Pablo Funes <pablo at imprentaluz.com>'
'''
La Jornada
'''
# TODO: Pictures should go to the top, not the bottom of each article.
# TODO: Front cover?
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1262065387(BasicNewsRecipe):
    title = u'La Jornada'
    oldest_article = 7  # days
    max_articles_per_feed = 100
    feeds = [
        ('opinion', 'http://www.jornada.unam.mx/rss/opinion.xml'),
        ('politica', 'http://www.jornada.unam.mx/rss/politica.xml'),
        ('economia', 'http://www.jornada.unam.mx/rss/economia.xml'),
        ('mundo', 'http://www.jornada.unam.mx/rss/mundo.xml'),
        ('estados', 'http://www.jornada.unam.mx/rss/estados.xml'),
        ('capital', 'http://www.jornada.unam.mx/rss/capital.xml'),
        ('sociedad', 'http://www.jornada.unam.mx/rss/sociedad.xml'),
        ('ciencias', 'http://www.jornada.unam.mx/rss/ciencias.xml'),
        ('cultura', 'http://www.jornada.unam.mx/rss/cultura.xml'),
        ('gastronomia', 'http://www.jornada.unam.mx/rss/gastronomia.xml'),
        ('espectaculos', 'http://www.jornada.unam.mx/rss/espectaculos.xml'),
        ('deportes', 'http://www.jornada.unam.mx/rss/deportes.xml'),
        ('cartones', 'http://www.jornada.unam.mx/rss/cartones.xml'),
    ]
    # Keep only the summary, headline, body text and photo blocks.
    keep_only_tags = [
        dict(name='div', attrs={'class': ["sumarios", "cabeza", "text", "foto"]}),
    ]
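For the front cover, here is a minimal sketch of the date-to-URL step from the bash script, written in Python so it could later be folded into the recipe. The names `BASE_URL`, `COVER_FILE` and `cover_pdf_url` are made up for illustration, and the file name `portada.pdf` (Spanish for "front cover") is an assumption about how the site names the file.

```python
from datetime import date

# Assumption: the cover PDF lives at <base>/YYYY/MM/DD/<file>,
# mirroring the wget command in the bash script.
BASE_URL = "http://www.jornada.unam.mx"
COVER_FILE = "portada.pdf"  # assumed file name


def cover_pdf_url(day=None):
    """Return the front cover PDF URL for the given date (today by default)."""
    if day is None:
        day = date.today()
    # strftime zero-pads month and day, like `date +%m` and `date +%d`.
    return "%s/%s/%s" % (BASE_URL, day.strftime("%Y/%m/%d"), COVER_FILE)


if __name__ == "__main__":
    print(cover_pdf_url())
```

In a recipe, a `get_cover_url` method returning this string looks like the natural hook. However, calibre expects an image there, so the PDF would probably still need converting first (as `convert` does in the script). I have not tested that part.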