Thread: La Jornada
12-29-2009, 08:17 PM   #1
pablofunes
Device: kindle2
La Jornada

There is renewed interest in a recipe for the Mexican newspaper La Jornada. My recipe below has two problems:

1. The photos end up at the bottom of each article. Why?

2. I would like to include the front cover, but I don't know how. The source provides a front-cover PDF; here's my bash script to download it and convert it to PNG:

Code:
Y=`date +%Y` # Year 4 digit
m=`date +%m` # Month 2 digit
d=`date +%d` # day 2 digit
wget -q "http://www.jornada.unam.mx/$Y/$m/$d/portada.pdf" -O - | convert pdf:- png:-
Here's my basic recipe:

Code:
__license__   = 'GPL v3'
__copyright__ = '2009, Pablo Funes <pablo at imprentaluz.com>'
'''
La Jornada
'''

from calibre.web.feeds.news import BasicNewsRecipe

# TODO: Pictures should go to the top, not the bottom of each article.  
# TODO: Front cover? 

class AdvancedUserRecipe1262065387(BasicNewsRecipe):
    title          = u'La Jornada'
    oldest_article = 7
    max_articles_per_feed = 100

    feeds          = [
                ('opinion','http://www.jornada.unam.mx/rss/opinion.xml'),
                ('politica','http://www.jornada.unam.mx/rss/politica.xml'),
                ('economia','http://www.jornada.unam.mx/rss/economia.xml'),
                ('mundo','http://www.jornada.unam.mx/rss/mundo.xml'),
                ('estados','http://www.jornada.unam.mx/rss/estados.xml'),
                ('capital','http://www.jornada.unam.mx/rss/capital.xml'),
                ('sociedad','http://www.jornada.unam.mx/rss/sociedad.xml'),
                ('ciencias','http://www.jornada.unam.mx/rss/ciencias.xml'),
                ('cultura','http://www.jornada.unam.mx/rss/cultura.xml'),
                ('gastronomia','http://www.jornada.unam.mx/rss/gastronomia.xml'),
                ('espectaculos','http://www.jornada.unam.mx/rss/espectaculos.xml'),
                ('deportes','http://www.jornada.unam.mx/rss/deportes.xml'),
                ('cartones','http://www.jornada.unam.mx/rss/cartones.xml'),

                ]


    keep_only_tags = [
                        dict(name='div', attrs={'class':["sumarios","cabeza","text","foto"]}),
                          ]
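
For the front cover, one possible starting point is to build the dated URL in Python rather than in bash. The sketch below is only an illustration: the helper name cover_pdf_url and the constant COVER_TEMPLATE are made up here, and the file name portada.pdf is an assumption (Spanish for "front cover"). calibre's BasicNewsRecipe does provide a get_cover_url() hook, but whether it accepts a PDF directly has not been verified here, so the PDF-to-PNG conversion step from the bash script may still be needed.

```python
from datetime import date

# Assumed URL pattern, taken from the bash script: /YYYY/MM/DD/portada.pdf
COVER_TEMPLATE = "http://www.jornada.unam.mx/%Y/%m/%d/portada.pdf"


def cover_pdf_url(day=None):
    """Return the dated front-cover PDF URL for `day` (defaults to today)."""
    if day is None:
        day = date.today()
    # strftime zero-pads month and day, matching `date +%m` and `date +%d`.
    return day.strftime(COVER_TEMPLATE)


# Hypothetical wiring inside the recipe class (untested):
#
#     def get_cover_url(self):
#         return cover_pdf_url()

if __name__ == "__main__":
    print(cover_pdf_url(date(2009, 12, 29)))
```

Keeping the URL logic in a small function makes it easy to check for a given date before plugging it into the recipe.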