Quote:
Originally Posted by Starson17
If you post your recipe it would be easier to see what the problem is.
All right, here it is:
Code:
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
'''
pagina12.com.ar
'''
import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString


class Pagina12(BasicNewsRecipe):
    title = 'Pagina/12 - Edicion Impresa'
    __author__ = 'Pablo Marfil'
    description = 'Diario argentino'
    INDEX = 'http://www.pagina12.com.ar/diario/secciones/index.html'
    language = 'es'
    encoding = 'cp1252'
    remove_tags_before = dict(id='fecha')
    remove_tags_after = dict(id='fin')
    remove_tags = [dict(id=['fecha', 'fin', 'pageControls', 'logo',
                            'logo_suple', 'fecha_suple', 'volver'])]
    masthead_url = 'http://www.pagina12.com.ar/commons/imgs/logo-home.gif'
    no_stylesheets = True
    # Strip the DOCTYPE declaration before the page is parsed
    preprocess_regexps = [(re.compile(r'<!DOCTYPE[^>]+>', re.I), lambda m: '')]

    def parse_index(self):
        # Build a single 'Humor' feed from the strip images on the
        # 'ultimas' page (any <img> whose alt text starts with 'Daniel Paz')
        feeds = []
        comic = []
        soup = self.index_to_soup('http://www.pagina12.com.ar/diario/ultimas/index.html')
        for image in soup.findAll('img', alt=True):
            if image['alt'].startswith('Daniel Paz'):
                # The article URL here is the image file itself,
                # not an HTML page that contains the image
                comic.append({'title': 'Rudy y Daniel Paz', 'url': image['src'],
                              'description': '', 'date': ''})
                print(image['src'])
        if comic:
            # 'TIRA HALLADA' = 'strip found'
            print('TIRA HALLADA: %s' % (comic,))
            feeds.append(('Humor', comic))
        return feeds
Quote:
Basically, you want a link to an HTML page with an img tag on it that holds your strip. If the site doesn't have a page like that (it should; otherwise, how would you see it?), you can build it yourself in the recipe.
The site has such a page, of course, but it contains A LOT of "other things" I'm not interested in (for my particular purpose, I mean). In fact, it's the page I'm using as the index for parsing the newspaper's contents.
So, it seems that I should "build it myself" in the recipe...
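Here is a minimal sketch of what "building it yourself" could look like, in Python 3 with only the standard library. It wraps the strip image in a bare HTML page, writes that page to a temporary file and returns an article dict with the same keys as the ones built in parse_index above. strip_article is a hypothetical helper name, and the sketch assumes (not confirmed anywhere in this thread) that the recipe downloader will fetch a local file:// URL. calibre also ships PersistentTemporaryFile in calibre.ptempfile, which may be a better fit than tempfile inside a real recipe.

```python
import tempfile
from html import escape
from pathlib import Path


def strip_article(img_url, title='Rudy y Daniel Paz'):
    """Wrap a comic-strip image in a minimal HTML page and return an
    article dict shaped like the ones built in parse_index()."""
    # escape() also escapes quotes by default, so both values are safe
    # inside the src="..." and alt="..." attributes.
    page = (
        '<html><head><title>{t}</title></head>'
        '<body><h1>{t}</h1><img src="{src}" alt="{t}"/></body></html>'
    ).format(t=escape(title), src=escape(img_url))
    # delete=False keeps the file on disk after the handle closes so it
    # can be read later; the caller must remove it when done.
    with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False,
                                     encoding='utf-8') as f:
        f.write(page)
    # ASSUMPTION: the recipe downloader accepts a local file:// URL here.
    url = Path(f.name).resolve().as_uri()
    return {'title': title, 'url': url, 'description': '', 'date': ''}


if __name__ == '__main__':
    # Hypothetical image URL, for illustration only.
    print(strip_article('http://example.com/strip.gif'))
```

Inside parse_index you would append strip_article(image['src']) instead of the dict that points at the raw image. If the src attribute turns out to be relative, it would need to be made absolute first, for example with urllib.parse.urljoin against the page URL.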