09-21-2011, 08:42 PM   #5
macpablus
Enthusiast
Posts: 25
Karma: 1896
Join Date: Aug 2011
Device: Kindle 3
Quote:
Originally Posted by Starson17
If you post your recipe it would be easier to see what the problem is.
All right, here it is:

Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
'''
pagina12.com.ar
'''
import re

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

class Pagina12(BasicNewsRecipe):

    title      = 'Pagina/12 - Edicion Impresa'
    __author__ = 'Pablo Marfil'
    description = 'Diario argentino'
    INDEX = 'http://www.pagina12.com.ar/diario/secciones/index.html'
    language = 'es'
    encoding           = 'cp1252'
    remove_tags_before = dict(id='fecha')
    remove_tags_after  = dict(id='fin')
    remove_tags        = [dict(id=['fecha', 'fin', 'pageControls', 'logo', 'logo_suple', 'fecha_suple', 'volver'])]
    masthead_url       = 'http://www.pagina12.com.ar/commons/imgs/logo-home.gif'
    no_stylesheets = True

    preprocess_regexps = [(re.compile(r'<!DOCTYPE[^>]+>', re.I), lambda m: '')]

    def parse_index(self):
        feeds = []
        comic = []
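        soup = self.index_to_soup('http://www.pagina12.com.ar/diario/ultimas/index.html')
        # Collect every <img> whose alt text starts with 'Daniel Paz' (the daily strip).
        for image in soup.findAll('img', alt=True):
            if image['alt'].startswith('Daniel Paz'):
                # Note: 'url' is the image file itself, not an HTML page.
                comic.append({'title': 'Rudy y Daniel Paz', 'url': image['src'],
                              'description': '', 'date': ''})
            self.log(image['src'])  # debug: list every image found on the page
        if comic:
            self.log('Strip found:', comic)
            feeds.append(('Humor', comic))
        return feeds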
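
The strip-finding loop in parse_index can be exercised outside calibre. This is a minimal sketch, assuming the standard library's HTMLParser as a stand-in for calibre's bundled BeautifulSoup, and adding urljoin to turn relative src values into absolute URLs (the recipe passes image['src'] through unchanged). `StripFinder` and `comic_feeds` are hypothetical names. It returns the same (section title, list of article dicts) shape that parse_index returns; each 'url' is still the image file itself rather than an HTML page, which is the point taken up in the reply quoted next.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StripFinder(HTMLParser):
    """Hypothetical stand-in for soup.findAll('img', alt=True) plus the alt filter."""

    def __init__(self, base_url, alt_prefix):
        super().__init__()
        self.base_url = base_url
        self.alt_prefix = alt_prefix
        self.found = []  # absolute URLs of matching <img> tags, in page order

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        attrs = dict(attrs)
        alt = attrs.get('alt') or ''
        src = attrs.get('src')
        if src and alt.startswith(self.alt_prefix):
            # Resolve relative src values against the page they came from.
            self.found.append(urljoin(self.base_url, src))


def comic_feeds(page_html, base_url, alt_prefix='Daniel Paz'):
    """Return feeds in the shape parse_index uses: [(section, [article, ...])]."""
    finder = StripFinder(base_url, alt_prefix)
    finder.feed(page_html)
    finder.close()
    comic = [{'title': 'Rudy y Daniel Paz', 'url': src,
              'description': '', 'date': ''} for src in finder.found]
    # Mirror the recipe: only add the 'Humor' section when a strip was found.
    return [('Humor', comic)] if comic else []
```

Inside the actual recipe the filtering stays with BeautifulSoup; the sketch only makes the logic checkable without calibre installed.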



Quote:
Basically, you want a link to an HTML page with an img tag on it that holds your strip. If the site doesn't have a page like that (it should; otherwise, how do you see it?), you can build it yourself in the recipe.
The site has such a page, of course, but it contains A LOT of "other things" that I'm not interested in (for my particular purpose, I mean). In fact, it is the same page I'm using as the index for parsing the newspaper contents.

So, it seems that I should "build it myself" in the recipe...
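One way to "build it yourself" is to wrap the strip's image in a tiny HTML page generated by the recipe and hand calibre that page as the article. The sketch below uses only the standard library; `write_strip_article` is a hypothetical helper, and it assumes calibre's downloader will follow a local file:// URL in an article's 'url' field, which should be checked against the calibre version in use.

```python
import html
import tempfile
from pathlib import Path


def write_strip_article(img_url, title='Rudy y Daniel Paz', directory=None):
    """Hypothetical helper: wrap one image in a minimal HTML page on disk.

    Returns an article dict for parse_index whose 'url' is a file:// URI
    pointing at that page (assumption: your calibre version fetches local
    file:// URLs; verify before relying on it).
    """
    safe_title = html.escape(title)
    page = ('<html><head><meta charset="utf-8"/><title>{0}</title></head>'
            '<body><h2>{0}</h2><img src="{1}" alt="{0}"/></body></html>'
            ).format(safe_title, html.escape(img_url, quote=True))
    # delete=False keeps the file after close so the downloader can read it later.
    with tempfile.NamedTemporaryFile('w', suffix='.html', encoding='utf-8',
                                     dir=directory, delete=False) as f:
        f.write(page)
    uri = Path(f.name).resolve().as_uri()
    return {'title': title, 'url': uri, 'description': '', 'date': ''}
```

In parse_index, the line that appends a dict with 'url': image['src'] would then become something like comic.append(write_strip_article(image['src'])), after making the src absolute if the site uses relative paths. The page then contains only the strip, so none of the "other things" on the index page end up in the book.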