recipe request: Babelia

luiscc · 03-06-2011, 04:49 PM

Hi, could anyone create a recipe for Babelia?

http://www.elpais.com/suple/babelia/

I tried to adapt the El País Semanal recipe, but I'm a newbie at Python and recipes, and the result is far from good, although I do get all the articles.

Can anyone help?

Kind regards

miwie · 03-07-2011, 05:04 AM

How about this as a first shot?

Code:

'''
www.elpais.com/suple/babelia/
'''

from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class ElPaisSemanal(BasicNewsRecipe):
    title                 = 'El Pais Babelia'
    description           = 'Suplemento semanal de El Pais'
    publisher             = 'EL PAIS S.L.'
    category              = 'news, politics, Spain'
    no_stylesheets        = True
    encoding              = 'cp1252'
    use_embedded_content  = False
    language              = 'es'
    publication_type      = 'magazine'    

    # El Cigala
    # cover_url = 'http://pixhost.info/avaxhome/e8/27/001527e8_medium.jpeg'

    masthead_url          = 'http://www.elpais.com/im/tit_logo_int.gif'
    index                 = 'http://www.elpais.com/suple/babelia/'

    extra_css             = ' p{text-align: left} body{ text-align: left; font-family: Georgia,"Times New Roman",Times,serif } h2{font-family: Arial,Helvetica,sans-serif} img{margin-bottom: 0.4em} '

    conversion_options = {
                          'comment'      : description
                        , 'tags'         : category
                        , 'publisher'    : publisher
                        , 'language'     : language
                        }

    remove_attributes=['width','height']
    remove_tags=[dict(name='div', attrs={'id':'votosC'}),
	dict(name='div', attrs={'class':'votos'}),
	dict(name='div', attrs={'class':'rec'}),
	dict(name='div', attrs={'class':'rec rec-list'}),
	dict(name='div', attrs={'class':'rec rec-twitter'}),
	dict(name='div', attrs={'class':'rec rec-fbook'})
	]

    remove_tags_before = dict(name='div', attrs={'class':'estructura_2col'})
    remove_tags_after  = [dict(name='div', attrs={'id':'utilidades'}),
	dict(name='div', attrs={'id':'votosD'}),
	dict(name='div', attrs={'id':'mod_util'})
	]

    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.index)
        for item in soup.findAll('a',attrs={'class':['g19i003','g17r003','g17i003']}):
            description = ''
            title_prefix = ''
            # .get() works on both BeautifulSoup 3 and bs4 tags (has_key is deprecated in bs4)
            if item.get('href'):
                url   = 'http://www.elpais.com' + item['href'].rpartition('/')[0]
                title = title_prefix + self.tag_to_string(item)
                date  = strftime(self.timefmt)
                articles.append({
                                  'title'      :title
                                 ,'date'       :date
                                 ,'url'        :url
                                 ,'description':description
                                })
        return [(soup.head.title.string, articles)]

    def print_version(self, url):
        # Ask for the printer-friendly page (spaces only: a tab here is a TabError in Python 3)
        pr_url = url + '?print=1'
        return pr_url
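A quick way to check which URLs the recipe above will actually fetch is to run its URL logic outside calibre. This sketch copies the string handling from `parse_index` and `print_version` into two plain functions; the function names and the sample href are hypothetical, chosen only for illustration.

```python
# Standalone copy of the URL handling in the first recipe, runnable without calibre.
# Assumption: the sample href below is made up; real Babelia hrefs may differ.

BASE = 'http://www.elpais.com'


def article_url(href):
    """Same as parse_index: prefix the site and drop the last path segment."""
    return BASE + href.rpartition('/')[0]


def print_version(url):
    """Same as print_version: request the printer-friendly page."""
    return url + '?print=1'


if __name__ == '__main__':
    href = '/articulo/portada/ejemplo/elpbabpor/20110305elpbabpor_1/Tes'  # hypothetical
    url = article_url(href)
    print(url)
    print(print_version(url))
```

Note that if an href contains no `/` at all, `rpartition('/')[0]` is an empty string and the URL collapses to the bare site root.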

luiscc · 03-07-2011, 09:16 AM

Thank you!!

oneillpt · 03-07-2011, 09:47 AM

Quote:

Originally Posted by luiscc

Hi, could anyone create a recipe for Babelia?

Here is a recipe. I have also posted it, with explanatory comments you may find useful, in reply to another post just after yours: https://www.mobileread.com/forums/sho...d.php?t=124538 "How to convert newspaper which do not have RSS feed?"

Code:

from calibre.web.feeds.news import BasicNewsRecipe

class ElPaisBabelia(BasicNewsRecipe):

    title      = 'El Pais Babelia'
    __author__ = 'oneillpt'
    description = 'El Pais Babelia'
    INDEX = 'http://www.elpais.com/suple/babelia/'
    language = 'es'

    remove_tags_before = dict(name='div', attrs={'class':'estructura_2col'})
    # keep_only_tags (not keep_tags) is the attribute BasicNewsRecipe reads
    keep_only_tags = [dict(name='div', attrs={'class':'estructura_2col'})]
    remove_tags = [dict(name='div', attrs={'class':'votos estirar'}),
        dict(name='div', attrs={'id':'utilidades'}),
        dict(name='div', attrs={'class':'info_relacionada'}),
        dict(name='div', attrs={'class':'mod_apoyo'}),
        dict(name='div', attrs={'class':'contorno_f'}),
        dict(name='div', attrs={'class':'pestanias'}),
        dict(name='div', attrs={'class':'otros_webs'}),
        dict(name='div', attrs={'id':'pie'})
        ]
    #no_stylesheets = True
    remove_javascript     = True

    def parse_index(self):
        soup = self.index_to_soup(self.INDEX)
        feeds = []
        # Each section of the index page is a <div class="contenedor_nuevo"> headed by an <h1>
        for section in soup.findAll('div', attrs={'class':'contenedor_nuevo'}):
            section_title = self.tag_to_string(section.find('h1'))
            articles = []
            for post in section.findAll('a', href=True):
                url = post['href']
                # Only site-relative links that carry a class attribute are articles;
                # post.get() avoids a KeyError when 'class=' appears only in a child tag
                if not url.startswith('/') or not post.get('class'):
                    continue
                url = 'http://www.elpais.com' + url
                title = self.tag_to_string(post)
                self.log()
                self.log('--> post:  ', post)
                self.log('--> url:   ', url)
                self.log('--> title: ', title)
                self.log('--> class: ', post.get('class'))
                articles.append({'title':title, 'url':url})
            if articles:
                feeds.append((section_title, articles))
        return feeds
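Both recipes get around the missing RSS feed by overriding `parse_index`, which must return a list of `(section title, list of article dicts)` tuples. The sketch below reproduces this recipe's scraping logic with only the Python standard library, so it can be tried against saved HTML without running calibre. It assumes the same page structure the recipe relies on (`div class="contenedor_nuevo"`, first `h1` as the section title, site-relative links that have a class attribute); those class names come from the recipe and may no longer match the live site. It also swaps the manual string prefix for `urllib.parse.urljoin` against the index URL. `SectionLinkParser` and `parse_index_html` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

INDEX = 'http://www.elpais.com/suple/babelia/'


class SectionLinkParser(HTMLParser):
    """Build [(section_title, [{'title': ..., 'url': ...}])] like the recipe's parse_index.

    Assumptions: sections are <div class="contenedor_nuevo">, the title is the first
    <h1> inside a section, and only site-relative <a> tags with a class are articles.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []
        self._depth = 0       # div nesting depth inside the current section (0 = outside)
        self._title = ''
        self._articles = []
        self._in_h1 = False
        self._link = None     # (absolute url, list of text chunks) for the open <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self._depth:
                self._depth += 1
            elif 'contenedor_nuevo' in (attrs.get('class') or '').split():
                self._depth = 1
                self._title, self._articles = '', []
            return
        if not self._depth:
            return
        if tag == 'h1' and not self._title:
            self._in_h1 = True
        elif tag == 'a':
            href = attrs.get('href') or ''
            if href.startswith('/') and attrs.get('class'):
                self._link = (urljoin(self.base_url, href), [])

    def handle_endtag(self, tag):
        if tag == 'div' and self._depth:
            self._depth -= 1
            # Section closed: keep it only if it yielded at least one article
            if self._depth == 0 and self._articles:
                self.feeds.append((self._title.strip(), self._articles))
        elif tag == 'h1':
            self._in_h1 = False
        elif tag == 'a' and self._link:
            url, parts = self._link
            self._articles.append({'title': ''.join(parts).strip(), 'url': url})
            self._link = None

    def handle_data(self, data):
        if self._in_h1:
            self._title += data
        if self._link:
            self._link[1].append(data)


def parse_index_html(html, base_url=INDEX):
    """Return the feeds list for one saved copy of the index page."""
    parser = SectionLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.feeds
```

`urljoin` resolves a path that starts with `/` against the host of `INDEX`, so the base domain is defined in one place instead of being repeated as a string prefix.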

luiscc · 03-07-2011, 07:35 PM

I saw that reply in the other post; it's indeed very helpful. Very nice comments.
Thank you!!