03-06-2011, 04:49 PM   #1
luiscc
Junior Member
luiscc began at the beginning.
 
Posts: 9
Karma: 10
Join Date: Feb 2011
Device: kindle
recipe request: Babelia

Hi, could anyone create a recipe for Babelia?

http://www.elpais.com/suple/babelia/

I tried adapting the El País Semanal recipe, but I'm a newbie in Python and recipes. The result is far from good, although I do get all the articles.

Can anyone help?

Kind regards
03-07-2011, 05:04 AM   #2
miwie
Connoisseur
miwie began at the beginning.
 
Posts: 76
Karma: 12
Join Date: Nov 2010
Device: Android, PB Pro 602
How is this for a first shot?

Code:
'''
www.elpais.com/suple/babelia/
'''

from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class ElPaisBabelia(BasicNewsRecipe):
    title                 = 'El Pais Babelia'
    description           = 'Suplemento semanal de El Pais'
    publisher             = 'EL PAIS S.L.'
    category              = 'news, politics, Spain'
    no_stylesheets        = True
    encoding              = 'cp1252'
    use_embedded_content  = False
    language              = 'es'
    publication_type      = 'magazine'    

    # El Cigala
    # cover_url = 'http://pixhost.info/avaxhome/e8/27/001527e8_medium.jpeg'

    masthead_url          = 'http://www.elpais.com/im/tit_logo_int.gif'
    index                 = 'http://www.elpais.com/suple/babelia/'

    extra_css             = ' p{text-align: left} body{ text-align: left; font-family: Georgia,"Times New Roman",Times,serif } h2{font-family: Arial,Helvetica,sans-serif} img{margin-bottom: 0.4em} '

    conversion_options = {
                          'comment'      : description
                        , 'tags'         : category
                        , 'publisher'    : publisher
                        , 'language'     : language
                        }

    remove_attributes = ['width', 'height']
    remove_tags = [
        dict(name='div', attrs={'id':'votosC'}),
        dict(name='div', attrs={'class':'votos'}),
        dict(name='div', attrs={'class':'rec'}),
        dict(name='div', attrs={'class':'rec rec-list'}),
        dict(name='div', attrs={'class':'rec rec-twitter'}),
        dict(name='div', attrs={'class':'rec rec-fbook'}),
    ]

    remove_tags_before = dict(name='div', attrs={'class':'estructura_2col'})
    remove_tags_after  = [
        dict(name='div', attrs={'id':'utilidades'}),
        dict(name='div', attrs={'id':'votosD'}),
        dict(name='div', attrs={'id':'mod_util'}),
    ]

    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.index)
        for item in soup.findAll('a',attrs={'class':['g19i003','g17r003','g17i003']}):
            description = ''
            title_prefix = ''
            feed_link = item
            if item.get('href'):
                url   = 'http://www.elpais.com' + item['href'].rpartition('/')[0]
                title = title_prefix + self.tag_to_string(feed_link)
                date  = strftime(self.timefmt)
                articles.append({
                                  'title'      :title
                                 ,'date'       :date
                                 ,'url'        :url
                                 ,'description':description
                                })
        return [(soup.head.title.string, articles)]

    def print_version(self, url):
        # Ask for the print version of each article page
        return url + '?print=1'
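
In case the URL handling looks cryptic: parse_index drops the last path segment of each link, and print_version then appends ?print=1. Here is a minimal standalone sketch of just that string handling, runnable without calibre. The helper name to_print_url and the sample href are made up for illustration and are not taken from the site.

```python
BASE = 'http://www.elpais.com'

def to_print_url(href):
    """Mimic the recipe: drop the last path segment, then request the print view."""
    # rpartition('/') splits at the LAST slash; [0] is everything before it
    article_url = BASE + href.rpartition('/')[0]
    return article_url + '?print=1'

# Hypothetical href, only to show the transformation
print(to_print_url('/articulo/portada/ejemplo/Tes'))
```

If the site ever changes how article links end, the rpartition step is the first thing I would check.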
03-07-2011, 09:16 AM   #3
luiscc

Thank you!!
03-07-2011, 09:47 AM   #4
oneillpt
Connoisseur
oneillpt began at the beginning.
 
Posts: 51
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1
Here is a recipe. I have also posted it, with explanatory comments you may find useful, in reply to another thread started just after yours: http://www.mobileread.com/forums/sho...d.php?t=124538 ("How to convert newspaper which do not have RSS feed?").

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class ElPaisBabelia(BasicNewsRecipe):

    title      = 'El Pais Babelia'
    __author__ = 'oneillpt'
    description = 'El Pais Babelia'
    INDEX = 'http://www.elpais.com/suple/babelia/'
    language = 'es'

    remove_tags_before = dict(name='div', attrs={'class':'estructura_2col'})
    keep_tags = [dict(name='div', attrs={'class':'estructura_2col'})]
    remove_tags = [dict(name='div', attrs={'class':'votos estirar'}),
        dict(name='div', attrs={'id':'utilidades'}),
        dict(name='div', attrs={'class':'info_relacionada'}),
        dict(name='div', attrs={'class':'mod_apoyo'}),
        dict(name='div', attrs={'class':'contorno_f'}),
        dict(name='div', attrs={'class':'pestanias'}),
        dict(name='div', attrs={'class':'otros_webs'}),
        dict(name='div', attrs={'id':'pie'})
        ]
    #no_stylesheets = True
    remove_javascript     = True

    def parse_index(self):
        soup = self.index_to_soup(self.INDEX)
        feeds = []
        # Each 'contenedor_nuevo' div is treated as one section of the supplement
        for section in soup.findAll('div', attrs={'class':'contenedor_nuevo'}):
            section_title = self.tag_to_string(section.find('h1'))
            articles = []
            for post in section.findAll('a', href=True):
                url = post['href']
                klass = post.get('class')
                # Keep only site-relative links that carry a non-empty class attribute
                if url.startswith('/') and klass:
                    # Same host as INDEX
                    url = 'http://www.elpais.com' + url
                    title = self.tag_to_string(post)
                    self.log()
                    self.log('--> post:  ', post)
                    self.log('--> url:   ', url)
                    self.log('--> title: ', title)
                    self.log('--> class: ', klass)
                    articles.append({'title':title, 'url':url})
            # Skip sections that yielded no articles
            if articles:
                feeds.append((section_title, articles))
        return feeds
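
What parse_index has to hand back is a list of (section title, list of article dicts) tuples, with empty sections left out. Below is a rough sketch of only that grouping step on plain Python data, so it can be tried outside calibre. The function build_feeds and the sample links are hypothetical, and the host in BASE is an assumption taken from INDEX.

```python
BASE = 'http://www.elpais.com'  # assumed host, same as INDEX

def build_feeds(sections):
    """sections: list of (section_title, [(href, link_text, css_class), ...])."""
    feeds = []
    for section_title, links in sections:
        articles = []
        for href, text, css_class in links:
            # Keep only site-relative links that have a non-empty class,
            # mirroring the filter in the recipe
            if href.startswith('/') and css_class:
                articles.append({'title': text, 'url': BASE + href})
        # Sections without any article are dropped
        if articles:
            feeds.append((section_title, articles))
    return feeds

# Made-up input, only to show the shape of the result
sample = [
    ('Portada', [('/articulo/uno', 'Uno', 'g19i003'),
                 ('http://example.com/', 'Ad', 'x')]),
    ('Vacia', [('/articulo/dos', 'Dos', '')]),
]
print(build_feeds(sample))
```

Only the first section survives here: the external link and the link without a class are both filtered out.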
03-07-2011, 07:35 PM   #5
luiscc
I saw that reply in the other thread; it's indeed very helpful. Very nice comments.
Thank you!!