03-06-2011, 04:49 PM | #1 |
Junior Member
Posts: 9
Karma: 10
Join Date: Feb 2011
Device: kindle
recipe request: Babelia
Hi, could anyone create a recipe for Babelia?
http://www.elpais.com/suple/babelia/ I tried to adapt the El País Semanal recipe, but I'm a newbie at Python and recipes. The result is far from good, although I do get all the articles. Can anyone help? Kind regards
03-07-2011, 05:04 AM | #2 |
Connoisseur
Posts: 76
Karma: 12
Join Date: Nov 2010
Device: Android, PB Pro 602
How is this for a first shot?
Code:
'''
www.elpais.com/suple/babelia/
'''
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class ElPaisSemanal(BasicNewsRecipe):
    title = 'El Pais Babelia'
    description = 'Suplemento semanal de El Pais'
    publisher = 'EL PAIS S.L.'
    category = 'news, politics, Spain'
    no_stylesheets = True
    encoding = 'cp1252'
    use_embedded_content = False
    language = 'es'
    publication_type = 'magazine'
    # El Cigala
    # cover_url = 'http://pixhost.info/avaxhome/e8/27/001527e8_medium.jpeg'
    masthead_url = 'http://www.elpais.com/im/tit_logo_int.gif'
    index = 'http://www.elpais.com/suple/babelia/'
    extra_css = '''
        p{text-align: left}
        body{text-align: left; font-family: Georgia,"Times New Roman",Times,serif}
        h2{font-family: Arial,Helvetica,sans-serif}
        img{margin-bottom: 0.4em}
    '''

    conversion_options = {
        'comment': description,
        'tags': category,
        'publisher': publisher,
        'language': language,
    }

    remove_attributes = ['width', 'height']
    remove_tags = [
        dict(name='div', attrs={'id': 'votosC'}),
        dict(name='div', attrs={'class': 'votos'}),
        dict(name='div', attrs={'class': 'rec'}),
        dict(name='div', attrs={'class': 'rec rec-list'}),
        dict(name='div', attrs={'class': 'rec rec-twitter'}),
        dict(name='div', attrs={'class': 'rec rec-fbook'}),
    ]
    remove_tags_before = dict(name='div', attrs={'class': 'estructura_2col'})
    remove_tags_after = [
        dict(name='div', attrs={'id': 'utilidades'}),
        dict(name='div', attrs={'id': 'votosD'}),
        dict(name='div', attrs={'id': 'mod_util'}),
    ]

    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.index)
        # Article links on the Babelia index carry one of these classes.
        for item in soup.findAll('a', attrs={'class': ['g19i003', 'g17r003', 'g17i003']}):
            description = ''
            title_prefix = ''
            # Tag.get() works with old and new BeautifulSoup alike;
            # has_key() is deprecated in BeautifulSoup 4.
            href = item.get('href')
            if href:
                # Drop the last path component and make the URL absolute.
                url = 'http://www.elpais.com' + href.rpartition('/')[0]
                title = title_prefix + self.tag_to_string(item)
                date = strftime(self.timefmt)
                articles.append({
                    'title': title,
                    'date': date,
                    'url': url,
                    'description': description,
                })
        # One feed, named after the index page's <title>.
        return [(soup.head.title.string, articles)]

    def print_version(self, url):
        # Fetch the printer-friendly version of each article.
        return url + '?print=1'
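To check which URLs the recipe above will actually fetch, its two string transformations (the `rpartition('/')` in `parse_index()` and the `?print=1` suffix in `print_version()`) can be tried outside calibre. This is only a sketch: the helper names and the sample href are made up for illustration and are not taken from the El País site.

```python
# Standalone sketch of the URL handling in parse_index() and print_version().
# No calibre import needed; the function names here are hypothetical.

BASE = 'http://www.elpais.com'

def article_url(href):
    # Drop everything after the last '/' and prefix the site root,
    # as parse_index() does with item['href'].
    return BASE + href.rpartition('/')[0]

def print_version(url):
    # Ask for the printer-friendly page, as the recipe does.
    return url + '?print=1'

# Hypothetical href, invented to resemble an article path.
href = '/articulo/semana/ejemplo/20110305elpbabese_1/Tes'
url = article_url(href)
print(url)
print(print_version(url))
```

One thing to watch: if an href contains no '/', `rpartition` returns an empty head, so the URL collapses to the bare site root.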
03-07-2011, 09:16 AM | #3 |
Junior Member
Posts: 9
Karma: 10
Join Date: Feb 2011
Device: kindle
Thank you!!
03-07-2011, 09:47 AM | #4 | |
Connoisseur
Posts: 62
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis
Quote:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class ElPaisBabelia(BasicNewsRecipe):
    title = 'El Pais Babelia'
    __author__ = 'oneillpt'
    description = 'El Pais Babelia'
    INDEX = 'http://www.elpais.com/suple/babelia/'
    language = 'es'
    remove_tags_before = dict(name='div', attrs={'class': 'estructura_2col'})
    # BasicNewsRecipe reads keep_only_tags; an attribute named keep_tags is ignored.
    keep_only_tags = [dict(name='div', attrs={'class': 'estructura_2col'})]
    remove_tags = [
        dict(name='div', attrs={'class': 'votos estirar'}),
        dict(name='div', attrs={'id': 'utilidades'}),
        dict(name='div', attrs={'class': 'info_relacionada'}),
        dict(name='div', attrs={'class': 'mod_apoyo'}),
        dict(name='div', attrs={'class': 'contorno_f'}),
        dict(name='div', attrs={'class': 'pestanias'}),
        dict(name='div', attrs={'class': 'otros_webs'}),
        dict(name='div', attrs={'id': 'pie'}),
    ]
    #no_stylesheets = True
    remove_javascript = True

    def parse_index(self):
        soup = self.index_to_soup(self.INDEX)
        feeds = []
        # Each 'contenedor_nuevo' block on the index page becomes one section.
        for section in soup.findAll('div', attrs={'class': 'contenedor_nuevo'}):
            section_title = self.tag_to_string(section.find('h1'))
            articles = []
            for post in section.findAll('a', href=True):
                url = post['href']
                if url.startswith('/'):
                    url = 'http://www.elpais.es' + url
                title = self.tag_to_string(post)
                # Only keep links that carry a non-empty class attribute.
                klass = post.get('class')
                if klass:
                    self.log()
                    self.log('--> post: ', post)
                    self.log('--> url: ', url)
                    self.log('--> title: ', title)
                    self.log('--> class: ', klass)
                    articles.append({'title': title, 'url': url})
            if articles:
                feeds.append((section_title, articles))
        return feeds
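The `parse_index()` in the quoted recipe returns a list of `(section_title, articles)` tuples, where each article is a dict with at least `title` and `url`. The sketch below approximates that section-by-section logic with the standard library's `html.parser` instead of calibre's BeautifulSoup, so it can be tried without calibre. It is not the recipe itself: the class name `IndexSketch` and the sample HTML are hypothetical and do not reflect El País's real markup, and it assumes each `contenedor_nuevo` block holds one `<h1>` and that these blocks are not nested inside one another.

```python
from html.parser import HTMLParser

class IndexSketch(HTMLParser):
    """Hypothetical stand-in for parse_index(): one feed per section div."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.feeds = []         # list of (section_title, [article dicts])
        self.depth = 0          # <div> nesting depth inside the current section
        self.section_title = ''
        self.articles = []
        self.in_h1 = False
        self.current = None     # article dict for the <a> being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self.depth:
                self.depth += 1
            elif 'contenedor_nuevo' in (attrs.get('class') or '').split():
                # Start of a new section.
                self.depth = 1
                self.section_title = ''
                self.articles = []
        elif self.depth and tag == 'h1':
            self.in_h1 = True
        elif self.depth and tag == 'a' and attrs.get('href') and attrs.get('class'):
            # Like the recipe: only links with an href and a non-empty class.
            url = attrs['href']
            if url.startswith('/'):
                url = self.base + url
            self.current = {'title': '', 'url': url}

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1
            # Section closed: keep it only if it produced articles.
            if self.depth == 0 and self.articles:
                self.feeds.append((self.section_title.strip(), self.articles))
        elif tag == 'h1':
            self.in_h1 = False
        elif tag == 'a' and self.current is not None:
            self.current['title'] = self.current['title'].strip()
            self.articles.append(self.current)
            self.current = None

    def handle_data(self, data):
        if self.in_h1:
            self.section_title += data
        if self.current is not None:
            self.current['title'] += data

# Made-up sample page: one section with a classed link, one empty section.
SAMPLE = '''
<div class="contenedor_nuevo">
  <h1>Libros</h1>
  <div class="item"><a class="titular" href="/articulo/cultura/ejemplo/1">Primer titular</a></div>
  <a href="/sin-clase">ignored: no class attribute</a>
</div>
<div class="contenedor_nuevo"><h1>Vacia</h1></div>
'''

parser = IndexSketch('http://www.elpais.es')
parser.feed(SAMPLE)
parser.close()
print(parser.feeds)
```

The link without a class and the section with no articles are both dropped, mirroring the `if klass:` and `if articles:` checks in the recipe.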
03-07-2011, 07:35 PM | #5 |
Junior Member
Posts: 9
Karma: 10
Join Date: Feb 2011
Device: kindle
I saw that reply in the other thread; it's very helpful indeed, and the comments are very useful.
Thank you!!
Tags |
babelia, el pais, request |
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Request for Recipe | ddavtian | Calibre | 2 | 11-24-2008 02:43 AM |
Yet another Recipe request.. | sherman | Calibre | 4 | 11-21-2008 04:42 AM |
Request for Recipe | girlperson1 | Calibre | 2 | 11-14-2008 10:43 PM |
Request for Recipe | girlperson1 | Calibre | 2 | 11-14-2008 07:59 AM |
Request for recipe | girlperson1 | Calibre | 2 | 11-13-2008 10:03 PM |