10-31-2010, 01:05 AM   #1
arvoredo
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2010
Device: Kindle 3
ClicRBS, Zero Hora, O Pioneiro, Portuguese BR recipe

This is my second Python recipe for the Kindle using Calibre, and I'm happy to share it. It is based on the RSS feeds of ClicRBS's southern Brazilian online newspapers, such as Zero Hora and O Pioneiro, plus other Brazilian Portuguese news feeds.

It still needs some polishing, so please post any problems you find. If you have a better recipe, please feel free to contribute.

Cheers.

The first version is now obsolete and is kept here only for reference:
Spoiler:
Code:

from calibre.web.feeds.news import BasicNewsRecipe

class ClicRBS(BasicNewsRecipe):
    title          = u'ClicRBS'
    oldest_article = 3
    max_articles_per_feed = 9
    cover_url             = 'http://www.publicidade.clicrbs.com.br/clicrbs/imgs/logo_clic.gif'

    remove_tags = [
                       dict(name='div', attrs={'class':['clic-barra-inner', 'botao-versao-mobile ']})
                        ]

    # Only the last assignment to each attribute takes effect, so the
    # earlier, overwritten candidates are kept here as comments.
    # remove_tags_before = dict(name='div', attrs={'class':'descricao'})
    # remove_tags_before = dict(name='div', attrs={'id':'glb-corpo'})
    remove_tags_before = dict(name='div', attrs={'class':'coluna'})
    # remove_tags_after = dict(name='div', attrs={'class':'extra'})
    # remove_tags_after = dict(name='div', attrs={'id':'links-patrocinados'})
    # remove_tags_after = dict(name='h4', attrs={'class':'tipo-c comente'})
    remove_tags_after = dict(name='ul', attrs={'class':'lista'})

    feeds = [
               (u'zerohora.com, clicRBS', u'http://www.clicrbs.com.br/jsp/rssfeed.jspx?uf=1&local=1&channel=13')
             , (u'diariocatarinense.com, clicRBS', u'http://www.clicrbs.com.br/jsp/rssfeed.jspx?uf=2&local=18&channel=67')
             , (u'Concursos e Emprego', u'http://g1.globo.com/Rss2/0,,AS0-9654,00.xml')
             , (u'Pioneiro.com, clicRBS', u'http://www.clicrbs.com.br/jsp/rssfeed.jspx?channel=87&uf=1&local=1')
             , (u'Economia, zerohora.com, clicRBS', u'http://www.clicrbs.com.br/jsp/rssfeed.jspx?sect_id=801&uf=1&local=1&channel=13')
             , (u'Esportes, zerohora.com, clicRBS', u'http://www.clicrbs.com.br/jsp/rssfeed.jspx?sect_id=802&uf=1&local=1&channel=13')
             , (u'Economia, Pioneiro.com, clicRBS', u'http://www.clicrbs.com.br/jsp/rssfeed.jspx?sect_id=1180&channel=87&uf=1&local=1')
             , (u'Política, Pioneiro.com, clicRBS', u'http://www.clicrbs.com.br/jsp/rssfeed.jspx?sect_id=1185&channel=87&uf=1&local=1')
             , (u'Mundo, Pioneiro.com, clicRBS', u'http://www.clicrbs.com.br/jsp/rssfeed.jspx?sect_id=1184&channel=87&uf=1&local=1')
             , (u'Catarinense, Esportes, clicRBS', u'http://www.clicrbs.com.br/jsp/rssfeed.jspx?sect_id=&theme=371&uf=2&channel=2')
             , (u'Geral, Pioneiro.com, clicRBS', u'http://www.clicrbs.com.br/jsp/rssfeed.jspx?sect_id=1183&channel=87&uf=1&local=1')
             , (u'Estilo de Vida, zerohora.com, clicRBS', u'http://www.clicrbs.com.br/jsp/rssfeed.jspx?sect_id=805&uf=1&local=1&channel=13')
             , (u'Corrida, Corrida, Esportes, clicRBS', u'http://www.clicrbs.com.br/jsp/rssfeed.jspx?sect_id=1313&theme=15704&uf=1&channel=2')
             , (u'Jornal de Santa Catarina, clicRBS', u'http://www.clicrbs.com.br/jsp/rssfeed.jspx?espid=159&uf=2&local=18')
             , (u'Grêmio, Futebol, Esportes, clicRBS', u'http://www.clicrbs.com.br/jsp/rssfeed.jspx?sect_id=11&theme=65&uf=1&channel=2')
             , (u'Velocidade, Esportes, clicRBS', u'http://www.clicrbs.com.br/jsp/rssfeed.jspx?sect_id=1314&theme=2655&uf=1&channel=2')
            ]

    extra_css = '''
                    cite{color:#007BB5; font-size:xx-small; font-style:italic;}
                    body{font-family:Arial,Helvetica,sans-serif;font-size:x-small;}
                    h3{font-size:large; color:#082963; font-weight:bold;}
                    #ident{color:#0179B4; font-size:xx-small;}
                    p{color:#000000;font-weight:normal;}                    
                    .commentario p{color:#007BB5; font-style:italic;}
                '''



The second version, still valid but very unpolished:
Code:
Spoiler:
from calibre.web.feeds.news import BasicNewsRecipe

class ZH(BasicNewsRecipe):
    title          = u'ZH em testes'
    oldest_article = 3
    max_articles_per_feed = 9
    cover_url             = 'http://www.publicidade.clicrbs.com.br/clicrbs/imgs/logo_clic.gif'

    remove_tags = [
                       dict(name='div', attrs={'class':['clic-barra-inner', 'botao-versao-mobile ']})
                        ]

    # Only the last assignment to each attribute takes effect, so the
    # earlier, overwritten candidates are kept here as comments.
    # remove_tags_before = dict(name='div', attrs={'class':'descricao'})
    # remove_tags_before = dict(name='div', attrs={'id':'glb-corpo'})
    remove_tags_before = dict(name='div', attrs={'class':'coluna'})
    # remove_tags_after = dict(name='div', attrs={'class':'extra'})
    # remove_tags_after = dict(name='div', attrs={'id':'links-patrocinados'})
    # remove_tags_after = dict(name='h4', attrs={'class':'tipo-c comente'})
    remove_tags_after = dict(name='ul', attrs={'class':'lista'})

    feeds = [
               (u'zerohora.com, clicRBS', u'http://br.zerohora.feedsportal.com/c/33341/f/566001/index.rss')
             , (u'Economia, zerohora.com', u'http://br.zerohora.feedsportal.com/c/33341/f/566002/index.rss')
             , (u'Segundo Caderno, zerohora.com', u'http://br.zerohora.feedsportal.com/c/33341/f/566004/index.rss')
             , (u'Pioneiro.com, clicRBS', u'http://www.clicrbs.com.br/jsp/rssfeed.jspx?channel=87&uf=1&local=1')
             , (u'Paulo SantAna', u'http://br.zerohora.feedsportal.com/c/33341/f/566007/index.rss')
             , (u'Wianey Carle', u'http://br.zerohora.feedsportal.com/c/33341/f/566009/index.rss')
            ]

    extra_css = '''
                    cite{color:#007BB5; font-size:xx-small; font-style:italic;}
                    body{font-family:Arial,Helvetica,sans-serif;font-size:x-small;}
                    h3{font-size:large; color:#082963; font-weight:bold;}
                    #ident{color:#0179B4; font-size:xx-small;}
                    p{color:#000000;font-weight:normal;}
                    .commentario p{color:#007BB5; font-style:italic;}
                '''
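
A note on why these recipes are "unpolished": as first posted, they assign remove_tags_before and remove_tags_after several times in a row. In a Python class body, each new assignment to the same name replaces the previous one, so only the last value is ever seen. The sketch below shows this with a hypothetical stand-in class (DemoRecipe is made up for illustration and does not import or subclass anything from Calibre):

```python
# Hypothetical stand-in for a recipe class; it does not use Calibre at all.
class DemoRecipe(object):
    # Three assignments to the same class attribute, as in the posted recipes.
    remove_tags_before = dict(name='div', attrs={'class': 'descricao'})
    remove_tags_before = dict(name='div', attrs={'id': 'glb-corpo'})
    remove_tags_before = dict(name='div', attrs={'class': 'coluna'})  # last one wins


# Only the final assignment survives on the class.
print(DemoRecipe.remove_tags_before)
```

So if the article body on a given page is not inside the div with class "coluna", the earlier lines will not help; the selector to try has to be the one that is assigned last.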

Last edited by arvoredo; 06-12-2011 at 06:09 PM. Reason: Older code was obsolete