Old 11-19-2011, 04:11 PM   #1
desUBIKado began at the beginning.
Posts: 20
Karma: 12
Join Date: Feb 2009
Location: Zaragoza, Spain
Device: prs-505, iliad
New version of the "Expansion" recipe (now works correctly) [Spanish]

Hi there:

The "Expansion" recipe isn't mine, but it didn't work, so I fixed it. The changes are:

- Changed the "feedsportal" RSS feeds to their Expansion equivalents
- To keep the advertising from getting in the way when fetching an article, and to always retrieve the article page itself, I send the URL with a variable "t" holding the current "linux" (epoch) time, so the website believes the advertising has just been shown
- Added code to show the still image of embedded videos
- Get the cover
- Delete duplicate articles if they appear in more than one feed
- Updated "remove_tags"
- Added "extra_css"
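
As a rough illustration of the "t" variable and the duplicate filter outside of Calibre, here is a minimal standalone sketch. The helper names `add_timestamp` and `dedupe_links` and the example.com URL are made up for the example; in the recipe the same ideas live in `print_version` and `get_article_url`.

```python
import time


def add_timestamp(url, now=None):
    """Append ?t=<epoch seconds> to a URL ending in .html (hypothetical helper)."""
    seconds = str(int(time.time() if now is None else now))
    # Same substitution the recipe uses: '.html' becomes '.html?t=<seconds>'
    return url.replace('.html', '.html?t=' + seconds)


def dedupe_links(links):
    """Keep only the first occurrence of each link, preserving order."""
    seen = set()
    unique = []
    for link in links:
        if link not in seen:
            seen.add(link)
            unique.append(link)
    return unique


# A fixed timestamp is passed here only to make the output reproducible
print(add_timestamp('http://example.com/news/story.html', now=1321740660))
print(dedupe_links(['a.html', 'b.html', 'a.html']))
```

With the fixed timestamp, the first call prints `http://example.com/news/story.html?t=1321740660`, and the second prints `['a.html', 'b.html']`.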

#!/usr/bin/env  python
__license__     = 'GPL v3'
__copyright__   = '5, January 2011 Gerardo Diez<> & desUBIKado'
__author__      = 'desUBIKado, based on an earlier version by Gerardo Diez'
__version__     = 'v1.01'
__date__        = '13, November 2011'


import time
import re
from calibre.web.feeds.news import BasicNewsRecipe

class expansion_spanish(BasicNewsRecipe):
    __author__             = 'Gerardo Diez & desUBIKado'
    description            = 'Financial news from Spain'
    title                  = u'Expansion'
    publisher              = u'Unidad Editorial Internet, S.L.'
    category               = 'news, finances, Spain'
    oldest_article         = 2
    simultaneous_downloads = 10
    max_articles_per_feed  = 100
    timefmt                = '[%a, %d %b, %Y]'
    encoding               = 'iso-8859-15'
    language               = 'es'
    use_embedded_content   = False
    remove_javascript      = True
    no_stylesheets         = True
    remove_empty_feeds     = True

    # keep_only_tags expects a list of tag specifications
    keep_only_tags      =[dict(name='div', attrs={'class':['noticia primer_elemento']})]

    remove_tags         =[
                dict(name='div', attrs={'class':['compartir', 'metadata_desarrollo_noticia', 'relacionadas', 'mas_info','publicidad publicidad_textlink', 'ampliarfoto','tit_relacionadas','interact','paginacion estirar','sumario derecha']}),
                dict(name='ul', attrs={'class':['bolos_desarrollo_noticia','not_logged']}),
                dict(name='span', attrs={'class':['comentarios']}),
                dict(name='p', attrs={'class':['cintillo_comentarios', 'cintillo_comentarios formulario']}),
                dict(name='div', attrs={'id':['comentarios_lectores_listado','comentar']})
    ]

    feeds               =[
                (u'Portada', u''),
                (u'Portada: Bolsas', u''),
                (u'Divisas', u''),
                (u'Euribor', u''),
                (u'Materias Primas', u''),
                (u'Renta Fija', u''),
                (u'Portada: Mi Dinero', u''),
                (u'Hipotecas', u''),
                (u'Cr\xe9ditos', u''),
                (u'Pensiones', u''),
                (u'Fondos de Inversi\xf3n', u''),
                (u'Motor', u''),
                (u'Portada: Empresas', u''),
                (u'Banca', u''),
                (u'TMT', u''),
                (u'Energ\xeda', u''),
                (u'Inmobiliario y Construcci\xf3n', u''),
                (u'Transporte y Turismo', u''),
                (u'Automoci\xf3n e Industria', u''),
                (u'Distribuci\xf3n', u''),
                (u'Deporte y Negocio', u''),
                (u'Mi Negocio', u''),
                (u'Interiores', u''),
                (u'Digitech', u''),
                (u'Portada: Econom\xeda y Pol\xedtica', u''),
                (u'Pol\xedtica', u''),
                (u'Portada: Sociedad', u''),
                (u'Portada: Opini\xf3n', u''),
                (u'Llaves y editoriales', u''),
                (u'Tribunas', u''),
                (u'Portada: Jur\xeddico', u''),
                (u'Entrevistas', u''),
                (u'Opini\xf3n', u''),
                (u'Sentencias', u''),
                (u'Mujer', u''),
                (u'Catalu\xf1a', u''),
                (u'Funci\xf3n p\xfablica', u'')
    ]

    # Get the cover image

    def get_cover_url(self):
        st = time.localtime()
        year = str(st.tm_year)
        month = "%.2d" % st.tm_mon
        day = "%.2d" % st.tm_mday
        # The host part of the cover URL is blank in the post as published
        cover = '' + year + '/' + month + '/' + day + '/es/expansion.750.jpg'
        br = self.get_browser()
        try:
            # Check that today's cover image can actually be fetched
            br.open(cover)
        except Exception:
            self.log("\nCover not available")
            cover = ''
        return cover

    # So that the advertising does not pop up when fetching an article, and so that
    # the article page itself is always retrieved, send the variable "t" with the
    # current "linux" (epoch) time, making the website believe the ad was just shown
    def print_version(self, url):
        segundos = str(int(time.time()))
        parametros = '.html?t=' + segundos
        return url.replace('.html', parametros)

    _processed_links = []

    def get_article_url(self, article):

        # Recover the original article URL from the "feedsportal" one

        link = article.get('link', None)
        if link is None:
            return None
        if link.split('/')[-1] == "story01.htm":
            # The encoded address is the path segment just before "story01.htm"
            link = link.split('/')[-2]
            a = ['0B','0C','0D','0E','0F','0G','0N'  ,'0L0S','0A']
            b = ['.' ,'/' ,'?' ,'-' ,'=' ,'&' ,'.com','www.','0']
            for i in range(0, len(a)):
                link = link.replace(a[i], b[i])
            link = "http://" + link

        # Drop articles already seen in another feed

        if link not in self._processed_links:
            self._processed_links.append(link)
        else:
            link = None
        return link

    # A little CSS to improve the presentation of the articles

    extra_css = '''
                    .entradilla {font-family:Arial,Helvetica,sans-serif; font-weight:bold; font-style:italic; font-size:16px;}
                    .fecha_publicacion,.autor {font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:14px;}
                '''

    # Show the still image of embedded videos

    preprocess_regexps = [
                           (re.compile(r'var imagen', re.DOTALL|re.IGNORECASE), lambda match: '--></script><img src'),
                           (re.compile(r'\.jpg";', re.DOTALL|re.IGNORECASE), lambda match: '.jpg">'),
                           (re.compile(r'var id_reproductor', re.DOTALL|re.IGNORECASE), lambda match: '<script language="Javascript" type="text/javascript"><!--'),
    ]
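
To see what the three video regexps do, here is a small standalone sketch that applies them in order to a sample script block. The sample markup and the example.com image URL are assumptions about what a video page might contain (the post does not show the real page source), and `apply_rules` is a made-up helper that mimics applying each (pattern, function) pair in turn.

```python
import re

# The same three substitutions as in the recipe (with the dot in ".jpg" escaped)
rules = [
    (re.compile(r'var imagen', re.DOTALL | re.IGNORECASE),
     lambda match: '--></script><img src'),
    (re.compile(r'\.jpg";', re.DOTALL | re.IGNORECASE),
     lambda match: '.jpg">'),
    (re.compile(r'var id_reproductor', re.DOTALL | re.IGNORECASE),
     lambda match: '<script language="Javascript" type="text/javascript"><!--'),
]


def apply_rules(html, rules):
    """Apply each (pattern, replacement function) pair in order (hypothetical helper)."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


# Hypothetical page fragment: the still image URL is hidden inside a script
sample = (
    '<script type="text/javascript"><!--\n'
    'var imagen = "http://example.com/video.jpg";\n'
    'var id_reproductor = "player1";\n'
    '//--></script>'
)

result = apply_rules(sample, rules)
print(result)
```

The first rule closes the script just before the image variable and turns it into the start of an `<img src` tag, the second closes that tag after the `.jpg` address, and the third reopens a script block so the rest of the JavaScript stays hidden and can be stripped as usual.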

Last edited by kovidgoyal; 11-19-2011 at 11:05 PM.