Register Guidelines E-Books Today's Posts Search

Go Back   MobileRead Forums > E-Book Software > Calibre > Recipes

Notices

Reply
 
Thread Tools Search this Thread
Old 11-14-2011, 02:00 PM   #1
desUBIKado
Member
desUBIKado began at the beginning.
 
Posts: 22
Karma: 12
Join Date: Feb 2009
Location: Zaragoza, Spain
Device: prs-505, iliad
Fixed recipes: weblogssl & elperiodicodearagon [ES]

Hi there:

I updated two of my recipes, weblogssl and elperiodicodearagon.

weblogssl changes:

1. updated feed list
2. Changed "feedsportal" rss feeds to weblogsssl equivalents

Code:
#!/usr/bin/env  python
__license__     = 'GPL v3'
__copyright__   = '4 February 2011, desUBIKado'
__author__      = 'desUBIKado'
__version__     = 'v0.07'
__date__        = '13, November 2011'
'''
http://www.weblogssl.com/
'''

import time
import re
from calibre.web.feeds.news import BasicNewsRecipe

class weblogssl(BasicNewsRecipe):
    __author__     = 'desUBIKado'
    description    = u'Weblogs colectivos dedicados a seguir la actualidad sobre tecnologia, entretenimiento, estilos de vida, motor, deportes y economia.'
    title          = u'Weblogs SL (Xataka, Genbeta, VidaExtra, Blog de Cine y otros)'
    publisher      = 'Weblogs SL'
    category       = 'Gadgets, Tech news, Product reviews, mobiles, science, cinema, entertainment, culture, tv, food, recipes, life style, motor, F1, sports, economy'
    language       = 'es'    
    timefmt        = '[%a, %d %b, %Y]'
    oldest_article = 1
    max_articles_per_feed = 100
    encoding       = 'utf-8'
    use_embedded_content  = False
    remove_empty_feeds    = True
    remove_javascript = True
    no_stylesheets = True 

    # Si no se quiere recuperar todos los blogs se puede suprimir la descarga del que se desee poniendo
    # un caracter # por delante, es decir,  # ,(u'Applesfera', u'http://feeds.weblogssl.com/applesfera')
    # haría que no se descargase Applesfera. 

    feeds              = [
                          (u'Xataka', u'http://feeds.weblogssl.com/xataka2')
                          ,(u'Xataka Mexico', u'http://feeds.weblogssl.com/xatakamx')
                          ,(u'Xataka M\xf3vil', u'http://feeds.weblogssl.com/xatakamovil')
                          ,(u'Xataka Android', u'http://feeds.weblogssl.com/xatakandroid')
                          ,(u'Xataka Foto', u'http://feeds.weblogssl.com/xatakafoto')
                          ,(u'Xataka ON', u'http://feeds.weblogssl.com/xatakaon')
                          ,(u'Xataka Ciencia', u'http://feeds.weblogssl.com/xatakaciencia')
                          ,(u'Genbeta', u'http://feeds.weblogssl.com/genbeta')
                          ,(u'Genbeta Dev', u'http://feeds.weblogssl.com/genbetadev')
                          ,(u'Genbeta Social Media', u'http://feeds.weblogssl.com/genbetasocialmedia')
                          ,(u'Applesfera', u'http://feeds.weblogssl.com/applesfera')
                          ,(u'Vida Extra', u'http://feeds.weblogssl.com/vidaextra')
                          ,(u'Naci\xf3n Red', u'http://feeds.weblogssl.com/nacionred')                          
                          ,(u'Blog de Cine', u'http://feeds.weblogssl.com/blogdecine')
                          ,(u'Vaya tele', u'http://feeds.weblogssl.com/vayatele2')
                          ,(u'Hipers\xf3nica', u'http://feeds.weblogssl.com/hipersonica')
                          ,(u'Diario del viajero', u'http://feeds.weblogssl.com/diariodelviajero')
                          ,(u'Papel en blanco', u'http://feeds.weblogssl.com/papelenblanco')
                          ,(u'Pop rosa', u'http://feeds.weblogssl.com/poprosa')
                          ,(u'Zona FandoM', u'http://feeds.weblogssl.com/zonafandom')
                          ,(u'Fandemia', u'http://feeds.weblogssl.com/fandemia')                          
                          ,(u'Tendencias', u'http://feeds.weblogssl.com/trendencias')
                          ,(u'Beb\xe9s y m\xe1s', u'http://feeds.weblogssl.com/bebesymas')
                          ,(u'Directo al paladar', u'http://feeds.weblogssl.com/directoalpaladar')
                          ,(u'Compradicci\xf3n', u'http://feeds.weblogssl.com/compradiccion')
                          ,(u'Decoesfera', u'http://feeds.weblogssl.com/decoesfera')
                          ,(u'Embelezzia', u'http://feeds.weblogssl.com/embelezzia')
                          ,(u'Vit\xf3nica', u'http://feeds.weblogssl.com/vitonica')
                          ,(u'Ambiente G', u'http://feeds.weblogssl.com/ambienteg')
                          ,(u'Tendencias Belleza', u'http://feeds.weblogssl.com/trendenciasbelleza')
                          ,(u'Tendencias Hombre', u'http://feeds.weblogssl.com/trendenciashombre')
                          ,(u'Peques y m\xe1s', u'http://feeds.weblogssl.com/pequesymas')
                          ,(u'Motorpasi\xf3n', u'http://feeds.weblogssl.com/motorpasion')
                          ,(u'Motorpasi\xf3n F1', u'http://feeds.weblogssl.com/motorpasionf1')
                          ,(u'Motorpasi\xf3n Moto', u'http://feeds.weblogssl.com/motorpasionmoto')
                          ,(u'Motorpasi\xf3n Futuro', u'http://feeds.weblogssl.com/motorpasionfuturo')
                          ,(u'Notas de futbol', u'http://feeds.weblogssl.com/notasdefutbol')
                          ,(u'Fuera de l\xedmites', u'http://feeds.weblogssl.com/fueradelimites')                          
                          ,(u'El blog salm\xf3n', u'http://feeds.weblogssl.com/elblogsalmon2')
                          ,(u'Pymes y aut\xf3nomos', u'http://feeds.weblogssl.com/pymesyautonomos')
                          ,(u'Tecnolog\xeda Pyme', u'http://feeds.weblogssl.com/tecnologiapyme')
                          ,(u'Ahorro diario', u'http://feeds.weblogssl.com/ahorrodiario')
                         ]
    

    keep_only_tags     = [dict(name='div', attrs={'id':'infoblock'}),
                          dict(name='div', attrs={'class':'post'}),
                          dict(name='div', attrs={'id':'blog-comments'})
                         ]

    remove_tags        = [dict(name='div', attrs={'id':'comment-nav'})]                         

    def print_version(self, url):
          return url.replace('http://www.', 'http://m.')

    preprocess_regexps = [     
                            # Para poner una linea en blanco entre un comentario y el siguiente
                           (re.compile(r'<li id="c', re.DOTALL|re.IGNORECASE), lambda match: '<br><br><li id="c')
                         ]

    # Para sustituir el video incrustado de YouTube por una imagen

    def preprocess_html(self, soup):
        for video_yt in soup.findAll('iframe',{'title':'YouTube video player'}):
            if video_yt:
               video_yt.name = 'img'
               fuente = video_yt['src']
               fuente2 = fuente.replace('http://www.youtube.com/embed/','http://img.youtube.com/vi/')
               fuente3 = fuente2.replace('?rel=0','') 
               video_yt['src'] = fuente3 + '/0.jpg'

        return soup

    # Para obtener la url original del articulo a partir de la de "feedsportal"
    # El siguiente código es gracias al usuario "bosplans" de www.mobileread.com
    # https://www.mobileread.com/forums/showthread.php?t=130297

    def get_article_url(self, article):
       link = article.get('link', None)
       if link is None:
           return article
       if link.split('/')[-1]=="story01.htm":
           link=link.split('/')[-2]
           a=['0B','0C','0D','0E','0F','0G','0N'  ,'0L0S','0A']
           b=['.' ,'/' ,'?' ,'-' ,'=' ,'&' ,'.com','www.','0']
           for i in range(0,len(a)):
              link=link.replace(a[i],b[i])
           link="http://"+link
       return link

elperiodicodearagon.com has recently updated its website and this has broken the old recipe, and I have made a new recipe for the new site.

Code:
#!/usr/bin/env  python
# -*- coding: utf-8 -*-

__license__     = 'GPL v3'
__copyright__   = '04 December 2010, desUBIKado'
__author__      = 'desUBIKado'
__description__ = 'Daily newspaper from Aragon'
__version__     = 'v0.08'
__date__        = '13, November 2011'
'''
elperiodicodearagon.com
'''
import re
from calibre.web.feeds.news import BasicNewsRecipe


class elperiodicodearagon(BasicNewsRecipe):
    title                 = u'El Periodico de Aragon'
    __author__            = u'desUBIKado'
    description           = u'Noticias desde Aragon'
    publisher             = u'elperiodicodearagon.com'
    category              = u'news, politics, Spain, Aragon'
    oldest_article        = 1
    delay                 = 0
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    language              = 'es'
    encoding              = 'iso-8859-1'
    remove_empty_feeds    = True
    remove_javascript     = True


    conversion_options = {
                             'comments'  : description
                            ,'tags'      : category
                            ,'language'  : language
                            ,'publisher' : publisher
                         }

    feeds              = [
                           (u'Portada', u'http://zetaestaticos.com/aragon/rss/portada_es.xml'),
                           (u'Arag\xf3n', u'http://zetaestaticos.com/aragon/rss/2_es.xml'),
                           (u'Internacional', u'http://zetaestaticos.com/aragon/rss/4_es.xml'),
                           (u'Espa\xf1a', u'http://zetaestaticos.com/aragon/rss/3_es.xml'),                          
                           (u'Econom\xeda', u'http://zetaestaticos.com/aragon/rss/5_es.xml'),
                           (u'Deportes', u'http://zetaestaticos.com/aragon/rss/7_es.xml'),
                           (u'Real Zaragoza', u'http://zetaestaticos.com/aragon/rss/10_es.xml'),
                           (u'CAI Zaragoza', u'http://zetaestaticos.com/aragon/rss/91_es.xml'),
                           (u'Monta\xf1ismo', u'http://zetaestaticos.com/aragon/rss/354_es.xml'),
                           (u'Opini\xf3n', u'http://zetaestaticos.com/aragon/rss/103_es.xml'),
                           (u'Tema del d\xeda', u'http://zetaestaticos.com/aragon/rss/102_es.xml'),
                           (u'Escenarios', u'http://zetaestaticos.com/aragon/rss/105_es.xml'),
                           (u'Sociedad', u'http://zetaestaticos.com/aragon/rss/104_es.xml'),
                           (u'Gente', u'http://zetaestaticos.com/aragon/rss/330_es.xml'),
                           (u'Espacio 3', u'http://zetaestaticos.com/aragon/rss/328_es.xml'),
                           (u'Fiestas del Pilar', u'http://zetaestaticos.com/aragon/rss/107_es.xml')
                         ]


    remove_attributes = ['height','width']

    keep_only_tags     = [dict(name='div', attrs={'id':'Noticia'})]                          


    # Recuperamos la portada de papel (la imagen format=1 tiene mayor resolucion)

    def get_cover_url(self):
        index = 'http://pdf.elperiodicodearagon.com/'
        soup = self.index_to_soup(index)
        for image in soup.findAll('img',src=True):
           if image['src'].startswith('http://pdf.elperiodicodearagon.com/funciones/portada-preview.php?eid='):
              return image['src'].rstrip('format=2') + 'format=1'
        return None    
       
    # Usamos la versión para móviles

    def print_version(self, url):
          return url.replace('http://www.elperiodicodearagon.com/', 'http://www.elperiodicodearagon.com/m/')
Bye.
desUBIKado is offline   Reply With Quote
Reply


Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Fixed recipes [pl] fenuks Recipes 0 11-13-2011 05:51 PM
Upgrade recipe for weblogssl (Spanish) desUBIKado Recipes 0 04-13-2011 05:31 PM
B&N fixed the 1.1 download file Ken Stuart Nook Color & Nook Tablet 1 03-17-2011 09:45 PM
Calibre & E505 best conversion recipes? Locheil Calibre 8 06-12-2009 09:53 AM
Calibre CPU Usage & German Recipes mutrata Calibre 5 04-13-2009 03:20 PM


All times are GMT -4. The time now is 07:06 PM.


MobileRead.com is a privately owned, operated and funded community.