03-27-2011, 11:54 AM   #1
euleralves
Upgrade recipe for Folha de São Paulo and Estadão with cover

Sorry about the missing credits. Of course, this work would not have been possible without the instructions from all the members.

This recipe uses the Folha de São Paulo front page image as the cover. If fetching it fails, the recipe falls back to www.thumbalizr.com to get a screenshot of the site instead. Estadão always publishes its front page image late.
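The cover fallback described above can be sketched on its own, outside a recipe class. This is only an illustration of the idea, not the recipe itself: the function names and the `is_image_available` callback are hypothetical, and the callback stands in for the real network check so the sketch can run offline. The URL patterns and the 6 AM cutoff are the ones the Folha recipe uses.

```python
from datetime import datetime, timedelta

# URL patterns taken from the Folha recipe in this post
FRONT_PAGE_TEMPLATE = 'http://www1.folha.uol.com.br/fsp/images/cp%s.jpg'
SITE_URL = 'http://www1.folha.uol.com.br/'


def front_page_url(now, cutoff_hour=6):
    """Build the front page image URL; before the cutoff hour use yesterday's edition."""
    if now.hour < cutoff_hour:
        now = now - timedelta(days=1)
    return FRONT_PAGE_TEMPLATE % now.strftime('%d%m%Y')


def screenshot_url(api_key, site_url=SITE_URL):
    """Build the thumbalizr screenshot URL in the same form the recipe uses."""
    return ('http://api.thumbalizr.com/?api_key=' + api_key +
            '&url=' + site_url + '&width=600&quality=90')


def choose_cover(now, api_key, is_image_available):
    """Prefer the front page image; fall back to a screenshot of the site.

    is_image_available is a hypothetical callback (url -> bool) that replaces
    the real HTTP request for the purposes of this sketch.
    """
    url = front_page_url(now)
    if is_image_available(url):
        return url
    return screenshot_url(api_key)
```

For example, `choose_cover(datetime.now(), 'YOUR_KEY', lambda url: False)` returns the thumbalizr URL, which is what the recipe does when the front page image cannot be fetched.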


Code:
######################
# Folha de São Paulo #
######################
from calibre.web.feeds.news import BasicNewsRecipe
from datetime import datetime, timedelta
from calibre.ebooks.BeautifulSoup import Tag,BeautifulSoup
from calibre.utils.magick import Image, PixelWand
from urllib2 import Request, urlopen, URLError
 
class FolhaOnline(BasicNewsRecipe):
    THUMBALIZR_API        = "0123456789abcdef01234567890" # ---->Get your at http://www.thumbalizr.com/
    LANGUAGE              = 'pt_br'
    LANGHTM               = 'pt-br'
    ENCODING              = 'cp1252'
    ENCHTM                = 'iso-8859-1'
    directionhtm          = 'ltr'
    requires_version      = (0,8,47)
    news                  = True
    publication_type      = 'newsportal'
 
    title                 = u'Folha de S\xE3o Paulo'
    __author__            = 'Euler Alves'
    description           = u'Brazilian news from Folha de S\xE3o Paulo'
    publisher             = u'Folha de S\xE3o Paulo'
    category              = 'news, rss'
 
    oldest_article        = 4
    max_articles_per_feed = 100
    summary_length        = 1000
 
    remove_javascript     = True
    no_stylesheets        = True
    use_embedded_content  = False
    remove_empty_feeds    = True
    timefmt               = ' [%d %b %Y (%a)]'
 
    html2lrf_options      = [
                            '--comment', description
                            ,'--category', category
                            ,'--publisher', publisher
    ]
 
    html2epub_options     = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'
 
    hoje                  = datetime.now()
    pubdate               = hoje.strftime('%a, %d %b')
    if hoje.hour<6:
        hoje = hoje-timedelta(days=1)
    CAPA                  = 'http://www1.folha.uol.com.br/fsp/images/cp'+hoje.strftime('%d%m%Y')+'.jpg'
    SCREENSHOT            = 'http://www1.folha.uol.com.br/'
    cover_margins         = (0,0,'white')
    masthead_url          = 'http://f.i.uol.com.br/fsp/furniture/images/lgo-fsp-430x50-ffffff.gif'
 
    keep_only_tags      = [dict(name='div', attrs={'id':'articleNew'})]
    remove_tags         = [
                        dict(name='div',
                            attrs={'id':[
                                'articleButton'
                                ,'bookmarklets'
                                ,'ad-180x150-1'
                                ,'contextualAdsArticle'
                                ,'articleEnd'
                                ,'articleComments'
                            ]})
                        ,dict(name='div',
                            attrs={'class':[
                            'openBox adslibraryArticle'
                            ]})
 
                        ,dict(name='a')
                        ,dict(name='iframe')
                        ,dict(name='link')
                        ,dict(name='script')
    ]
 
    feeds = [
    (u'Em cima da hora', u'http://feeds.folha.uol.com.br/emcimadahora/rss091.xml')
    ,(u'Ambiente', u'http://feeds.folha.uol.com.br/ambiente/rss091.xml')
    ,(u'Bichos', u'http://feeds.folha.uol.com.br/bichos/rss091.xml')
    ,(u'Ci\xEAncia', u'http://feeds.folha.uol.com.br/ciencia/rss091.xml')
    ,(u'Poder', u'http://feeds.folha.uol.com.br/poder/rss091.xml')
    ,(u'Equil\xEDbrio e Sa\xFAde', u'http://feeds.folha.uol.com.br/equilibrioesaude/rss091.xml')
    ,(u'Turismo', u'http://feeds.folha.uol.com.br/folha/turismo/rss091.xml')
    ,(u'Mundo', u'http://feeds.folha.uol.com.br/mundo/rss091.xml')
    ,(u'Pelo Mundo', u'http://feeds.folha.uol.com.br/pelomundo.folha.rssblog.uol.com.br/')
    ,(u'Circuito integrado', u'http://feeds.folha.uol.com.br/circuitointegrado.folha.rssblog.uol.com.br/')
    ,(u'Blog do Fred', u'http://feeds.folha.uol.com.br/blogdofred.folha.rssblog.uol.com.br/')
    ,(u'Maria In\xEAs Dolci', u'http://feeds.folha.uol.com.br/mariainesdolci.folha.blog.uol.com.br/')
    ,(u'Eduardo Ohata', u'http://feeds.folha.uol.com.br/folha/pensata/eduardoohata/rss091.xml')
    ,(u'Kennedy Alencar', u'http://feeds.folha.uol.com.br/folha/pensata/kennedyalencar/rss091.xml')
    ,(u'Eliane Cantanh\xEAde', u'http://feeds.folha.uol.com.br/folha/pensata/elianecantanhede/rss091.xml')
    ,(u'Fernando Canzian', u'http://feeds.folha.uol.com.br/folha/pensata/fernandocanzian/rss091.xml')
    ,(u'Gilberto Dimenstein', u'http://feeds.folha.uol.com.br/folha/pensata/gilbertodimenstein/rss091.xml')
    ,(u'H\xE9lio Schwartsman', u'http://feeds.folha.uol.com.br/folha/pensata/helioschwartsman/rss091.xml')
    ,(u'Jo\xE3o Pereira Coutinho', u'http://feeds.folha.uol.com.br/folha/pensata/joaopereiracoutinho/rss091.xml')
    ,(u'Luiz Caversan', u'http://feeds.folha.uol.com.br/folha/pensata/luizcaversan/rss091.xml')
    ,(u'S\xE9rgio Malbergier', u'http://feeds.folha.uol.com.br/folha/pensata/sergiomalbergier/rss091.xml')
    ,(u'Valdo Cruz', u'http://feeds.folha.uol.com.br/folha/pensata/valdocruz/rss091.xml')
    ]
 
    conversion_options = {
    'title'            : title
    ,'comments'        : description
    ,'publisher'       : publisher
    ,'tags'            : category
    ,'language'        : LANGUAGE
    ,'linearize_tables': True
    }
 
    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        if not soup.find(attrs={'http-equiv':'Content-Language'}):
            meta0 = Tag(soup,'meta',[("http-equiv","Content-Language"),("content",self.LANGHTM)])
            soup.head.insert(0,meta0)
        if not soup.find(attrs={'http-equiv':'Content-Type'}):
            meta1 = Tag(soup,'meta',[("http-equiv","Content-Type"),("content","text/html; charset="+self.ENCHTM)])
            soup.head.insert(0,meta1)
        return soup
 
    def postprocess_html(self, soup, first):
        #process all the images. assumes that the new html has the correct path
        for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
            iurl = tag['src']
            img = Image()
            img.open(iurl)
            width, height = img.size
            print 'img is: ', iurl, 'width is: ', width, 'height is: ', height
            if img < 0:
                raise RuntimeError('Out of memory')
            pw = PixelWand()
            if( width > height and width > 590) :
                print 'Rotate image'
                img.rotate(pw, -90)
                img.save(iurl)
        return soup
 
    def get_cover_url(self):
        cover_url      = self.CAPA
        pedido         = Request(self.CAPA)
        pedido.add_header('User-agent','Mozilla/5.0 (Windows; U; Windows NT 5.1; '+self.LANGHTM+'; userid='+self.THUMBALIZR_API+') Calibre/0.8.47 (like Gecko)')
        pedido.add_header('Accept-Charset',self.ENCHTM)
        pedido.add_header('Referer',self.SCREENSHOT)
        try:
            resposta   = urlopen(pedido)
            soup       = BeautifulSoup(resposta)
            # A <body> tag means the server answered with an HTML page instead of the image
            cover_item = soup.find('body')
            if cover_item:
                cover_url='http://api.thumbalizr.com/?api_key='+self.THUMBALIZR_API+'&url='+self.SCREENSHOT+'&width=600&quality=90'
            return cover_url
        except URLError:
            # The front page image could not be fetched: fall back to a thumbalizr screenshot
            cover_url='http://api.thumbalizr.com/?api_key='+self.THUMBALIZR_API+'&url='+self.SCREENSHOT+'&width=600&quality=90'
            return cover_url
Code:
###################
#     Estadão     #
###################
from calibre.web.feeds.news import BasicNewsRecipe
from datetime import datetime, timedelta
from calibre.ebooks.BeautifulSoup import Tag,BeautifulSoup
from calibre.utils.magick import Image, PixelWand
from urllib2 import Request, urlopen, URLError
 
class Estadao(BasicNewsRecipe):
    THUMBALIZR_API        = "0123456789abcdef01234567890" # ---->Get your at http://www.thumbalizr.com/
    LANGUAGE              = 'pt_br'
    LANGHTM               = 'pt-br'
    ENCODING              = 'utf'
    ENCHTM                = 'utf-8'
    directionhtm          = 'ltr'
    requires_version      = (0,8,47)
    news                  = True
    publication_type      = 'newsportal'
 
    title                 = u'Estadao'
    __author__            = 'Euler Alves'
    description           = u'Brazilian news from Estad\xe3o'
    publisher             = u'Estad\xe3o'
    category              = 'news, rss'
 
    oldest_article        = 4
    max_articles_per_feed = 100
    summary_length        = 1000
 
    remove_javascript     = True
    no_stylesheets        = True
    use_embedded_content  = False
    remove_empty_feeds    = True
    timefmt               = ' [%d %b %Y (%a)]'
 
    html2lrf_options      = [
                            '--comment', description
                            ,'--category', category
                            ,'--publisher', publisher
    ]
 
    html2epub_options     = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'
 
    hoje                  = datetime.now()-timedelta(days=2)
    pubdate               = hoje.strftime('%a, %d %b')
    if hoje.hour<10:
        hoje = hoje-timedelta(days=1)
    CAPA                  = 'http://www.estadao.com.br/estadaodehoje/'+hoje.strftime('%Y%m%d')+'/img/capadodia.jpg'
    SCREENSHOT            = 'http://estadao.com.br/'
    cover_margins         = (0,0,'white')
    masthead_url          = 'http://www.estadao.com.br/estadao/novo/img/logo.png'
 
    keep_only_tags = [dict(name='div', attrs={'class':['bb-md-noticia','corpo']})]
    remove_tags = [
                    dict(name='div',
                        attrs={'id':[
                            'bb-md-noticia-tabs'
                        ]})
                    ,dict(name='div',
                        attrs={'class':[
                            'tags'
                            ,'discussion'
                            ,'bb-gg adsense_container'
                        ]})
 
                    ,dict(name='a')
                    ,dict(name='iframe')
                    ,dict(name='link')
                    ,dict(name='script')
    ]
 
    feeds = [
    (u'\xDAltimas Not\xEDcias', u'http://www.estadao.com.br/rss/ultimas.xml')
    ,(u'Manchetes', u'http://www.estadao.com.br/rss/manchetes.xml')
    ,(u'Brasil', u'http://www.estadao.com.br/rss/brasil.xml')
    ,(u'Internacional', u'http://www.estadao.com.br/rss/internacional.xml')
    ,(u'Cinema', u'http://blogs.estadao.com.br/cinema/feed/')
    ,(u'Planeta', u'http://www.estadao.com.br/rss/planeta.xml')
    ,(u'Ci\xEAncia', u'http://www.estadao.com.br/rss/ciencia.xml')
    ,(u'Sa\xFAde', u'http://www.estadao.com.br/rss/saude.xml')
    ,(u'Pol\xEDtica', u'http://www.estadao.com.br/rss/politica.xml')
    ]
 
    conversion_options = {
    'title'            : title
    ,'comments'        : description
    ,'publisher'       : publisher
    ,'tags'            : category
    ,'language'        : LANGUAGE
    ,'linearize_tables': True
    }
 
    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        if not soup.find(attrs={'http-equiv':'Content-Language'}):
            meta0 = Tag(soup,'meta',[("http-equiv","Content-Language"),("content",self.LANGHTM)])
            soup.head.insert(0,meta0)
        if not soup.find(attrs={'http-equiv':'Content-Type'}):
            meta1 = Tag(soup,'meta',[("http-equiv","Content-Type"),("content","text/html; charset="+self.ENCHTM)])
            soup.head.insert(0,meta1)
        return soup
 
    def postprocess_html(self, soup, first):
        #process all the images. assumes that the new html has the correct path
        for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
            iurl = tag['src']
            img = Image()
            img.open(iurl)
            width, height = img.size
            print 'img is: ', iurl, 'width is: ', width, 'height is: ', height
            if img < 0:
                raise RuntimeError('Out of memory')
            pw = PixelWand()
            if( width > height and width > 590) :
                print 'Rotate image'
                img.rotate(pw, -90)
                img.save(iurl)
        return soup
 
    def get_cover_url(self):
        cover_url      = self.CAPA
        pedido         = Request(self.CAPA)
        pedido.add_header('User-agent','Mozilla/5.0 (Windows; U; Windows NT 5.1; '+self.LANGHTM+'; userid='+self.THUMBALIZR_API+') Calibre/0.8.47 (like Gecko)')
        pedido.add_header('Accept-Charset',self.ENCHTM)
        pedido.add_header('Referer',self.SCREENSHOT)
        try:
            resposta   = urlopen(pedido)
            soup       = BeautifulSoup(resposta)
            # A <body> tag means the server answered with an HTML page instead of the image
            cover_item = soup.find('body')
            if cover_item:
                cover_url='http://api.thumbalizr.com/?api_key='+self.THUMBALIZR_API+'&url='+self.SCREENSHOT+'&width=600&quality=90'
            return cover_url
        except URLError:
            # The front page image could not be fetched: fall back to a thumbalizr screenshot
            cover_url='http://api.thumbalizr.com/?api_key='+self.THUMBALIZR_API+'&url='+self.SCREENSHOT+'&width=600&quality=90'
            return cover_url
03-27-2011, 07:33 PM   #2
kovidgoyal
creator of calibre
Thanks, updated.
03-28-2011, 06:12 PM   #3
euleralves
Wrong line

Thank you! You are the great artist here.
I got one line wrong.
Please correct it.
It would be best if you edited my post.

WRONG:
Code:
THUMBALIZR_API        = "0123456789abcdef01234567890" # ---->Get your at http://www.thumbalizr.com/
CORRECT
Code:
THUMBALIZR_API        = '' # ---->Get your at http://www.thumbalizr.com/ and put here
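With the key left empty by default, a recipe may want to skip the screenshot fallback until the user fills it in. The following is a minimal sketch of that guard under stated assumptions: the helper name `screenshot_cover_url` is hypothetical and is not part of the posted recipes, and returning `None` is just one way to signal "no screenshot available" to the caller.

```python
def screenshot_cover_url(api_key, site_url):
    """Return the thumbalizr screenshot URL, or None when no API key is set."""
    if not api_key:
        # Empty key (the default after the correction above): no screenshot URL
        return None
    return ('http://api.thumbalizr.com/?api_key=' + api_key +
            '&url=' + site_url + '&width=600&quality=90')
```

The caller can then decide what to do when `None` comes back, for example keeping the front page image URL it already has.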
03-30-2011, 07:41 PM   #4
euleralves
Complete recipe files, favicons and enhanced LifeHacker, Folha de São Paulo, Estadão

Kovid, I only learned today that you need favicons. They are in the attached zip files.

I took the opportunity to add my enhanced LifeHacker recipe.

All of these recipes use thumbalizr.com to get a screenshot for the cover.
Attached Files
File Type: zip Recipe Estadao 2011-03-30.zip (2.8 KB, 46 views)
File Type: zip Recipe Folha 2011-03-30.zip (4.4 KB, 46 views)
File Type: zip Recipe LifeHacker 2011-03-30.zip (6.4 KB, 38 views)
03-31-2011, 01:02 AM   #5
kovidgoyal
Updated.