Old 01-09-2011, 12:45 PM   #24
paola
Wizard
Posts: 2,841
Karma: 5843878
Join Date: Oct 2010
Location: UK
Device: Pocketbook Pro 903, (beloved Pocketbook 360 RIP), Kobo Mini, Kobo Aura
Quote:
Originally Posted by review
@paolamanzini: Don't worry, I didn't take it the wrong way.

Since I need to make some small adjustments for each RSS feed (this is what I earlier called a recipe), please feel free to post the news pages you would like to read on your 903. As time permits I will try to integrate them, and I will keep you updated on the progress.
That is very kind. If I had to pick only one, it would be this one:
http://xml.corriereobjects.it/rss/homepage.xml
I am not sure whether this is the actual Calibre recipe, but this is what I can read in Calibre:
Quote:
#!/usr/bin/env python
__license__ = 'GPL v3'
__author__ = 'Lorenzo Vigentini, based on Darko Miletic'
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>, Lorenzo Vigentini <l.vigentini at gmail.com>'
__version__ = 'v1.01'
__date__ = '10, January 2010'
__description__ = 'Italian daily newspaper'

'''
http://www.corriere.it/
'''
import time
from calibre.web.feeds.news import BasicNewsRecipe

class ilCorriere(BasicNewsRecipe):
    __author__ = 'Lorenzo Vigentini, based on Darko Miletic, Gabriele Marini'
    description = 'Italian daily newspaper'

    # cover_url = 'http://images.corriereobjects.it/images/static/common/logo_home.gif?v=200709121520'

    title = u'Il Corriere della sera'
    publisher = 'RCS Digital'
    category = 'News, politics, culture, economy, general interest'

    encoding = 'cp1252'
    language = 'it'
    timefmt = '[%a, %d %b, %Y]'

    # Fetch articles up to 10 days old, at most 100 per feed
    oldest_article = 10
    max_articles_per_feed = 100
    use_embedded_content = False
    recursion = 10

    remove_javascript = True
    no_stylesheets = True

    html2lrf_options = [
        '--comment', description,
        '--category', category,
        '--publisher', publisher,
        '--ignore-tables'
    ]

    html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"\nlinearize_tables=True'

    # Keep only the article body and strip navigation and embedded objects
    keep_only_tags = [dict(name='div', attrs={'class':['news-dettaglio article','article']})]

    remove_tags = [
        dict(name=['base','object','link','embed']),
        dict(name='div', attrs={'class':'news-goback'}),
        dict(name='ul', attrs={'class':'toolbar'})
    ]

    remove_tags_after = dict(name='p', attrs={'class':'footnotes'})

    def get_cover_url(self):
        # Build the URL of today's front page image from the local date
        st = time.localtime()
        year = str(st.tm_year)
        month = "%.2d" % st.tm_mon
        day = "%.2d" % st.tm_mday
        #http://images.corriere.it/primapagin...ina_grande.png
        cover = 'http://images.corriere.it/primapagina/storico/' + year + '_' + month + '_' + day + '/images/prima_pagina_grande.png'
        br = self.get_browser()
        try:
            br.open(cover)
        except Exception:
            # Fall back to the site logo when today's front page cannot be fetched
            self.log("\nCover unavailable")
            cover = 'http://images.corriereobjects.it/images/static/common/logo_home.gif?v=200709121520'
        return cover

    feeds = [
        (u'Ultimora'   , u'http://www.corriere.it/rss/ultimora.xml'),
        (u'Editoriali' , u'http://www.corriere.it/rss/editoriali.xml'),
        (u'Cronache'   , u'http://www.corriere.it/rss/cronache.xml'),
        (u'Politica'   , u'http://www.corriere.it/rss/politica.xml'),
        (u'Esteri'     , u'http://www.corriere.it/rss/esteri.xml'),
        (u'Economia'   , u'http://www.corriere.it/rss/economia.xml'),
        (u'Cultura'    , u'http://www.corriere.it/rss/cultura.xml'),
        (u'Scienze'    , u'http://www.corriere.it/rss/scienze.xml'),
        (u'Salute'     , u'http://www.corriere.it/rss/salute.xml'),
        (u'Spettacolo' , u'http://www.corriere.it/rss/spettacoli.xml'),
        (u'Cinema e TV', u'http://www.corriere.it/rss/cinema.xml'),
        (u'Sport'      , u'http://www.corriere.it/rss/sport.xml'),
        (u'Roma'       , u'http://www.corriere.it/rss/homepage_roma.xml'),
        (u'Milano'     , u'http://www.corriere.it/rss/homepage_milano.xml')
    ]
But there is no rush. I imagine I could put this into your app once it sees the light of day.

And in the meantime, good thing that Calibre exists!
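
In case it helps with porting, the cover lookup in the recipe above boils down to building a dated URL and falling back to the site logo when it cannot be fetched. Here is a minimal standalone sketch of that logic without Calibre. The names build_cover_url, pick_cover and is_reachable are made up for illustration and are not part of Calibre; the two URLs are copied from the recipe.

```python
import datetime

# Both URLs are copied from the recipe quoted above.
COVER_TEMPLATE = ('http://images.corriere.it/primapagina/storico/'
                  '{year}_{month:02d}_{day:02d}/images/prima_pagina_grande.png')
FALLBACK_COVER = ('http://images.corriereobjects.it/images/static/common/'
                  'logo_home.gif?v=200709121520')


def build_cover_url(day=None):
    """Return the front page image URL for the given date (default: today)."""
    if day is None:
        day = datetime.date.today()
    return COVER_TEMPLATE.format(year=day.year, month=day.month, day=day.day)


def pick_cover(day=None, is_reachable=lambda url: True):
    """Return the dated cover if is_reachable(url) is true, else the logo.

    is_reachable is a hypothetical hook standing in for the br.open() check
    that the recipe performs; pass in whatever HTTP check the app uses.
    """
    url = build_cover_url(day)
    return url if is_reachable(url) else FALLBACK_COVER


# Example: the cover URL for the day of this post.
print(build_cover_url(datetime.date(2011, 1, 9)))
```

Keeping the reachability check as a parameter means the URL building can be tried out without network access, and the real check can be plugged in later.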