I wrote a nice recipe and it has worked well so far.
A couple of days ago the news source changed its feed structure and now uses feedsportal as its service provider.
Now my former recipe does not work anymore ;-)
I figured out that the problem is that they now use obfuscated links, so I guess it should be possible to use the print_version method with some regex on the URL. Unfortunately, I do not know regex!
I need to be able to convert the following link structure, from:
http://rss.feedsportal.com/c/32276/f/566660/s/13b7117a/l/0L0Silsole24ore0N0Cart0Cnotizie0C20A110E0A30E290Clampedusa0Eabitanti0Eoccupano0Emunicipio0E11570A60Bshtml0Duuid0FAauakRKD/story01.htm
to:
http://www.ilsole24ore.com/art/notizie/2011-03-29/lampedusa-abitanti-occupano-municipio-115706_PRN.shtml
The first part of the link is static; the dynamic part is the one in bold. I know the first link contains all the information needed, but I cannot figure out the code.
Any help?
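For what it is worth, comparing the two links above suggests that the part between /l/ and /story01.htm is the original URL with some characters replaced by two-character codes starting with 0. Below is a minimal sketch of a decoder under that assumption. The code table is inferred only from this single example pair (0L0S is assumed to split into http:// and www.), so other codes may exist, and the names FEEDSPORTAL_CODES, decode_feedsportal and to_print_url are made up for illustration; they are not part of calibre or of any feedsportal documentation.

```python
import re

# Two-character codes inferred from the single example pair in this post.
# This table is an assumption and is probably incomplete.
FEEDSPORTAL_CODES = {
    '0L': 'http://',
    '0S': 'www.',
    '0N': '.com',
    '0C': '/',
    '0A': '0',
    '0E': '-',
    '0B': '.',
    '0D': '?',
    '0F': '=',
}


def decode_feedsportal(url):
    """Return the article URL hidden in a feedsportal link.

    URLs that do not look like feedsportal links are returned unchanged.
    """
    match = re.search(r'/l/([^/]+)/story\d*\.htm', url)
    if match is None:
        return url
    encoded = match.group(1)
    decoded = []
    i = 0
    # Scan left to right so that a literal digit followed by a code
    # (for example "20A" meaning "20") is not misread.
    while i < len(encoded):
        pair = encoded[i:i + 2]
        if pair in FEEDSPORTAL_CODES:
            decoded.append(FEEDSPORTAL_CODES[pair])
            i += 2
        else:
            decoded.append(encoded[i])
            i += 1
    return ''.join(decoded)


def to_print_url(url):
    """Decode the link, drop the query string, and point at the print page."""
    article = decode_feedsportal(url).split('?', 1)[0]
    return article.replace('.shtml', '_PRN.shtml')
```

Called on the feedsportal link above, to_print_url should give the ilsole24ore.com print link shown as the target. In a recipe, the body of print_version could presumably return to_print_url(url), but I have not checked this against other articles or feeds.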
Here is the former recipe:
Code:
__author__ = 'Marco Saraceno'
__copyright__ = '2010, Marco Saraceno <marcosaraceno at gmail.com>'
description = 'Italian daily newspaper - v 1.1 (Mar14,2011)'
'''
http://www.ilsole24ore.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
class IlSole24Ore(BasicNewsRecipe):
    __author__ = 'Marco Saraceno'
    description = 'Italian financial daily newspaper'
    cover_url = 'http://www.shopping24.ilsole24ore.com/ProductRelated/rds/img/logo_sole.gif'
    title = u'Il Sole 24 Ore'
    publisher = 'Gruppo editoriale GRUPPO 24ORE'
    category = 'News, politics, culture, economy, financial, Italian'
    language = 'it'
    timefmt = '[%a, %d %b, %Y]'
    oldest_article = 2
    max_articles_per_feed = 100
    use_embedded_content = False
    recursion = 2
    extra_css = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt }'
    remove_tags = [
        dict(name='div', attrs={'class':['header','titolo']}),
        dict(name='table', attrs={'class':['footer1024','footerdown']}),
    ]
    feeds = [
        (u'Notizie Italia', u'http://www.ilsole24ore.com/rss/notizie/italia.xml'),
        (u'Notizie Europa', u'http://www.ilsole24ore.com/rss/notizie/europa.xml'),
        (u'Notizie USA', u'http://www.ilsole24ore.com/rss/notizie/usa.xml'),
        (u'Notizie Americhe', u'http://www.ilsole24ore.com/rss/notizie/americhe.xml'),
        (u'Notizie Medio Oriente e Africa', u'http://www.ilsole24ore.com/rss/notizie/medio-oriente-e-africa.xml'),
        (u'Notizie Asia e Oceania', u'http://www.ilsole24ore.com/rss/notizie/asia-e-oceania.xml'),
        (u'Commenti', u'http://www.ilsole24ore.com/rss/commenti-e-idee.xml'),
        (u'Norme e tributi', u'http://www.ilsole24ore.com/rss/norme-e-tributi.xml'),
        (u'Finanza', u'http://www.ilsole24ore.com/rss/finanza-e-mercati.xml'),
        (u'Economia', u'http://www.ilsole24ore.com/rss/economia.xml'),
        (u'Tecnologia', u'http://www.ilsole24ore.com/rss/tecnologie.xml'),
        (u'Cultura', u'http://www.ilsole24ore.com/rss/cultura.xml'),
    ]

    def print_version(self, url):
        # Point at the printer-friendly page (worked with the old direct links)
        return url.replace('.shtml', '_PRN.shtml')