Hi,
The recipe I made a few months ago no longer works: the newspaper has switched to a new feed service, "feedsportal.com", which mangles the links to the actual articles and forwards the visitor to ads ...
I have figured out how to solve the problem in theory, but unfortunately I do not know regex. The idea is to use def print_version(self, url) to convert a link from:
http://rss.feedsportal.com/c/32276/f...24ore0N0Cart0Cnotizie0C20A110E0A30E290Clampedusa0Eabitanti0Eoccupano0Emunicipio0E11570A60Bshtml0Duuid0FAauakRKD/story01.htm
To:
http://www.ilsole24ore.com/art/notizie/2011-03-29/lampedusa-abitanti-occupano-municipio-115706_PRN.shtml
As you can see, the former link contains all the information needed for the conversion ... but I have no idea how to make the magic happen! Can someone help me, or tell me where to find a good resource to learn the principles? Is this possible, or is there an easier workaround?
Thanks in advance!
The original recipe:
Code:
__author__ = 'Marco Saraceno'
__copyright__ = '2010, Marco Saraceno <marcosaraceno at gmail.com>'
description = 'Italian daily newspaper - v 1.1 (Mar14,2011)'
'''
http://www.ilsole24ore.com
'''
from calibre.web.feeds.news import BasicNewsRecipe


class IlSole24Ore(BasicNewsRecipe):
    __author__ = 'Marco Saraceno'
    description = 'Italian financial daily newspaper'
    cover_url = 'http://www.shopping24.ilsole24ore.com/ProductRelated/rds/img/logo_sole.gif'
    title = u'Il Sole 24 Ore'
    publisher = 'Gruppo editoriale GRUPPO 24ORE'
    category = 'News, politics, culture, economy, financial, Italian'
    language = 'it'
    timefmt = '[%a, %d %b, %Y]'
    oldest_article = 2
    max_articles_per_feed = 100
    use_embedded_content = False
    recursion = 10
    extra_css = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt }'

    # Strip page headers and footers from the downloaded articles
    remove_tags = [
        dict(name='div', attrs={'class':['header','titolo']}),
        dict(name='table', attrs={'class':['footer1024','footerdown']}),
    ]

    feeds = [
        (u'Notizie Italia', u'http://www.ilsole24ore.com/rss/notizie/italia.xml'),
        (u'Notizie Europa', u'http://www.ilsole24ore.com/rss/notizie/europa.xml'),
        (u'Notizie USA', u'http://www.ilsole24ore.com/rss/notizie/usa.xml'),
        (u'Notizie Americhe', u'http://www.ilsole24ore.com/rss/notizie/americhe.xml'),
        (u'Notizie Medio Oriente e Africa', u'http://www.ilsole24ore.com/rss/notizie/medio-oriente-e-africa.xml'),
        (u'Notizie Asia e Oceania', u'http://www.ilsole24ore.com/rss/notizie/asia-e-oceania.xml'),
        (u'Commenti', u'http://www.ilsole24ore.com/rss/commenti-e-idee.xml'),
        (u'Norme e tributi', u'http://www.ilsole24ore.com/rss/norme-e-tributi.xml'),
        (u'Finanza', u'http://www.ilsole24ore.com/rss/finanza-e-mercati.xml'),
        (u'Economia', u'http://www.ilsole24ore.com/rss/economia.xml'),
        (u'Tecnologia', u'http://www.ilsole24ore.com/rss/tecnologie.xml'),
        (u'Cultura', u'http://www.ilsole24ore.com/rss/cultura.xml'),
    ]

    def print_version(self, url):
        # Printer-friendly pages use the same address with a _PRN suffix
        return url.replace('.shtml', '_PRN.shtml')
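
A possible way to do the conversion described above, as a minimal sketch rather than a tested recipe change. It assumes that the feedsportal link escapes characters as "0" plus a capital letter, with the table below inferred only from the example link in this post (other codes may well exist), and that the article path always follows "24ore.com" and ends in ".shtml". The names FEEDSPORTAL_CODES, decode_feedsportal_segment and feedsportal_to_print_url are hypothetical helpers, not part of calibre. The example link is used exactly as posted, including its elided part.

```python
import re

# Escape codes inferred solely from the example link in this post.
# This table is an assumption; feedsportal may use codes not listed here.
FEEDSPORTAL_CODES = {
    'A': '0', 'B': '.', 'C': '/', 'D': '?', 'E': '-', 'F': '=', 'N': '.com',
}


def decode_feedsportal_segment(segment):
    """Undo the '0X' escapes in a single pass; unknown codes stay untouched."""
    return re.sub(
        r'0([A-Z])',
        lambda m: FEEDSPORTAL_CODES.get(m.group(1), m.group(0)),
        segment,
    )


def feedsportal_to_print_url(url):
    """Hypothetical helper: map a feedsportal link to the _PRN.shtml page."""
    if 'feedsportal.com' not in url:
        # Ordinary link: same behaviour as the original print_version
        return url.replace('.shtml', '_PRN.shtml')
    # The encoded article address is the path segment before 'story01.htm'
    segment = url.rstrip('/').split('/')[-2]
    decoded = decode_feedsportal_segment(segment)
    # Keep only the path between the domain and '.shtml', dropping '?uuid=...'
    match = re.search(r'24ore\.com(/[^?]+?)\.shtml', decoded)
    if match is None:
        return url  # unrecognised layout: fall back to the feed link
    return 'http://www.ilsole24ore.com' + match.group(1) + '_PRN.shtml'


if __name__ == '__main__':
    example = ('http://rss.feedsportal.com/c/32276/f...24ore0N0Cart0Cnotizie'
               '0C20A110E0A30E290Clampedusa0Eabitanti0Eoccupano0Emunicipio'
               '0E11570A60Bshtml0Duuid0FAauakRKD/story01.htm')
    print(feedsportal_to_print_url(example))
```

Inside the recipe, print_version(self, url) could then simply return feedsportal_to_print_url(url), provided the URL it receives is still the feedsportal one. The single regex pass matters: replacing the codes one after another could turn the output of one replacement into a new, false escape.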