Thread: Fix a recipe
03-29-2011, 08:49 AM   #1
bosplans

Hi,

The recipe I made a few months ago no longer works: the newspaper has switched to a new feed service, "feedsportal.com", which mangles the links to the articles and forwards the visitor to ads ...

I have figured out how to solve the problem in theory, but unfortunately I do not know regex. The idea is to use def print_version(self, url) to convert a link from:

http://rss.feedsportal.com/c/32276/f...24ore0N0Cart0Cnotizie0C20A110E0A30E290Clampedusa0Eabitanti0Eoccupano0Emunicipio0E11570A60Bshtml0Duuid0FAauakRKD/story01.htm

To:

http://www.ilsole24ore.com/art/notizie/2011-03-29/lampedusa-abitanti-occupano-municipio-115706_PRN.shtml

As you can see, the former link contains all the information needed for the conversion ... but I have no idea how to make the magic happen! Can someone help me, or tell me where to find a good resource for learning the basics? Is this possible, or is there an easier workaround?
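
The conversion described above could be sketched roughly as follows. This is only a sketch under assumptions: the escape table is inferred from the two example links in this post and is not an official feedsportal specification, and the names FEEDSPORTAL_CODES and decode_feedsportal are made up for illustration. It is written as a plain function so it can be tried outside calibre.

```python
import re

# Escape codes inferred from the two example links in this post
# (an assumption, not an official feedsportal table).
FEEDSPORTAL_CODES = {
    'A': '0', 'B': '.', 'C': '/', 'D': '?',
    'E': '-', 'F': '=', 'N': '.com',
}

def decode_feedsportal(segment):
    # Each escape is a literal '0' followed by one uppercase letter;
    # codes missing from the table are left untouched.
    return re.sub(r'0([A-Z])',
                  lambda m: FEEDSPORTAL_CODES.get(m.group(1), m.group(0)),
                  segment)

def print_version(url):
    # Links that do not go through feedsportal keep the old behaviour.
    if 'feedsportal.com' not in url:
        return url.replace('.shtml', '_PRN.shtml')
    # The encoded article address is the path segment before 'story01.htm'.
    decoded = decode_feedsportal(url.split('/')[-2])
    # Keep what follows the host name and drop the '?uuid=...' query.
    path = decoded.partition('.com')[2].split('?')[0]
    if not path.startswith('/'):
        return url  # unexpected layout: leave the link unchanged
    return 'http://www.ilsole24ore.com' + path.replace('.shtml', '_PRN.shtml')
```

Inside the recipe the second function would become the method print_version(self, url). A single regex pass is used instead of chained replace() calls so that every '0' in the encoded text is read as the start of an escape and never as a plain digit.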

Thanks in advance!

The original recipe:
Code:
__author__    = 'Marco Saraceno'
__copyright__ = '2010, Marco Saraceno <marcosaraceno at gmail.com>'
description   = 'Italian daily newspaper - v 1.1 (Mar14,2011)'

'''
http://www.ilsole24ore.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class IlSole24Ore(BasicNewsRecipe):
    __author__        = 'Marco Saraceno'
    description   = 'Italian financial daily newspaper'

    cover_url      = 'http://www.shopping24.ilsole24ore.com/ProductRelated/rds/img/logo_sole.gif'
    title          = u'Il Sole 24 Ore'
    publisher      = 'Gruppo editoriale GRUPPO 24ORE'
    category       = 'News, politics, culture, economy, financial, Italian'

    language       = 'it'
    timefmt        = '[%a, %d %b, %Y]'

    oldest_article = 2
    max_articles_per_feed = 100
    use_embedded_content  = False
    recursion             = 10
    extra_css      = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt  }'

         
    remove_tags = [
        dict(name='div', attrs={'class':['header','titolo']}),
        dict(name='table', attrs={'class':['footer1024','footerdown']}),
    ]

    feeds = [
        (u'Notizie Italia', u'http://www.ilsole24ore.com/rss/notizie/italia.xml'),
        (u'Notizie Europa', u'http://www.ilsole24ore.com/rss/notizie/europa.xml'),
        (u'Notizie USA', u'http://www.ilsole24ore.com/rss/notizie/usa.xml'),
        (u'Notizie Americhe', u'http://www.ilsole24ore.com/rss/notizie/americhe.xml'),
        (u'Notizie Medio Oriente e Africa', u'http://www.ilsole24ore.com/rss/notizie/medio-oriente-e-africa.xml'),
        (u'Notizie Asia e Oceania', u'http://www.ilsole24ore.com/rss/notizie/asia-e-oceania.xml'),
        (u'Commenti', u'http://www.ilsole24ore.com/rss/commenti-e-idee.xml'),
        (u'Norme e tributi', u'http://www.ilsole24ore.com/rss/norme-e-tributi.xml'),
        (u'Finanza', u'http://www.ilsole24ore.com/rss/finanza-e-mercati.xml'),
        (u'Economia', u'http://www.ilsole24ore.com/rss/economia.xml'),
        (u'Tecnologia', u'http://www.ilsole24ore.com/rss/tecnologie.xml'),
        (u'Cultura', u'http://www.ilsole24ore.com/rss/cultura.xml'),
    ]

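    def print_version(self, url):
        # The printer-friendly page has the same address with a '_PRN' suffix.
        # This only works when the feed links point directly at ilsole24ore.com.
        return url.replace('.shtml', '_PRN.shtml')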