02-12-2012, 06:12 AM   #1
clanger9
Member
Posts: 11
Karma: 138
Join Date: Nov 2010
Device: Kindle 3
Kurier recipe update

The Kurier website has been revamped, so the calibre recipe needs to be updated.

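Suggested patch below. It fixes the character encoding issues and adapts the recipe to the site's structural changes. Note that the diff lists the updated kurier.recipe first and the original kurier.recipe.orig second, so the "***" sections show the new version and the "---" sections show the old one.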

Code:
*** kurier.recipe	Mon Feb  6 07:47:02 2012
--- kurier.recipe.orig	Sat Feb  4 17:43:04 2012
***************
*** 13,22 ****
      publisher             = 'KURIER'
      category              = 'news, politics, Austria'
      oldest_article        = 2
!     max_articles_per_feed = 100
!     timeout               = 30
!     encoding              = None
      no_stylesheets        = True
      use_embedded_content  = False
      language              = 'de_AT'
      remove_empty_feeds    = True
--- 13,21 ----
      publisher             = 'KURIER'
      category              = 'news, politics, Austria'
      oldest_article        = 2
!     max_articles_per_feed = 200
      no_stylesheets        = True
+     encoding              = 'cp1252'
      use_embedded_content  = False
      language              = 'de_AT'
      remove_empty_feeds    = True
***************
*** 30,40 ****
                          , 'language'  : language
                          }
  
!     remove_tags = [ dict(attrs={'id':['artikel_expand_symbol2','imgzoom_close2']}), 
!                     dict(attrs={'class':['linkextern','functionsleiste','functions','social_positionierung','contenttabs','drucken','versenden','leserbrief','kommentieren','addthis_button']})
!                    ]
      keep_only_tags    = [dict(attrs={'id':'content'})]
!     remove_tags_after = [dict(attrs={'id':'author'})]
      remove_attributes = ['width','height']
  
      feeds = [
--- 29,37 ----
                          , 'language'  : language
                          }
  
!     remove_tags = [dict(attrs={'class':['functionsleiste','functions','social_positionierung','contenttabs','drucken','versenden','leserbrief','kommentieren','addthis_button']})]
      keep_only_tags    = [dict(attrs={'id':'content'})]
!     remove_tags_after = dict(attrs={'id':'author'})
      remove_attributes = ['width','height']
  
      feeds = [
***************
*** 44,50 ****
                ,(u'Kultur'     , u'http://kurier.at/rss/kultur_kultur_rss.xml'   )
                ,(u'Freizeit'   , u'http://kurier.at/rss/freizeit_freizeit_rss.xml'   )
                ,(u'Wetter'     , u'http://kurier.at/rss/oewetter_rss.xml'   )
!               ,(u'Sport'      , u'http://kurier.at/newsfeed/detail/sport_rss.xml'   )
              ]
  
      def preprocess_html(self, soup):
--- 41,47 ----
                ,(u'Kultur'     , u'http://kurier.at/rss/kultur_kultur_rss.xml'   )
                ,(u'Freizeit'   , u'http://kurier.at/rss/freizeit_freizeit_rss.xml'   )
                ,(u'Wetter'     , u'http://kurier.at/rss/oewetter_rss.xml'   )
!               ,(u'Verkehr'    , u'http://kurier.at/rss/verkehr_rss.xml'   )
              ]
  
      def preprocess_html(self, soup):
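
For anyone wondering why the encoding line matters, here is a minimal sketch. It assumes the revamped site now serves UTF-8, which is an assumption and not something verified in this post; the sample headline is made up. Forcing cp1252 onto UTF-8 bytes garbles the umlauts, which is presumably why the updated recipe sets encoding = None and leaves charset detection to calibre.

```python
# Illustration only: the sample headline is made up, and UTF-8 as the
# site's new charset is an assumption.
sample = "Österreich: Schnee in Kärnten"

# Bytes as a UTF-8 server would send them.
raw = sample.encode("utf-8")

# Old recipe behaviour: force cp1252 regardless of the real charset.
forced = raw.decode("cp1252")

# Decoding with the correct charset round-trips cleanly.
detected = raw.decode("utf-8")

print(repr(forced))    # garbled umlauts
print(repr(detected))  # original text
```

Each non-ASCII character turns into two junk characters in the forced version, which matches the kind of garbling a hard-coded charset produces once a site changes its encoding.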