Old 03-02-2010, 02:07 PM   #1
oddeyed
 
Posts: 17
Karma: 10
Join Date: Feb 2010
Location: London, UK
Device: Kindle 3rd Generation - 3G + Wifi - Graphite
Delete News Sections

Hi everyone,

So when I do eventually take the plunge and get a reader, I want to be able to access news feeds on it.

I would use the Feedbooks self-updating, automagical Newspapers, but since my soon-to-be reader, even if it is a Kindle, won't have that functionality (I'm not in the US), I thought I'd use Calibre's news output instead, since it can include pictures.

I have been doing test builds with the Guardian recipe on my laptop, and without fail they include the Sport section, G2, and Entertainment, which I don't want, even though I have edited the recipe to exclude them.

So, does anyone know how to stop this from happening?

Thanks,
oddeyed

Below is my custom recipe:
Code:
#!/usr/bin/env python
__license__   = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'

'''
www.guardian.co.uk
'''
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class Guardian(BasicNewsRecipe):

    title = u'The Guardian - Top Stories'
    language = 'en_GB'

    oldest_article = 2
    max_articles_per_feed = 5
    remove_javascript = True

    timefmt = ' [%a, %d %b %Y]'
    keep_only_tags = [
                      dict(name='div', attrs={'id':["content","article_header","main-article-info",]}),
                           ]
    remove_tags = [
                        dict(name='div', attrs={'class':["video-content","videos-third-column"]}),
                        dict(name='div', attrs={'id':["article-toolbox","subscribe-feeds",]}),
                        dict(name='ul', attrs={'class':["pagination"]}),
                        dict(name='ul', attrs={'id':["content-actions"]}),
                        ]
    use_embedded_content    = False

    no_stylesheets = True
    extra_css = '''
                    .article-attributes{font-size: x-small; font-family:Arial,Helvetica,sans-serif;}
                    .h1{font-size: large ;font-family:georgia,serif; font-weight:bold;}
                    .stand-first-alone{color:#666666; font-size:small; font-family:Arial,Helvetica,sans-serif;}
                    .caption{color:#666666; font-size:x-small; font-family:Arial,Helvetica,sans-serif;}
                    #article-wrapper{font-size:small; font-family:Arial,Helvetica,sans-serif;font-weight:normal;}
                    .main-article-info{font-family:Arial,Helvetica,sans-serif;}
                    #full-contents{font-size:small; font-family:Arial,Helvetica,sans-serif;font-weight:normal;}
                    #match-stats-summary{font-size:small; font-family:Arial,Helvetica,sans-serif;font-weight:normal;}
                '''

    feeds = [
        ('Top Stories', 'http://www.guardian.co.uk/theguardian/mainsection/topstories/rss'),
        ]

    def get_article_url(self, article):
          url = article.get('guid', None)
          if url is None:
              return None
          # Note: each substring test needs its own "in url"; a bare
          # 'football' is always truthy and would match every article.
          if ('/video/' in url or '/flyer/' in url or '/quiz/' in url or
                  '/gallery/' in url or 'ivebeenthere' in url or
                  'pickthescore' in url or 'audioslideshow' in url or
                  '/sport' in url or 'educationguardian' in url or
                  'football' in url or '/films' in url):
              url = None
          return url

    def preprocess_html(self, soup):

          for item in soup.findAll(style=True):
              del item['style']

          for item in soup.findAll(face=True):
              del item['face']
          for tag in soup.findAll(name=['ul','li']):
                tag.name = 'div'

          return soup

    def find_sections(self):
        soup = self.index_to_soup('http://www.guardian.co.uk/theguardian')
        # find cover pic
        img = soup.find( 'img',attrs ={'alt':'Guardian digital edition'})
        if img is not None:
            self.cover_url = img['src']
        # end find cover pic

        idx = soup.find('div', id='book-index')
        for s in idx.findAll('strong', attrs={'class':'book'}):
            a = s.find('a', href=True)
            yield (self.tag_to_string(a), a['href'])

    def find_articles(self, url):
        soup = self.index_to_soup(url)
        div = soup.find('div', attrs={'class':'book-index'})
        for ul in div.findAll('ul', attrs={'class':'trailblock'}):
            for li in ul.findAll('li'):
                a = li.find(href=True)
                if not a:
                    continue
                title = self.tag_to_string(a)
                url = a['href']
                if not title or not url:
                    continue
                desc = ''  # default, so desc is never unbound or stale
                tt = li.find('div', attrs={'class':'trailtext'})
                if tt is not None:
                    for da in tt.findAll('a'):
                        da.extract()
                    desc = self.tag_to_string(tt).strip()
                yield {
                        'title': title, 'url': url, 'description': desc,
                        'date': strftime('%a, %d %b'),
                        }

    def parse_index(self):
        try:
            feeds = []
            for title, href in self.find_sections():
                feeds.append((title, list(self.find_articles(href))))
            return feeds
        except:
            # If the index page cannot be parsed, raise NotImplementedError
            # so calibre falls back to the RSS feeds defined above.
            raise NotImplementedError
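For what it's worth, the same URL filter is less error-prone with the excluded substrings kept in one list, so a missing `in url` (the bug above, where a bare `'football'` is always truthy) can't slip through. A minimal sketch; the standalone helper name `filter_article_url` is just for illustration, and in the recipe the logic would stay inside `get_article_url(self, article)`:

```python
# Substrings identifying sections to skip (same set as in the recipe).
EXCLUDED_SUBSTRINGS = [
    '/video/', '/flyer/', '/quiz/', '/gallery/', 'ivebeenthere',
    'pickthescore', 'audioslideshow', '/sport', 'educationguardian',
    'football', '/films',
]

def filter_article_url(url):
    """Return the url unchanged, or None if it belongs to an excluded section."""
    if url is None:
        return None
    if any(s in url for s in EXCLUDED_SUBSTRINGS):
        return None
    return url
```

Returning `None` from `get_article_url` is how a recipe tells calibre to drop that article from the feed.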
Old 03-03-2010, 03:21 AM   #2
DoctorOhh
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by oddeyed
I have been doing test builds with the Guardian recipe on my laptop, and without fail they include the Sport section, G2, and Entertainment, which I don't want, even though I have edited the recipe to exclude them.

So, does anyone know how to stop this from happening?
I'm guessing that after you customized the recipe, you are still selecting the original from the English (UK) section instead of grabbing your custom recipe from the Custom Recipe area (see attached).

The file you have been editing can only be accessed through the Custom Recipe section; the original in the English (UK) section never changes. Notice that the custom recipe does not have the little G icon next to it.
Attached Thumbnails: custom_recipe-2.png (41.4 KB)

Last edited by DoctorOhh; 03-03-2010 at 03:25 AM.