Old 01-16-2018, 05:17 PM   #1
paddyrm
Connoisseur
paddyrm began at the beginning.
 
Posts: 69
Karma: 10
Join Date: Oct 2012
Device: Kindle 3
Guardian recipe update for new site

The Guardian has revamped its website to match the new tabloid print edition. The attached recipe files (one for the daily paper, one for Saturday) seem to work well on the Kindle. I have not worked on the Observer yet.
Paddy, January 2018
Attached Files
File Type: txt Daily Guardian UK.txt (3.8 KB, 278 views)
File Type: txt Guardian Weekend.txt (4.0 KB, 233 views)
Old 01-16-2018, 10:48 PM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I'm a little confused. Shouldn't there be just one recipe that changes according to whether it is a weekend or not? At least, that's the way I think the current recipe works.
Old 01-17-2018, 07:30 AM   #3
paddyrm
"Weekend" is a supplement that only appears on Saturday, but the recipe can, and would, download it every day. So my lazy way is to have a separate script! Ideally, a single script would only pull down the supplement on Saturday, in the same way that it goes to the Observer site on Sunday.
Any improvements you can suggest would be most welcome from this amateur!
Old 01-17-2018, 07:38 AM   #4
kovidgoyal
Something like:

Code:
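# Inside parse_index(); needs "from datetime import date" at the top of the recipe.
if date.today().weekday() in (5, 6):  # 5 is Saturday, 6 is Sunday
    feeds += self.parse_section('https://www.theguardian.com/theguardian/weekend', 'Weekend - ')
should do the trick.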
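
To make the day check easy to try outside calibre, here is a minimal sketch of the same idea as a plain function. The helper name `extra_sections` and the constants are hypothetical and not part of calibre; the sketch assumes the supplement should be fetched on Saturday only, as described in post #3.

```python
from datetime import date

# date.weekday() counts from Monday = 0, so Saturday is 5 and Sunday is 6.
SATURDAY = 5


def extra_sections(day):
    """Return the extra (url, title_prefix) pairs to fetch on the given day.

    Hypothetical helper for illustration only; a recipe would pass each pair
    to its own parse_section() method.
    """
    if day.weekday() == SATURDAY:
        return [('https://www.theguardian.com/theguardian/weekend', 'Weekend - ')]
    return []


if __name__ == '__main__':
    print(extra_sections(date(2018, 1, 20)))  # 20 January 2018 was a Saturday
    print(extra_sections(date(2018, 1, 17)))  # a Wednesday
```

Passing the date in as an argument, rather than calling `date.today()` inside the function, makes it possible to check the Saturday branch on any day of the week.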
Old 01-18-2018, 05:23 AM   #5
Del542
Junior Member
Del542 began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Jan 2018
Device: Linx 8 Windows tablet
I have an additional, more basic question about the Guardian feeds. Since the launch of the redesigned Guardian earlier this week, my Guardian feeds have been showing several blank pages in each article after an introductory sentence and photo.

I have a few simple Guardian feeds created in EPUB format on my Windows 10 tablet running Calibre. They cover various themes such as 'Guardian opinion' or 'Guardian Football'. I also try the default Guardian feed in Calibre but find that these more specific feeds are quicker to create.

For the last few days, the articles in my custom Guardian feeds still appear and load, but they contain many blank pages. Is there a simple setting in Calibre to help with this?

Many thanks for any help here!
Old 01-18-2018, 05:24 AM   #6
kovidgoyal
There is no simple setting; you have to add code to the recipe, in advanced mode, to clean up the downloaded HTML.
Old 01-18-2018, 12:22 PM   #7
Del542
Quote:
Originally Posted by kovidgoyal
There is no simple setting; you have to add code to the recipe, in advanced mode, to clean up the downloaded HTML.
Thanks for your quick help with this.

Have you got any examples of the sort of code that could help clean up the downloaded HTML?
Old 01-18-2018, 10:36 PM   #8
kovidgoyal
https://manual.calibre-ebook.com/news.html
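
As a rough illustration of the kind of cleanup code that page describes, here is a minimal sketch of a feed-based recipe that keeps only the article body and strips some page furniture. The recipe name, the feed URL and the helper `has_class` are assumptions made for the example; the class names are the ones used in the full recipe in post #9 and may stop matching whenever the site markup changes, so this is not guaranteed to cure the blank pages.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so the sketch can be loaded outside calibre for a quick check;
    # inside calibre the real base class is used.
    BasicNewsRecipe = object


def has_class(name):
    """Build a matcher that is true when `name` is one of an element's classes.

    Hypothetical helper; it wraps the lambda pattern used in post #9.
    """
    return lambda value: bool(value) and name in value.split()


class GuardianOpinion(BasicNewsRecipe):
    title = 'Guardian Opinion'  # hypothetical recipe name
    # Assumed feed URL; replace it with the feed you already use.
    feeds = [('Opinion', 'https://www.theguardian.com/commentisfree/rss')]

    no_stylesheets = True
    remove_javascript = True
    remove_attributes = ['style']

    # Keep only the main article column ...
    keep_only_tags = [
        dict(attrs={'class': has_class('content__main-column')}),
    ]
    # ... then drop sharing widgets and leftover head elements inside it.
    remove_tags = [
        dict(attrs={'class': has_class('submeta')}),
        dict(attrs={'data-component': ['share', 'social']}),
        dict(name=['link', 'meta', 'style']),
    ]
```

In calibre this would go into the recipe editor after switching to advanced mode, as mentioned in post #6.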
Old 02-11-2018, 04:03 PM   #9
Omniscient1
Member
Omniscient1 began at the beginning.
 
Posts: 12
Karma: 10
Join Date: Sep 2017
Device: Kindle Paper White
I haven't looked at the submitted recipe yet (I will tomorrow), but here is mine.

It switches automatically to the Sunday edition (The Observer).
Code:
#!/usr/bin/env  python2
__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'

'''
www.guardian.co.uk
'''
from calibre.web.feeds.news import BasicNewsRecipe
from datetime import date


class Guardian(BasicNewsRecipe):

    title = u'The Guardian'
    if date.today().weekday() == 6:
        title = u'The Observer'
        base_url = "http://www.guardian.co.uk/theobserver"
        cover_url = 'https://i.guim.co.uk/img/media/ec57a66a548b748cd586a8b63927aad0b167b80a/0_0_642_798/master/642.jpg?w=300&q=55&auto=format&usm=12&fit=max&'
        masthead_url = 'http://static.guim.co.uk/sys-images/Guardian/Pix/site_furniture/2010/10/19/1287478087992/The-Observer-001.gif'
    else:
        base_url = "http://www.guardian.co.uk/theguardian"
        # cover_pic = 'Guardian digital edition'
        # masthead_url = 'http://static.guim.co.uk/static/f76b43f9dcfd761f0ecf7099a127b603b2922118/common/images/logos/the-guardian/titlepiece.gif'
        cover_url = 'https://i.guim.co.uk/img/media/0dcdddf037927063ea4f420e8d5baecece39d5a4/0_0_1128_1403/master/1128.png?w=700&q=55&auto=format&usm=12&fit=max&'
        # masthead_url = 'https://assets.guim.co.uk/images/eada8aa27c12fe2d5afa3a89d3fbae0d/fallback-logo.png'
        masthead_url = 'http://www.logo-designer.co/wp-content/uploads/2018/01/2018-The-Guardian-logo-design.png'
    __author__ = 'Kovid Goyal'
    language = 'en_GB'

    oldest_article = 1
    max_articles_per_feed = 300
    remove_javascript = True
    encoding = 'utf-8'
    remove_empty_feeds = True
    no_stylesheets = True
    remove_attributes = ['style']
    ignore_duplicate_articles = {'title', 'url'}

    timefmt = ' [%a, %d %b %Y]'

    keep_only_tags = [
        dict(attrs={'class': lambda x: x and 'content__main-column' in x.split()}),
    ]
    remove_tags = [
        dict(attrs={'class': lambda x: x and '--twitter' in x}),
        dict(attrs={'class': lambda x: x and 'submeta' in x.split()}),
        dict(attrs={'data-component': ['share', 'social']}),
        dict(attrs={'data-link-name': 'block share'}),
        dict(attrs={'class': lambda x: x and 'inline-expand-image' in x}),
        dict(attrs={'class': lambda x: x and 'modern-visible' in x.split()}),
        dict(name=['link', 'meta', 'style']),
    ]
    remove_tags_after = [
        dict(attrs={'class': lambda x: x and 'content__article-body' in x.split()}),
    ]

    def preprocess_raw_html(self, raw, url):
        import html5lib
        from lxml import html
        return html.tostring(html5lib.parse(raw, namespaceHTMLElements=False, treebuilder='lxml'), encoding=unicode)

    def preprocess_html(self, soup):
        for img in soup.findAll('img', srcset=True):
            img['src'] = img['srcset'].partition(' ')[0]
            img['srcset'] = ''
        return soup

    def parse_section(self, url, title_prefix=''):
        feeds = []
        soup = self.index_to_soup(url)
        for section in soup.findAll('section'):
            title = title_prefix + self.tag_to_string(section.find(
                attrs={'class': 'fc-container__header__title'})).strip().capitalize()
            self.log('\nFound section:', title)
            feeds.append((title, []))
            for li in section.findAll('li'):
                for a in li.findAll('a', attrs={'data-link-name': 'article'}, href=True):
                    title = self.tag_to_string(a).strip()
                    url = a['href']
                    self.log(' ', title, url)
                    feeds[-1][1].append({'title': title, 'url': url})
                    break
        return feeds

    def parse_index(self):
        feeds = self.parse_section(self.base_url)
        weekday = date.today().weekday()  # Monday is 0, Saturday is 5, Sunday is 6
        if weekday == 5:
            # Saturday supplements
            feeds += self.parse_section(
                'https://www.theguardian.com/theguardian/family', 'Family - ')
            feeds += self.parse_section(
                'https://www.theguardian.com/theguardian/guardianreview', 'Guardian Review - ')
            feeds += self.parse_section(
                'https://www.theguardian.com/theguardian/weekend', 'Weekend Magazine - ')
            feeds += self.parse_section(
                'https://www.theguardian.com/theguardian/theguide', 'The Guide - ')
        elif weekday == 6:
            # Sunday: Observer sections
            feeds += self.parse_section(
                'https://www.theguardian.com/theobserver/new-review', 'New Review ')
            feeds += self.parse_section(
                'https://www.theguardian.com/theobserver/news/comment', 'Comment ')
            feeds += self.parse_section(
                'https://www.theguardian.com/theobserver/magazine', 'Observer Magazine ')
        else:
            # Monday to Friday
            feeds += self.parse_section(
                'https://www.theguardian.com/tone/obituaries/all', 'Obituaries - ')
            feeds += self.parse_section(
                'https://www.theguardian.com/uk/commentisfree', 'Editorial - ')
            feeds += self.parse_section(
                'https://www.theguardian.com/theguardian/g2', 'G2 - ')

        feeds += self.parse_section(
            'https://www.theguardian.com/uk/sport', 'Sport - ')
        return feeds

Last edited by PeterT; 02-11-2018 at 04:50 PM. Reason: added [code] / [/code] wrapper
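
The `preprocess_html` step in the recipe above copies the first entry of each image's `srcset` into `src`. A minimal sketch of that string handling on its own, assuming the first candidate URL is followed by a space and a width descriptor; the function name and the example URLs are hypothetical.

```python
def first_srcset_url(srcset):
    """Return the first URL listed in an srcset attribute value.

    Uses the same partition(' ') approach as the recipe, with surrounding
    whitespace stripped first.
    """
    return srcset.strip().partition(' ')[0]


if __name__ == '__main__':
    example = 'https://example.com/a.jpg 300w, https://example.com/b.jpg 620w'
    print(first_srcset_url(example))
```

Whether the first candidate is the best size for an e-reader depends on how the site orders its `srcset` entries; the recipe simply takes whichever comes first.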