The Spectator Magazine - Request/Help

RichardN · 08-27-2011, 07:37 PM

The Spectator is a UK political magazine without RSS for the main articles.

There are 7 main sections: Politics, Essays, Wit & Wisdom, Columnists, Business, Art, Books.

Each of these has several pages, each listing article headings with a few sentences and a link to the full article.

For example, if you look at http://www.spectator.co.uk/essays/ you will see one page with perhaps six articles and numbers leading to further pages.

The http://www.spectator.co.uk/business-and-investments/ page is similar but with a "click here" link for more articles.

I can see that each of these sections needs to be treated as a separate feed, but having done that, I can't see how to use the parse_index method, nor can I see a way to handle multiple pages other than hard-coding them.

If someone could write a recipe I would be grateful - even if it were only for the essays - I could then try to modify it for the other sections.

Richard N in London

Krittika Goyal · 09-04-2011, 06:22 PM

I have included 3 of the sections of the website. I also used auto cleanup, which removes one or two pictures. You can do the cleanup in detail if you wish, but for the most part the auto cleanup works very well.

Hope this helps

Code:

import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class NYTimes(BasicNewsRecipe):

    title       = 'The Spectator'
    __author__  = 'Krittika Goyal'
    description = 'UK magazine'
    timefmt = ' [%d %b, %Y]'
    needs_subscription = False
    
    no_stylesheets = True
    auto_cleanup = True


    def articles_in_spec_section(self, section_url):
        # Collect the article links listed on one section index page
        articles = []
        soup = self.index_to_soup(section_url)
        div = soup.find(id='centre')
        if div is None:
            # Layout changed or nothing usable was fetched: no articles
            return articles
        for x in div.findAll('h1'):
            # Each article headline is an <h1> wrapping a link
            title = self.tag_to_string(x)
            self.log('\tFound article:', title)
            a = x.find('a', href=True)
            if a is None:
                continue
            url = a['href']
            if url.startswith('/'):
                url = 'http://www.spectator.co.uk'+url
            articles.append({'title':title, 'url':url,
                   'description':'', 'date':''})
        return articles
                    
   
    # To parse article toc
    def parse_index(self):
        sections = []
        for title, url in [
              ('Politics', 'http://www.spectator.co.uk/politics/all/'),
              ('Essays', 'http://www.spectator.co.uk/essays/'),
              ('Columnists', 'http://www.spectator.co.uk/columnists/all/'),
                   ]:
            self.log('Processing section:', title)
            articles = self.articles_in_spec_section(url)
            if articles:
                 sections.append((title,articles))
#        raise SystemExit(0)
        return sections
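
The recipe above reads only the first page of each section. For the numbered further pages mentioned in the first post, one possible approach, sketched here in plain Python so it can be tried outside calibre, is to collect the links whose text is a page number and visit each in turn. The class and function names, the `max_pages` cap, and any sample URLs used to try it out are assumptions for illustration, not the site's real structure.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageLinkParser(HTMLParser):
    """Collect hrefs of <a> tags whose visible text is a page number."""

    def __init__(self):
        super().__init__()
        self._href = None      # href of the <a> currently open, if any
        self._text = []        # text pieces seen inside that <a>
        self.page_links = []   # (page_number, href) pairs

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            label = ''.join(self._text).strip()
            if label.isdecimal():
                self.page_links.append((int(label), self._href))
            self._href = None


def section_page_urls(section_url, html, max_pages=5):
    """Return the section URL followed by the numbered pages linked from it.

    max_pages is a hypothetical safety cap so a long archive is not
    downloaded in full.
    """
    parser = PageLinkParser()
    parser.feed(html)
    urls = [section_url]
    for _, href in sorted(parser.page_links):
        url = urljoin(section_url, href)
        if url not in urls:
            urls.append(url)
    return urls[:max_pages]
```

In a recipe, each returned URL could be passed to articles_in_spec_section and the resulting lists concatenated, so no page URLs need to be hard-coded.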

JanMB · 10-12-2011, 12:44 PM

Hi,

The Spectator (UK) has a digital version that is available to subscribers. The content is different from the web news. I am a subscriber and I would like to read The Spectator on my reader.

I am also a subscriber to the German magazine Der Spiegel and I download it regularly. The recipe was created by Nikolas Mangold. I am very happy with it. Der Spiegel has two recipes, just as The Spectator should have, probably: one for the digital version of the print edition (paid content) and one for the web news.

Can anyone help?
Thank you very much.
Jan

Starson17 · 10-12-2011, 02:12 PM

Quote:

Originally Posted by JanMB

Can anyone help?

If you can't do it yourself, you will either need to find someone who is already a subscriber to do this job, or you will need to provide your subscription user/password to someone to write it. It's very hard to write or debug a recipe if you can't access the site.

RichardN · 10-13-2011, 04:43 AM

I am happily using a very slightly expanded version of Krittika Goyal's code. There are certain sections it does not get correctly, and I will include them when I have debugged the problem. Try using this, which gives most of what is needed:

=============================================

Code:

import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class NYTimes(BasicNewsRecipe):

    title       = 'The Spectator'
    __author__  = 'Krittika Goyal'
    description = 'UK magazine'
    timefmt = ' [%d %b, %Y]'
    needs_subscription = False
    
    no_stylesheets = True
    auto_cleanup = True


    def articles_in_spec_section(self, section_url):
        # Collect the article links listed on one section index page
        articles = []
        soup = self.index_to_soup(section_url)
        div = soup.find(id='centre')
        if div is None:
            # Layout changed or nothing usable was fetched: no articles
            return articles
        for x in div.findAll('h1'):
            # Each article headline is an <h1> wrapping a link
            title = self.tag_to_string(x)
            self.log('\tFound article:', title)
            a = x.find('a', href=True)
            if a is None:
                continue
            url = a['href']
            if url.startswith('/'):
                url = 'http://www.spectator.co.uk'+url
            articles.append({'title':title, 'url':url,
                   'description':'', 'date':''})
        return articles
                    
   
    # To parse article toc
    def parse_index(self):
        sections = []
        for title, url in [
              ('Politics', 'http://www.spectator.co.uk/politics/all/'),
              ('Essays', 'http://www.spectator.co.uk/essays/'),
              ('Wit & Wisdom', 'http://www.spectator.co.uk/wit-and-wisdom/all/'),
              ('Columnists', 'http://www.spectator.co.uk/columnists/all/'),
              ('Arts', 'http://www.spectator.co.uk/arts-and-culture/featured/'),
#              ('Books', 'http://www.spectator.co.uk/books/'),
                   ]:
            self.log('Processing section:', title)
            articles = self.articles_in_spec_section(url)
            if articles:
                 sections.append((title,articles))
#        raise SystemExit(0)
        return sections

==========================================
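
Both recipes build absolute URLs by prefixing the host when the href starts with '/'. A possible alternative, shown here only as a sketch, is `urljoin` from the Python standard library, which also copes with relative links that have no leading slash and leaves already-absolute links alone. The helper name `absolute_url` and the constant `BASE_URL` are made up for this example.

```python
from urllib.parse import urljoin

# Site root used by the recipes in this thread
BASE_URL = 'http://www.spectator.co.uk/'


def absolute_url(href, base=BASE_URL):
    """Resolve an href taken from a section page against the site root."""
    return urljoin(base, href)
```

On the Python 2 builds of calibre in use when this thread was written, the same function lived in the `urlparse` module rather than `urllib.parse`.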

PeterT · 10-13-2011, 08:05 AM

You might like to post that wrapped in [ code ] [ /code ] tags to preserve indentation. Remove the spaces from the tags.

Starson17 · 10-13-2011, 09:06 AM

Quote:

Originally Posted by PeterT

You might like to post that wrapped in [ code ] [ /code ] tags to preserve indentation. Remove the spaces from the tags.

I agree with your suggestion, but it's handy to know that the indents are actually preserved, just not displayed. If you need them, try quoting his post, as if replying, and they will appear. Copy from the text in the quote on the reply page, use that, then cancel the reply.

(Rather than leave it hard for others to use, I went ahead and added the code tags to his post.)

PeterT · 10-13-2011, 09:20 AM

Quote:

Originally Posted by Starson17

I agree with your suggestion, but it's handy to know that the indents are actually preserved, just not displayed. If you need them, try quoting his post, as if replying, and they will appear. Copy from the text in the quote on the reply page, use that, then cancel the reply.

(Rather than leave it hard for others to use, I went ahead and added the code tags to his post.)

DUH..

I forgot the old "reply" trick!

Spectrum · 12-28-2012, 02:08 PM

Neither of the recipes in the thread above works - using version 0.9.11.

RichardN · 01-03-2013, 02:24 PM

I have tried to understand what is happening with the Spectator, and it looked to me like there was some kind of encoding, possibly to deter applications like Calibre.
I couldn't sort it out.

Krittika Goyal · 01-09-2013, 03:35 AM

Does this now need a subscription?

Krittika Goyal · 01-09-2013, 04:07 AM

See if the attached file works.

Spectrum · 01-11-2013, 10:01 AM

Strangely, the recipe is downloading the page 1 links on the features page but not the contents of the magazine. Tried twice with the same result!
Recipe calls for:

return self.index_to_soup('http://www.spectator.co.uk/')

but defaults to

http://www.spectator.co.uk/features/

Strange behavior!

Krittika Goyal · 01-16-2013, 04:12 AM

http://www.spectator.co.uk/ has 2 sections:
Coffee House in the left column and the magazine in the right column.
The recipe is designed to get the articles from the magazine column.
When I test it, that is exactly what it does.

I am attaching a copy of the web page as well as the EPUB obtained by calibre:

In both:
Britain’s accidental EU exit is the first article and
Greening’s challenge is the last article
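
The attached recipe is not reproduced in the thread, so the following is only a guess at the general idea of taking links from one column of a two-column page. It is written in plain Python rather than against calibre's API, and the class name and the `container_id` parameter are hypothetical; the real page's element ids are not given in this thread.

```python
from html.parser import HTMLParser


class ColumnLinkParser(HTMLParser):
    """Collect (title, href) pairs for links inside one <div> container."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id  # hypothetical id of the column
        self._depth = 0     # nesting depth of <div>s inside the container
        self._href = None   # href of the link currently open, if any
        self._text = []     # text pieces seen inside that link
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self._depth:
                self._depth += 1
            elif attrs.get('id') == self.container_id:
                self._depth = 1
        elif tag == 'a' and self._depth and attrs.get('href'):
            self._href = attrs['href']
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'div' and self._depth:
            self._depth -= 1
        elif tag == 'a' and self._href is not None:
            title = ''.join(self._text).strip()
            self.links.append((title, self._href))
            self._href = None


def links_in_column(html, container_id):
    """Return the links found inside the <div> with the given id."""
    parser = ColumnLinkParser(container_id)
    parser.feed(html)
    return parser.links
```

Links outside the chosen container, such as those in the other column, are ignored, which would match the behaviour described above if the magazine column sits in its own container.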

Spectrum · 01-17-2013, 07:36 AM

You got the same results as I got. Just 8 articles from features section just like before - not the complete magazine. Sorry to repeat what I wrote before. Not sure why.

