|  08-27-2011, 07:37 PM | #1 | 
| Junior Member  Posts: 8 Karma: 10 Join Date: Mar 2011 Location: London, UK Device: Paperwhite | 
				
				The Spectator Magazine - Request/Help
			 
			
			The Spectator is a UK political magazine without RSS for the main articles. There are 7 main sections: Politics, Essays, Wit & Wisdom, Columnists, Business, Art, Books. Each of these has several pages, each showing article headings with a few sentences and a link to the full article. For example, if you look at http://www.spectator.co.uk/essays/ you will see one page with perhaps six articles and numbers leading to further pages. The http://www.spectator.co.uk/business-and-investments/ page is similar but with a "click here for more articles" link. I can see that each of these sections needs to be treated as a separate feed, but having done that, I can't see how to use the parse_index method, nor can I see a way to handle multiple pages other than hard coding. If someone could write a recipe I would be grateful; even if it was only for the essays, I could then try to modify it for the other sections. Richard N in London | 
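One way to avoid hard-coding page URLs is to start at a section's first page and keep following its numbered page links until no new ones turn up. The sketch below shows the idea in plain Python 3, outside calibre. `fetch` is a hypothetical callable that returns a page's HTML, and the idea that pagination links contain `/page/` in their `href` is only a guess about the site's markup, not something confirmed in this thread. Inside a recipe, the same loop could call `self.index_to_soup(url)` on each collected URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.hrefs.append(href)


def section_pages(fetch, start_url, marker='/page/', max_pages=10):
    """Return the URLs of a section's pages, discovered by following links.

    fetch     -- hypothetical callable: URL -> HTML text
    marker    -- ASSUMPTION: substring that identifies a pagination link
    max_pages -- safety cap so a malformed site cannot loop forever
    """
    pages, queue = [], [start_url]
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        if url in pages:
            continue
        pages.append(url)
        collector = LinkCollector()
        collector.feed(fetch(url))
        for href in collector.hrefs:
            absolute = urljoin(url, href)
            if (marker in absolute and absolute not in pages
                    and absolute not in queue):
                queue.append(absolute)
    return pages
```

The `max_pages` cap also keeps the download small while testing; raise it once the link pattern is known to be right.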
|   |   | 
|  09-04-2011, 06:22 PM | #2 | 
| Vox calibre            Posts: 412 Karma: 1175230 Join Date: Jan 2009 Device: Sony reader prs700, kobo | 
			
			I have included 3 of the sections of the website. I also used auto cleanup, which removes one or two pictures. You can do the cleanup in detail if you wish; for the most part the auto cleanup works very well. Hope this helps.
Code:
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
class TheSpectator(BasicNewsRecipe):
    title       = 'The Spectator'
    __author__  = 'Krittika Goyal'
    description = 'UK magazine'
    timefmt = ' [%d %b, %Y]'
    needs_subscription = False
    
    no_stylesheets = True
    auto_cleanup = True
    def articles_in_spec_section(self, section_url):
        # Collect the articles linked from one section index page
        articles = []
        soup = self.index_to_soup(section_url)
        div = soup.find(id='centre')
        if div is None:
            # Page layout is not as expected; nothing to collect
            return articles
        for x in div.findAll('h1'):
            # Each article heading is an <h1> wrapping a link
            title = self.tag_to_string(x)
            self.log('\tFound article:', title)
            a = x.find('a', href=True)
            if a is None:
                continue
            url = a['href']
            if url.startswith('/'):
                url = 'http://www.spectator.co.uk' + url
            articles.append({'title': title, 'url': url,
                             'description': '', 'date': ''})
        return articles
                    
   
    # To parse article toc
    def parse_index(self):
        sections = []
        for title, url in [
              ('Politics', 'http://www.spectator.co.uk/politics/all/'),
              ('Essays', 'http://www.spectator.co.uk/essays/'),
              ('Columnists', 'http://www.spectator.co.uk/columnists/all/'),
                   ]:
            self.log('Processing section:', title)
            articles = self.articles_in_spec_section(url)
            if articles:
                sections.append((title, articles))
#        raise SystemExit(0)
        return sections | 
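The recipe above builds absolute URLs by prefixing the site root to any `href` that starts with `/`. A slightly more general option is the standard library's `urljoin`, sketched here in plain Python 3 on the assumption that section pages may also contain page-relative or already-absolute links. The helper name `absolute_url` and the example paths are made up for this illustration. On the Python 2 builds of calibre from that period, `urljoin` lives in the `urlparse` module instead.

```python
from urllib.parse import urljoin


def absolute_url(page_url, href):
    """Resolve an href found on page_url into an absolute URL.

    Handles root-relative ('/x/'), page-relative ('x/') and
    already-absolute ('http://...') links alike.
    """
    return urljoin(page_url, href.strip())


if __name__ == '__main__':
    section = 'http://www.spectator.co.uk/essays/'
    # Hypothetical hrefs, only to show the three cases
    for href in ('/essays/some-article/', 'more/', 'http://example.org/a'):
        print(absolute_url(section, href))
```

In the recipe, this would replace the `startswith('/')` check with a single call that passes `section_url` as the base.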
|   |   | 
|  10-12-2011, 12:44 PM | #3 | 
| Junior Member  Posts: 5 Karma: 10 Join Date: Oct 2011 Device: Kindle | 
				
				The Spectator - digital edition - paid content
			 
			
			Hi, The Spectator (UK) has a digital version that is available to subscribers. The content is different from the web news. I am a subscriber and I would like to read The Spectator on my reader. I am also a subscriber to the German magazine Der Spiegel and I download it regularly. The recipe was created by Nikolas Mangold. I am very happy with it. Der Spiegel has two recipes, just as The Spectator should have, probably: one for the digital version of the print edition (paid content) and one for the web news. Can anyone help? Thank you very much. Jan | 
|   |   | 
|  10-12-2011, 02:12 PM | #4 | 
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | 
			
			If you can't do it yourself, you will either need to find someone who is already a subscriber to do this job, or you will need to provide your subscription user/password to someone to write it.  It's very hard to write or debug if you can't access the site    | 
|   |   | 
|  10-13-2011, 04:43 AM | #5 | 
| Junior Member  Posts: 8 Karma: 10 Join Date: Mar 2011 Location: London, UK Device: Paperwhite | 
			
			I am happily using a very slightly expanded version of Krittika Goyal's code. There are certain sections it does not get correctly, and I will include them when I have debugged the problem. Try using this, which gives most of what is needed:
Code:
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
class TheSpectator(BasicNewsRecipe):
    title       = 'The Spectator'
    __author__  = 'Krittika Goyal'
    description = 'UK magazine'
    timefmt = ' [%d %b, %Y]'
    needs_subscription = False
    
    no_stylesheets = True
    auto_cleanup = True
    def articles_in_spec_section(self, section_url):
        # Collect the articles linked from one section index page
        articles = []
        soup = self.index_to_soup(section_url)
        div = soup.find(id='centre')
        if div is None:
            # Page layout is not as expected; nothing to collect
            return articles
        for x in div.findAll('h1'):
            # Each article heading is an <h1> wrapping a link
            title = self.tag_to_string(x)
            self.log('\tFound article:', title)
            a = x.find('a', href=True)
            if a is None:
                continue
            url = a['href']
            if url.startswith('/'):
                url = 'http://www.spectator.co.uk' + url
            articles.append({'title': title, 'url': url,
                             'description': '', 'date': ''})
        return articles
                    
   
    # To parse article toc
    def parse_index(self):
        sections = []
        for title, url in [
              ('Politics', 'http://www.spectator.co.uk/politics/all/'),
              ('Essays', 'http://www.spectator.co.uk/essays/'),
              ('Wit & Wisdom', 'http://www.spectator.co.uk/wit-and-wisdom/all/'),
              ('Columnists', 'http://www.spectator.co.uk/columnists/all/'),
              ('Arts', 'http://www.spectator.co.uk/arts-and-culture/featured/'),
#              ('Books', 'http://www.spectator.co.uk/books/'),
                   ]:
            self.log('Processing section:', title)
            articles = self.articles_in_spec_section(url)
            if articles:
                sections.append((title, articles))
#        raise SystemExit(0)
        return sections
Last edited by Starson17; 10-13-2011 at 09:08 AM. | 
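When a section comes out wrong, it can help to inspect the value that `parse_index` returns before calibre starts downloading. calibre expects a list of `(section title, list of article dicts)` pairs, and each article needs at least a `title` and a `url`. The checker below is a minimal sketch in plain Python 3; the function name `index_problems` is invented for this example and is not part of calibre.

```python
def index_problems(index):
    """Return human-readable problems found in a parse_index() result.

    index is expected to look like
    [(section_title, [article_dict, ...]), ...],
    where every article_dict has a non-empty 'title' and 'url'.
    """
    problems = []
    for position, entry in enumerate(index):
        if not (isinstance(entry, tuple) and len(entry) == 2):
            problems.append(
                'entry %d is not a (title, articles) pair' % position)
            continue
        section, articles = entry
        if not articles:
            problems.append('section %r has no articles' % section)
        for article in articles:
            for key in ('title', 'url'):
                if not article.get(key):
                    problems.append(
                        'section %r: article missing %r' % (section, key))
    return problems
```

A temporary `self.log(index_problems(sections))` just before `return sections` would print the findings in the recipe's download log.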
|   |   | 
|  10-13-2011, 08:05 AM | #6 | 
| Grand Sorcerer            Posts: 13,685 Karma: 79983758 Join Date: Nov 2007 Location: Toronto Device: Libra H2O, Libra Colour | 
			
			You might like to post that wrapped in [ code ] [ /code ] tags to preserve indentation. Remove the spaces from the tags. | 
|   |   | 
|  10-13-2011, 09:06 AM | #7 | |
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | Quote: 
  (Rather than leave it hard for others to use, I went ahead and added the code tags to his post.) Last edited by Starson17; 10-13-2011 at 09:15 AM. | |
|   |   | 
|  10-13-2011, 09:20 AM | #8 | |
| Grand Sorcerer            Posts: 13,685 Karma: 79983758 Join Date: Nov 2007 Location: Toronto Device: Libra H2O, Libra Colour | Quote: 
 I forgot the old "reply" trick! | |
|   |   | 
|  12-28-2012, 02:08 PM | #9 | 
| Zealot       Posts: 126 Karma: 570 Join Date: Nov 2008 Device: iPad 1 and iPad 4, KF HD 8.9" | 
			
			Neither of the recipes in the thread above works with version 0.9.11.
		 Last edited by Spectrum; 01-14-2013 at 10:42 AM. | 
|   |   | 
|  01-03-2013, 02:24 PM | #10 | 
| Junior Member  Posts: 8 Karma: 10 Join Date: Mar 2011 Location: London, UK Device: Paperwhite | 
			
			I have tried to understand what is happening with the Spectator, and it looked to me as if there was some kind of encoding, possibly to deter applications like Calibre. I couldn't sort it out. | 
|   |   | 
|  01-09-2013, 03:35 AM | #11 | 
| Vox calibre            Posts: 412 Karma: 1175230 Join Date: Jan 2009 Device: Sony reader prs700, kobo | 
			
			Does this now need a subscription?
		 | 
|   |   | 
|  01-09-2013, 04:07 AM | #12 | 
| Vox calibre            Posts: 412 Karma: 1175230 Join Date: Jan 2009 Device: Sony reader prs700, kobo | 
			
			See if the attached file works.
		 | 
|   |   | 
|  01-11-2013, 10:01 AM | #13 | 
| Zealot       Posts: 126 Karma: 570 Join Date: Nov 2008 Device: iPad 1 and iPad 4, KF HD 8.9" | 
				
				partial download
			 
			
			Strangely, the recipe is downloading the page 1 links on the features page but not the contents of the magazine. Tried twice with the same result! The recipe calls for: return self.index_to_soup('http://www.spectator.co.uk/') but defaults to http://www.spectator.co.uk/features/. Strange behavior! | 
|   |   | 
|  01-16-2013, 04:12 AM | #14 | 
| Vox calibre            Posts: 412 Karma: 1175230 Join Date: Jan 2009 Device: Sony reader prs700, kobo | 
			
			http://www.spectator.co.uk/ has 2 sections: Coffee House in the left column and the magazine in the right column. The recipe is designed to get the articles from the magazine column. When I test it, that is exactly what it does. I am attaching a copy of the webpage as well as the EPUB obtained by calibre. In both, "Britain’s accidental EU exit" is the first article and "Greening’s challenge" is the last article. | 
|   |   | 
|  01-17-2013, 07:36 AM | #15 | 
| Zealot       Posts: 126 Karma: 570 Join Date: Nov 2008 Device: iPad 1 and iPad 4, KF HD 8.9" | 
				
				partial download again... saga continues
			 
			
			You got the same results as I got. Just 8 articles from features section just like before - not the complete magazine. Sorry to repeat what I wrote before. Not sure why.
		 | 
|   |   | 
|  | 
| Tags | 
| recipe, request, spectator, web | 