08-27-2011, 07:37 PM | #1 |
Junior Member
Posts: 8
Karma: 10
Join Date: Mar 2011
Location: London, UK
Device: Paperwhite
|
The Spectator Magazine - Request/Help
The Spectator is a UK political magazine without RSS for the main articles.
There are 7 main sections: Politics, Essays, Wit & Wisdom, Columnists, Business, Art, Books. Each of these has several pages listing the article headings, each with a few sentences and a link to the main article. For example, if you look at http://www.spectator.co.uk/essays/ you will see one page with perhaps six articles and numbers leading to further pages. The http://www.spectator.co.uk/business-and-investments/ page is similar but with a "click here for more articles" link. I can see that each of these sections needs to be treated as a separate feed, but having done that, I can't see how to use the parse_index method, nor can I see a way to handle multiple pages other than hard coding. If someone could write a recipe I would be grateful. Even if it was only for the essays, I could then try to modify it for the other sections. Richard N in London |
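One way to avoid hard-coding the page count is to keep requesting numbered pages until a page yields nothing new. The sketch below is only an illustration of that loop: `collect_paged_articles` and `fetch_page` are hypothetical names, and the `page/N/` URL pattern is an assumption that has not been checked against the Spectator site.

```python
from typing import Callable, Dict, List

Article = Dict[str, str]


def collect_paged_articles(
    section_url: str,
    fetch_page: Callable[[str], List[Article]],
    max_pages: int = 10,
) -> List[Article]:
    """Walk numbered index pages until a page adds no new articles."""
    articles: List[Article] = []
    seen = set()
    for page in range(1, max_pages + 1):
        # ASSUMPTION: the 'page/N/' suffix is a made-up URL pattern;
        # replace it with whatever the site's page links really use.
        if page == 1:
            page_url = section_url
        else:
            page_url = '%spage/%d/' % (section_url, page)
        found_new = False
        for article in fetch_page(page_url):
            if article['url'] in seen:
                continue  # already listed on an earlier page
            seen.add(article['url'])
            articles.append(article)
            found_new = True
        if not found_new:
            break  # empty or fully repeated page: assume we ran out
    return articles
```

Inside a recipe, `fetch_page` would be whichever method scrapes a single index page and returns a list of article dicts. `max_pages` is a safety cap so that a site which keeps serving pages cannot make the loop run forever.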
09-04-2011, 06:22 PM | #2 |
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
|
I have included 3 of the sections of the website. I also used auto cleanup, which removes one or two pictures. You can do the cleanup in detail if you wish, but for the most part the auto cleanup works very well.
Hope this helps. Code:
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class NYTimes(BasicNewsRecipe):

    title = 'The Spectator'
    __author__ = 'Krittika Goyal'
    description = 'UK magazine'
    timefmt = ' [%d %b, %Y]'
    needs_subscription = False

    no_stylesheets = True
    auto_cleanup = True

    def articles_in_spec_section(self, section_url):
        articles = []
        soup = self.index_to_soup(section_url)
        div = soup.find(id='centre')
        for x in div.findAll(True):
            if x.name == 'h1':
                # Article found
                title = self.tag_to_string(x)
                self.log('\tFound article:', title)
                a = x.find('a', href=True)
                if a is None:
                    continue
                url = a['href']
                if url.startswith('/'):
                    url = 'http://www.spectator.co.uk'+url
                articles.append({'title':title, 'url':url,
                                 'description':'', 'date':''})
        return articles

    # To parse article toc
    def parse_index(self):
        sections = []
        for title, url in [
            ('Politics', 'http://www.spectator.co.uk/politics/all/'),
            ('Essays', 'http://www.spectator.co.uk/essays/'),
            ('Columnists', 'http://www.spectator.co.uk/columnists/all/'),
        ]:
            self.log('Processing section:', title)
            articles = self.articles_in_spec_section(url)
            if articles:
                sections.append((title,articles))
        # raise SystemExit(0)
        return sections
|
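For anyone who wants to try the heading-scraping idea outside calibre, here is a rough standalone sketch using only the Python standard library. It is not the recipe's own code: `HeadingLinkParser` is a hypothetical name, it does not restrict itself to the `centre` div, it takes the link text rather than the whole heading as the title, and it uses `urljoin` in place of the recipe's `startswith('/')` check.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadingLinkParser(HTMLParser):
    """Collect the first link inside each <h1> as an article entry."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.in_h1 = False
        self.current = None  # article dict being built, if any
        self.articles = []

    def handle_starttag(self, tag, attrs):
        if tag == 'h1':
            self.in_h1 = True
        elif tag == 'a' and self.in_h1 and self.current is None:
            href = dict(attrs).get('href')
            if href:
                # urljoin resolves relative links and leaves absolute ones alone
                self.current = {'title': '',
                                'url': urljoin(self.base_url, href)}

    def handle_data(self, data):
        if self.current is not None:
            self.current['title'] += data

    def handle_endtag(self, tag):
        if tag == 'a' and self.current is not None:
            self.current['title'] = self.current['title'].strip()
            self.articles.append(self.current)
            self.current = None
        elif tag == 'h1':
            self.in_h1 = False
```

Feeding a saved copy of a section page to `HeadingLinkParser(section_url).feed(html)` and printing `.articles` gives a quick check of whether the page still puts its article links inside `h1` headings.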
10-12-2011, 12:44 PM | #3 |
Junior Member
Posts: 5
Karma: 10
Join Date: Oct 2011
Device: Kindle
|
The Spectator - digital edition - paid content
Hi,
The Spectator (UK) has a digital version that is available to subscribers. The content is different from the web news. I am a subscriber and I would like to read The Spectator on my reader. I am also a subscriber to the German magazine Der Spiegel and I download it regularly. The recipe was created by Nikolas Mangold. I am very happy with it. Der Spiegel has two recipes, just as The Spectator should have, probably: one for the digital version of the print edition (paid content) and one for the web news. Can anyone help? Thank you very much. Jan |
10-12-2011, 02:12 PM | #4 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
If you can't do it yourself, you will either need to find someone who is already a subscriber to do this job, or you will need to provide your subscription user/password to someone to write it. It's very hard to write or debug a recipe if you can't access the site.
|
10-13-2011, 04:43 AM | #5 |
Junior Member
Posts: 8
Karma: 10
Join Date: Mar 2011
Location: London, UK
Device: Paperwhite
|
I am happily using a very slightly expanded version of Krittika Goyal's code. There are certain sections it does not get correctly, and I will include them when I have debugged the problem. Try using this, which gives most of what is needed.
Code:
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class NYTimes(BasicNewsRecipe):

    title = 'The Spectator'
    __author__ = 'Krittika Goyal'
    description = 'UK magazine'
    timefmt = ' [%d %b, %Y]'
    needs_subscription = False

    no_stylesheets = True
    auto_cleanup = True

    def articles_in_spec_section(self, section_url):
        articles = []
        soup = self.index_to_soup(section_url)
        div = soup.find(id='centre')
        for x in div.findAll(True):
            if x.name == 'h1':
                # Article found
                title = self.tag_to_string(x)
                self.log('\tFound article:', title)
                a = x.find('a', href=True)
                if a is None:
                    continue
                url = a['href']
                if url.startswith('/'):
                    url = 'http://www.spectator.co.uk'+url
                articles.append({'title':title, 'url':url,
                                 'description':'', 'date':''})
        return articles

    # To parse article toc
    def parse_index(self):
        sections = []
        for title, url in [
            ('Politics', 'http://www.spectator.co.uk/politics/all/'),
            ('Essays', 'http://www.spectator.co.uk/essays/'),
            ('Wit & Wisdom', 'http://www.spectator.co.uk/wit-and-wisdom/all/'),
            ('Columnists', 'http://www.spectator.co.uk/columnists/all/'),
            ('Arts', 'http://www.spectator.co.uk/arts-and-culture/featured/'),
            # ('Books', 'http://www.spectator.co.uk/books/'),
        ]:
            self.log('Processing section:', title)
            articles = self.articles_in_spec_section(url)
            if articles:
                sections.append((title,articles))
        # raise SystemExit(0)
        return sections
Last edited by Starson17; 10-13-2011 at 09:08 AM. |
10-13-2011, 08:05 AM | #6 |
Grand Sorcerer
Posts: 12,154
Karma: 73448616
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
|
You might like to post that wrapped in [ code ] [ /code ] tags to preserve indentation. Remove the spaces from the tags.
|
10-13-2011, 09:06 AM | #7 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
(Rather than leave it hard for others to use, I went ahead and added the code tags to his post.) Last edited by Starson17; 10-13-2011 at 09:15 AM. |
|
10-13-2011, 09:20 AM | #8 | |
Grand Sorcerer
Posts: 12,154
Karma: 73448616
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
|
Quote:
I forgot the old "reply" trick! |
|
12-28-2012, 02:08 PM | #9 |
Zealot
Posts: 126
Karma: 570
Join Date: Nov 2008
Device: iPad 1 and iPad 4, KF HD 8.9"
|
Neither of the recipes in the thread above works - using version 0.9.11.
Last edited by Spectrum; 01-14-2013 at 10:42 AM. |
01-03-2013, 02:24 PM | #10 |
Junior Member
Posts: 8
Karma: 10
Join Date: Mar 2011
Location: London, UK
Device: Paperwhite
|
I have tried to understand what is happening with the Spectator, and it looked to me as though there was some kind of encoding, possibly to deter applications like Calibre.
I couldn't sort it out. |
01-09-2013, 03:35 AM | #11 |
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
|
Does this now need a subscription?
|
01-09-2013, 04:07 AM | #12 |
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
|
See if the attached file works.
|
01-11-2013, 10:01 AM | #13 |
Zealot
Posts: 126
Karma: 570
Join Date: Nov 2008
Device: iPad 1 and iPad 4, KF HD 8.9"
|
partial download
Strangely, the recipe is downloading the page 1 links on the features page but not the contents of the magazine. I tried twice with the same result!
The recipe calls for: return self.index_to_soup('http://www.spectator.co.uk/') but it defaults to http://www.spectator.co.uk/features/ instead. Strange behavior! |
01-16-2013, 04:12 AM | #14 |
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
|
http://www.spectator.co.uk/ has 2 sections
Coffee House in the left column and the magazine in the right column. The recipe is designed to get the articles from the magazine column. When I test it, that is exactly what it does. I am attaching a copy of the webpage as well as the epub produced by calibre. In both, "Britain’s accidental EU exit" is the first article and "Greening’s challenge" is the last article. |
01-17-2013, 07:36 AM | #15 |
Zealot
Posts: 126
Karma: 570
Join Date: Nov 2008
Device: iPad 1 and iPad 4, KF HD 8.9"
|
partial download again... saga continues
You got the same results as I did: just 8 articles from the features section, as before, and not the complete magazine. Sorry to repeat what I wrote before. I am not sure why this happens.
|
Tags |
recipe, request, spectator, web |