Help needed in tweaking my recipe
I'm writing a recipe for http://www.economist.com/theworldin/2013
I'm having trouble detecting all the articles, because the first article of each section is different from the rest. I know how to write two recipes that together would cover all the articles, but I haven't figured out how to do it in a single recipe. Here is the recipe that fetches every article except the first one in each section. I'd appreciate it if someone could take a look and tweak it. Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Tag, NavigableString
from collections import OrderedDict
import re


class AdvancedUserRecipe1342144530(BasicNewsRecipe):
    title = 'The World In 2013'
    language = 'en'
    __author__ = "Kovid Goyal"
    INDEX = 'http://www.economist.com/theworldin/2013'
    description = ('Global news and current affairs from a European'
                   ' perspective. Best downloaded on Friday mornings (GMT)')
    extra_css = '''
        .headline {font-size: large;}
    '''
    keep_only_tags = [dict(name='article')]
    no_stylesheets = True
    delay = 1

    def parse_index(self):
        soup = self.index_to_soup(self.INDEX)
        feeds = []
        # Each <section> on the index page becomes one feed.
        for section in soup.findAll('section'):
            h1 = section.find('h1')
            if h1 is None:
                continue
            section_title = self.tag_to_string(h1)
            if not section_title:
                continue
            self.log('Found section:', section_title)
            articles = []
            # Only headlines inside <li> items are picked up here, so the
            # first article of each section is still missed.
            for post in section.findAll('li'):
                a = post.find(attrs={'class': 'headline'})
                if a is None:
                    continue
                title = self.tag_to_string(a)
                url = a['href']
                # Turn site-relative links into absolute URLs.
                if url.startswith('/'):
                    url = 'http://www.economist.com' + url
                self.log('\tFound article:', title, 'at', url)
                articles.append({'title': title, 'url': url,
                                 'description': '', 'date': ''})
            if articles:
                feeds.append((section_title, articles))
        return feeds
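
One idea for covering both cases in a single pass, untested against the live page: instead of looking only inside the `li` items, collect every link with class `headline` anywhere in the section and skip duplicates. The sketch below runs outside calibre with only the standard library. The `HeadlineCollector` class and the sample markup are made up for illustration, and it assumes the first article's link also carries the `headline` class, which I have not verified.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = 'http://www.economist.com'


class HeadlineCollector(HTMLParser):
    """Hypothetical helper: gather headline links per (non-nested) <section>."""

    def __init__(self):
        super().__init__()
        self.feeds = []           # list of (section_title, articles)
        self._articles = None     # None while outside a <section>
        self._title_parts = []
        self._in_h1 = False
        self._have_title = False  # only the first <h1> names the section
        self._link = None         # href of the headline link being read
        self._link_text = []
        self._seen = set()        # URLs already added for this section

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'section':
            self._articles, self._title_parts = [], []
            self._have_title, self._seen = False, set()
        elif self._articles is None:
            return
        elif tag == 'h1' and not self._have_title:
            self._in_h1 = True
        elif (tag == 'a' and attrs.get('href')
              and 'headline' in (attrs.get('class') or '').split()):
            self._link, self._link_text = attrs['href'], []

    def handle_data(self, data):
        if self._in_h1:
            self._title_parts.append(data)
        if self._link is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == 'h1' and self._in_h1:
            self._in_h1 = False
            self._have_title = True
        elif tag == 'a' and self._link is not None:
            url = urljoin(BASE, self._link)  # make relative links absolute
            if url not in self._seen:
                self._seen.add(url)
                self._articles.append({
                    'title': ''.join(self._link_text).strip(),
                    'url': url, 'description': '', 'date': ''})
            self._link = None
        elif tag == 'section' and self._articles is not None:
            title = ''.join(self._title_parts).strip()
            if title and self._articles:
                self.feeds.append((title, self._articles))
            self._articles = None


# Assumed markup: the lead article sits outside the <ul>, the rest inside <li>.
SAMPLE = '''
<section>
  <h1>Leaders</h1>
  <article><a class="headline" href="/news/lead-story">Lead story</a></article>
  <ul>
    <li><a class="headline" href="/news/second">Second story</a></li>
    <li><a href="/about">Not an article</a></li>
  </ul>
</section>
'''

parser = HeadlineCollector()
parser.feed(SAMPLE)
parser.close()
feeds = parser.feeds
```

If that assumption holds, the same change inside `parse_index` would probably be to loop over `section.findAll('a', attrs={'class': 'headline'})` rather than `section.findAll('li')`, keeping a set of URLs already seen. If the first article uses a different class or tag, the selector would need to match that markup as well.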