02-11-2013, 05:06 AM   #1
Steven630
Help needed in tweaking my recipe

I'm writing a recipe for http://www.economist.com/theworldin/2013

I'm having trouble detecting all the articles, because the first article of each section is laid out differently from the rest. I know how to write two recipes that together cover every article, but I haven't figured out how to do it in a single recipe.

Here is the recipe that fetches every article except the first one in each section. I'd appreciate it if someone could take a look and tweak it.

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1342144530(BasicNewsRecipe):

    title = 'The World In 2013'
    language = 'en'

    __author__ = "Kovid Goyal"
    INDEX = 'http://www.economist.com/theworldin/2013'
    description = ('Global news and current affairs from a European'
            ' perspective. Best downloaded on Friday mornings (GMT)')
    extra_css      = '''
        .headline {font-size: large;}
        '''

    keep_only_tags = [dict(name='article')]
    no_stylesheets = True

    delay = 1



    def parse_index(self):
        soup = self.index_to_soup(self.INDEX)
        feeds = []

        # Each <section> on the index page becomes one feed.
        for section in soup.findAll('section'):
            h1 = section.find('h1')
            if h1 is None:
                continue
            section_title = self.tag_to_string(h1)
            if not section_title:
                continue
            self.log('Found section:', section_title)
            articles = []
            # This picks up every article except the first one in the section.
            for post in section.findAll('li'):
                a = post.find(attrs={'class':'headline'})
                if a is None:
                    continue
                title = self.tag_to_string(a)
                url = a['href']
                # Turn site-relative links into absolute URLs.
                if url.startswith('/'):
                    url = 'http://www.economist.com' + url
                self.log('\tFound article:', title, 'at', url)
                articles.append({'title':title, 'url':url, 'description':'',
                    'date':''})
            if articles:
                feeds.append((section_title, articles))
        return feeds
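
One direction that might work in a single recipe, as a rough sketch only: run two detection functions over each section and merge what they return. The names `merge_article_finders`, `lead_article`, `list_articles` and `_text` are made up for illustration, and the `lead-headline` class in `lead_article` is purely a placeholder, since I don't yet know what actually marks the first article in the page source.

```python
BASE_URL = 'http://www.economist.com'


def _text(tag):
    # Plain-text content of a soup tag (stand-in for self.tag_to_string).
    return ''.join(tag.findAll(text=True)).strip()


def list_articles(section):
    # Same detection as the recipe above: <li> items with a 'headline' link.
    for post in section.findAll('li'):
        a = post.find(attrs={'class': 'headline'})
        if a is not None and a.get('href'):
            yield _text(a), a['href']


def lead_article(section):
    # ASSUMPTION: 'lead-headline' is a placeholder class name; replace it
    # with whatever actually marks the first article on the index page.
    a = section.find('a', attrs={'class': 'lead-headline'})
    if a is not None and a.get('href'):
        yield _text(a), a['href']


def merge_article_finders(section, finders, base_url=BASE_URL):
    # Run each finder on the section, in order, and merge the results.
    # A finder is any callable that takes the section and yields
    # (title, url) pairs; an article found twice is kept only once.
    seen = set()
    articles = []
    for finder in finders:
        for title, url in finder(section):
            if url.startswith('/'):
                url = base_url + url
            if url in seen:
                continue
            seen.add(url)
            articles.append({'title': title, 'url': url,
                             'description': '', 'date': ''})
    return articles
```

Inside `parse_index`, the inner `for post in section.findAll('li')` loop would then become `articles = merge_article_finders(section, [lead_article, list_articles])`, so the first article is listed ahead of the others in each section.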