Old 03-05-2012, 08:41 PM   #1
besianm began at the beginning.
Posts: 11
Karma: 10
Join Date: Mar 2012
Device: kindle touch
Problem: Recipe for Foreign Affairs not fetching premium articles


The built-in recipe for Foreign Affairs does not fetch premium articles, even though I have an online subscription with Foreign Affairs. I'm pasting the recipe code below in the hope that someone can help me tweak it to fetch premium articles:

from calibre.web.feeds.news import BasicNewsRecipe
import re
from calibre.ptempfile import PersistentTemporaryFile

class ForeignAffairsRecipe(BasicNewsRecipe):
    ''' there are three modifications:
    1) fetch issue cover
    2) toggle ignore premium articles
    3) extract proper section names, ie. "Comments", "Essay"

    by Chen Wei, 2012-02-05'''

    __license__  = 'GPL v3'
    __author__ = 'kwetal'
    language = 'en'
    version = 1.01

    title = u'Foreign Affairs (Subcription or (free) Registration)'
    publisher = u'Council on Foreign Relations'
    category = u'USA, Foreign Affairs'
    description = u'The leading forum for serious discussion of American foreign policy and international affairs.'

    no_stylesheets = True
    remove_javascript = True

    INDEX = ''
    FRONTPAGE = ''
    # Switch checked in parse_index(): True fetches premium articles too.
    INCLUDE_PREMIUM = False

    remove_tags = []
    remove_tags.append(dict(name = 'base'))
    #remove_tags.append(dict(name = '', attrs = {'': ''}))

    remove_tags_before = dict(name = 'h1', attrs = {'class': 'print-title'})

    remove_tags_after = dict(name = 'div', attrs = {'class': 'print-footer'})

    extra_css = '''
                div.print-footer {font-size: x-small; color: #696969;}
                '''

    conversion_options = {'comments': description, 'tags': category, 'language': 'en',
                          'publisher': publisher}

    temp_files = []
    articles_are_obfuscated = True

    def get_cover_url(self):
        soup = self.index_to_soup(self.FRONTPAGE)
        div = soup.find('div', attrs={'class':'inthemag-issuebuy-cover'})
        img_url =  div.find('img')['src']
        return self.INDEX + img_url

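    def get_obfuscated_article(self, url):
        br = self.get_browser()
        # Open the article page first so the print link can be followed.
        br.open(url)

        response = br.follow_link(url_regex = r'/print/[0-9]+', nr = 0)
        html = response.read()

        # Save the print version to a temporary file and hand its path to calibre.
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()

        return self.temp_files[-1].name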

    def parse_index(self):
        answer = []
        soup = self.index_to_soup(self.FRONTPAGE)
        sec_start = soup.findAll('div', attrs={'class':'panel-separator'})
        for sec in sec_start:
            content = sec.nextSibling
            if content:
                section = self.tag_to_string(content.find('h2'))
                articles = []

                tags = []
                for div in content.findAll('div', attrs = {'class': re.compile(r'view-row\s+views-row-[0-9]+\s+views-row-(?:odd|even).*')}):
                    tags.append(div)
                for li in content.findAll('li'):
                    tags.append(li)

                for div in tags:
                    title = url = description = author = None

                    if self.INCLUDE_PREMIUM:
                        found_premium = False
                    else:
                        # The class name was cut off in the post; 'premium-icon' is an assumption.
                        found_premium = div.findAll('span', attrs={'class': 'premium-icon'})
                    if not found_premium:
                        tag = div.find('div', attrs={'class': 'views-field-title'})

                        if tag:
                            a = tag.find('a')
                            if a:
                                title = self.tag_to_string(a)
                                url = self.INDEX + a['href']
                            author = self.tag_to_string(div.find('div', attrs = {'class': 'views-field-field-article-display-authors-value'}))
                            tag_summary = div.find('span', attrs = {'class': 'views-field-field-article-summary-value'})
                            description = self.tag_to_string(tag_summary)
                            articles.append({'title':title, 'date':None, 'url':url,
                                     'description':description, 'author':author})
                if articles:
                    answer.append((section, articles))
        return answer

    def preprocess_html(self, soup):
        for img in soup.findAll('img', attrs = {'src': True}):
            if not img['src'].startswith('http://'):
                img['src'] = self.INDEX + img['src']

        return soup

    needs_subscription = True

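    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            # The login page must be opened with br.open() before this point;
            # its URL is not included in the post.
            br.select_form(nr = 1)
            br['name']   = self.username
            br['pass'] = self.password
            # Submit the login form, otherwise the session stays logged out.
            br.submit()
        return br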
Old 03-07-2012, 04:41 AM   #2
Divingduck is generous with chocolate
Posts: 518
Karma: 33884
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
I do not use this recipe, but it has a switch, INCLUDE_PREMIUM.
Maybe you need to change it to "INCLUDE_PREMIUM = True"?
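
Here is a rough sketch of what I think the switch does, going only by the "if self.INCLUDE_PREMIUM:" check in parse_index. It runs without calibre. The function select_articles and the 'premium' key are names I made up for illustration; the real recipe looks for a premium marker in the page HTML instead of a dictionary key.

```python
def select_articles(items, include_premium):
    """Keep every item when the switch is on, otherwise drop premium ones.

    'items' is a list of dicts with a 'title' and an optional 'premium' flag.
    Both keys are hypothetical stand-ins for what the recipe reads from HTML.
    """
    selected = []
    for item in items:
        if include_premium:
            # Switch on: behave as if no premium marker was found.
            found_premium = False
        else:
            # Switch off: an item flagged as premium gets skipped.
            found_premium = item.get('premium', False)
        if not found_premium:
            selected.append(item['title'])
    return selected


items = [
    {'title': 'Essay A', 'premium': True},
    {'title': 'Comment B', 'premium': False},
]
free_only = select_articles(items, include_premium=False)
everything = select_articles(items, include_premium=True)
```

With the switch off only the free item should come back; with it on, both do. Fetching the premium text itself still depends on the login in get_browser actually succeeding.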

Edit: I just saw that you made a second post and that it is solved.

Last edited by Divingduck; 03-07-2012 at 04:46 AM.