MobileRead Forums > E-Book Software > Calibre > Recipes

03-05-2012, 08:41 PM   #1
besianm
Member
besianm began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Mar 2012
Device: kindle touch
Problem: Recipe for Foreign Affairs not fetching premium articles

Hi,

The built-in recipe for Foreign Affairs does not fetch premium articles, even though I have an online subscription to Foreign Affairs. I'm pasting the recipe code below; maybe you can help me tweak it so that it fetches premium articles:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
import re
from calibre.ptempfile import PersistentTemporaryFile

class ForeignAffairsRecipe(BasicNewsRecipe):
    ''' there are three modifications:
    1) fetch issue cover
    2) toggle ignore premium articles
    3) extract proper section names, ie. "Comments", "Essay"

    by Chen Wei weichen302@gmx.com, 2012-02-05'''

    __license__  = 'GPL v3'
    __author__ = 'kwetal'
    language = 'en'
    version = 1.01

    title = u'Foreign Affairs (Subscription or (free) Registration)'
    publisher = u'Council on Foreign Relations'
    category = u'USA, Foreign Affairs'
    description = u'The leading forum for serious discussion of American foreign policy and international affairs.'

    no_stylesheets = True
    remove_javascript = True

    INDEX = 'http://www.foreignaffairs.com'
    FRONTPAGE = 'http://www.foreignaffairs.com/magazine'
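    # Set to True to include premium (subscriber-only) articles; they are
    # skipped while this is False. Downloading their full text also needs
    # valid subscription credentials (see get_browser below).
    INCLUDE_PREMIUM = False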


    # Strip the <base> tag from downloaded pages.
    remove_tags = [dict(name='base')]

    remove_tags_before = dict(name = 'h1', attrs = {'class': 'print-title'})

    remove_tags_after = dict(name = 'div', attrs = {'class': 'print-footer'})

    extra_css = '''
                body{font-family:verdana,arial,helvetica,geneva,sans-serif;}
                div.print-footer {font-size: x-small; color: #696969;}
                '''

    conversion_options = {'comments': description, 'tags': category, 'language': 'en',
                          'publisher': publisher}

    temp_files = []
    articles_are_obfuscated = True

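    def get_cover_url(self):
        # The current issue's cover is the <img> inside the
        # 'inthemag-issuebuy-cover' div on the magazine page.
        soup = self.index_to_soup(self.FRONTPAGE)
        div = soup.find('div', attrs={'class': 'inthemag-issuebuy-cover'})
        img = div.find('img') if div is not None else None
        if img is None:
            return None  # no cover found on the page
        # The src is site-relative, so prefix the site root.
        return self.INDEX + img['src']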

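    def get_obfuscated_article(self, url):
        # Open the article page (logged in when credentials were given) and
        # follow the first link to its printer-friendly /print/<number> version.
        br = self.get_browser()
        br.open(url)
        response = br.follow_link(url_regex=r'/print/[0-9]+', nr=0)
        html = response.read()

        # Save the HTML to a temporary file and return its path; because
        # articles_are_obfuscated = True, calibre processes this file instead
        # of downloading the article URL itself.
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()

        return self.temp_files[-1].name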


    def parse_index(self):
        answer = []
        soup = self.index_to_soup(self.FRONTPAGE)
        # Each section of the issue's contents follows a 'panel-separator' div.
        sec_start = soup.findAll('div', attrs={'class': 'panel-separator'})
        for sec in sec_start:
            content = sec.nextSibling
            if content:
                section = self.tag_to_string(content.find('h2'))
                articles = []

                # Article entries are either "views-row" divs or list items.
                row_class = re.compile(
                    r'view-row\s+views-row-[0-9]+\s+views-row-(?:odd|even).*')
                tags = list(content.findAll('div', attrs={'class': row_class}))
                tags.extend(content.findAll('li'))

                for div in tags:
                    # Premium entries carry a 'premium-icon' span; skip them
                    # unless INCLUDE_PREMIUM is True.
                    if not self.INCLUDE_PREMIUM and div.findAll(
                            'span', attrs={'class': 'premium-icon'}):
                        continue

                    tag = div.find('div', attrs={'class': 'views-field-title'})
                    a = tag.find('a') if tag else None
                    if not a:
                        continue  # no article link, so nothing to download

                    title = self.tag_to_string(a)
                    url = self.INDEX + a['href']
                    author = self.tag_to_string(div.find('div', attrs={
                        'class': 'views-field-field-article-display-authors-value'}))
                    description = self.tag_to_string(div.find('span', attrs={
                        'class': 'views-field-field-article-summary-value'}))
                    articles.append({'title': title, 'date': None, 'url': url,
                                     'description': description, 'author': author})
                if articles:
                    answer.append((section, articles))
        return answer

    def preprocess_html(self, soup):
        # Make site-relative image URLs absolute so the images can be fetched.
        for img in soup.findAll('img', attrs={'src': True}):
            if not img['src'].startswith(('http://', 'https://')):
                img['src'] = self.INDEX + img['src']

        return soup

    needs_subscription = True

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        # Log in with the username/password entered in calibre. The login
        # form is the second form (index 1) on the /user page.
        if self.username is not None and self.password is not None:
            br.open('https://www.foreignaffairs.com/user?destination=home')
            br.select_form(nr=1)
            br['name'] = self.username
            br['pass'] = self.password
            br.submit()
        return br
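A side note on the class-name pattern in parse_index: inside a regular expression, "[odd|even]" is a character class that matches a single character from the set o, d, |, e, v, n, not the word "odd" or "even". It works on views-row-odd / views-row-even class names only because of the leading o or e, and it would also accept other names starting with one of those letters. The standalone sketch below (plain Python re, no calibre needed; the sample class strings are made up for illustration) compares it with the alternation "(?:odd|even)", which is presumably what was intended.

```python
import re

# As written in the recipe: [odd|even] is a character class, so it matches
# exactly ONE character from the set {o, d, |, e, v, n}.
CHAR_CLASS = re.compile(r'views-row-[odd|even]')

# Presumably intended: an alternation matching the whole word "odd" or "even".
ALTERNATION = re.compile(r'views-row-(?:odd|even)\b')


def matches(pattern, class_attr):
    """Return True if pattern occurs anywhere in the class attribute string."""
    return pattern.search(class_attr) is not None


# Made-up class attributes for illustration only.
samples = [
    'views-row views-row-1 views-row-odd',
    'views-row views-row-2 views-row-even',
    'views-row views-row-3 views-row-over',  # hypothetical look-alike class
]

for cls in samples:
    print(cls, matches(CHAR_CLASS, cls), matches(ALTERNATION, cls))
```

Both patterns accept the first two samples; on the third, the character class still matches (because of the "o" in "over") while the alternation does not.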
03-07-2012, 04:41 AM   #2
Divingduck
Evangelist
Divingduck is generous with chocolate
 
Posts: 441
Karma: 33884
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
I do not use this recipe, but there is a switch:
... INCLUDE_PREMIUM = False
Maybe you need to change it to "INCLUDE_PREMIUM = True"?
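To illustrate what the switch does: in parse_index, an index entry containing a 'premium-icon' span is skipped while INCLUDE_PREMIUM is False and kept when it is True. The sketch below mimics only that filtering rule in plain Python; ListingItem and select_articles are hypothetical names, and the entries stand in for the BeautifulSoup tags the recipe actually walks. Listing premium articles is only half of it: fetching their full text presumably still depends on the login in get_browser succeeding, which is why the recipe sets needs_subscription = True and asks for a username and password.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ListingItem:
    """Hypothetical stand-in for one article entry on the magazine index page."""
    title: str
    href: str         # site-relative link to the article
    is_premium: bool  # True if the entry carries a 'premium-icon' marker


def select_articles(items: List[ListingItem], base_url: str,
                    include_premium: bool = False) -> List[Dict[str, str]]:
    """Mimic the recipe's rule: drop premium entries unless include_premium."""
    articles = []
    for item in items:
        if item.is_premium and not include_premium:
            continue  # skipped, as with INCLUDE_PREMIUM = False
        articles.append({'title': item.title, 'url': base_url + item.href})
    return articles


items = [
    ListingItem('Free essay', '/articles/1', is_premium=False),
    ListingItem('Subscriber essay', '/articles/2', is_premium=True),
]
print(select_articles(items, 'http://www.foreignaffairs.com'))
print(select_articles(items, 'http://www.foreignaffairs.com', include_premium=True))
```

With the default (the equivalent of INCLUDE_PREMIUM = False) only the free essay is listed; passing include_premium=True lists both.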

Edit: I just saw that you made a second post and the problem is solved.

Last edited by Divingduck; 03-07-2012 at 04:46 AM.
 
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
minor modified Foreign Affairs receipe | forceps | Recipes | 3 | 03-06-2012 10:43 PM
Updated OReilly Premium Recipe | TechnoCat | Recipes | 0 | 01-15-2012 12:54 PM
New Recipe: OReilly Premium | TechnoCat | Recipes | 0 | 01-07-2012 12:43 PM
Foreign Affairs subscription - I don't understand the pricing. | adriatikfan | Amazon Kindle | 9 | 11-08-2009 12:05 AM
Foreign Affairs Replacing Previous Issue | Spankypoo | Amazon Kindle | 6 | 07-08-2009 12:31 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.