Old 01-18-2014, 11:13 AM   #1
blackberry4
Business Week Magazine

Hello

For a few months now, I have noticed that the recipe for Business Week Magazine downloads only the headlines, with no articles, and the cover page is badly out of date.

Any help would be great.

Thanks very much!
Old 01-18-2014, 03:04 PM   #2
Divingduck
I made a quick update to this recipe. I hope it works for you.

Spoiler:
Code:
import re
from calibre.web.feeds.recipes import BasicNewsRecipe
from collections import OrderedDict

class BusinessWeekMagazine(BasicNewsRecipe):

    title       = 'Business Week Magazine'
    __author__  = 'Rick Shang, Armin Geller' # AGE Upd 2014-01-18

    description = 'A renowned business publication. Business news, trends and profiles of successful businesspeople.'
    language = 'en'
    category = 'news'
    encoding = 'UTF-8'
    keep_only_tags = [
            dict(name='div', attrs={'id':['content']}),         # AGE 2014-01-18
            ]
    remove_tags = [dict(name='hr'), 
                    dict(name='a', attrs={'class':'sub_sales'}),
                    dict(name='div', attrs={'class':'fieldset'}),
                    dict(name='div', attrs={'id':'taboola_wrapper'})] # AGE 2014-01-18
    no_javascript = True
    no_stylesheets = True

    cover_url             = 'http://images.businessweek.com/mz/covers/current_120x160.jpg'

    def parse_index(self):
        #Go to the issue
        soup = self.index_to_soup('http://www.businessweek.com/magazine/news/articles/business_news.htm')

        #Find date
        mag=soup.find('h2',text='Magazine')
        dates=self.tag_to_string(mag.findNext('h3'))
        self.timefmt = u' [%s]'%dates

        #Go to the main body
        div0 = soup.find('div', attrs={'class':'column left'})
        section_title = ''
        feeds = OrderedDict()
        for div in div0.findAll('a', attrs={'class': None}):
            articles = []
            section_title = self.tag_to_string(div.findPrevious('h3')).strip()
            title=self.tag_to_string(div).strip()
            url=div['href']
            soup0 = self.index_to_soup(url)
            urlprint=soup0.find('a', attrs={'href':re.compile('.*printer.*')})
            if urlprint is not None:
                url=urlprint['href']
            articles.append({'title':title, 'url':url, 'description':'', 'date':''})

            if articles:
                if section_title not in feeds:
                    feeds[section_title] = []
                feeds[section_title] += articles
        div1 = soup.find('div', attrs={'class':'column center'})
        section_title = ''
        for div in div1.findAll('a'):
            articles = []
            desc=self.tag_to_string(div.findNext('p')).strip()
            section_title = self.tag_to_string(div.findPrevious('h3')).strip()
            title=self.tag_to_string(div).strip()
            url=div['href']
            soup0 = self.index_to_soup(url)
            urlprint=soup0.find('a', attrs={'href':re.compile('.*printer.*')})
            if urlprint is not None:
                url=urlprint['href']
            articles.append({'title':title, 'url':url, 'description':desc, 'date':''})
            if articles:
                if section_title not in feeds:
                    feeds[section_title] = []
                feeds[section_title] += articles

        # items() works on both Python 2 and Python 3 (iteritems() is Python 2 only)
        return list(feeds.items())
Attached Files
File Type: zip BusinessWeekMagazine_AGE.zip (1.1 KB, 100 views)
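
For anyone adapting the recipe, here is a minimal standalone sketch of the grouping step at the end of parse_index: articles are collected per section in an OrderedDict and returned as a list of (section title, article list) pairs. The helper name group_by_section and the sample data are hypothetical, made up for illustration only; the real recipe builds these entries from the scraped page instead.

```python
from collections import OrderedDict


def group_by_section(entries):
    """Group (section, title, url) tuples into a list of
    (section_title, [article_dict, ...]) pairs, keeping first-seen order."""
    feeds = OrderedDict()
    for section, title, url in entries:
        article = {'title': title, 'url': url, 'description': '', 'date': ''}
        # setdefault creates the section's list the first time the section is seen
        feeds.setdefault(section, []).append(article)
    # items() works on both Python 2 and Python 3
    return list(feeds.items())


# Hypothetical sample data, not taken from the real site
sample = [
    ('Global Economics', 'Article A', 'http://example.com/a'),
    ('Technology', 'Article B', 'http://example.com/b'),
    ('Global Economics', 'Article C', 'http://example.com/c'),
]
index = group_by_section(sample)
```

Sections come out in the order they were first seen, and articles from the same section end up in one list even when they are not adjacent in the input.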
Old 01-18-2014, 05:21 PM   #3
blackberry4
Thank you very much! It's working perfectly now.