Old 11-24-2014, 08:02 PM   #1
ireadtheinternet
Member
ireadtheinternet began at the beginning.
 
Posts: 21
Karma: 10
Join Date: Oct 2014
Device: Android
I fixed the Friday Times recipe

I had been using the Friday Times recipe as a template because it was about the simplest parse_index recipe (that is, a recipe not based on an RSS feed) I could find. However, I eventually noticed that the recipe itself was broken, so I took a break from the other recipe I am working on and fixed it. Let me know if you have any criticisms or suggestions for the fix.

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class TheFridayTimes(BasicNewsRecipe):
    language       = 'en_PK'
    encoding       = 'utf8'
    version        = 1.1
    
    title          = u'The Friday Times'
    category       = u'news, Pakistan'
    description    = u"Pakistan's First Independent Weekly Paper"

    no_stylesheets            = True
    no_javascript             = True
    ignore_duplicate_articles = {'url'}

    keep_only_tags = [
        dict(name='div', attrs={'class':'sidebar_content'}),
        dict(name='div', attrs={'class':'comment_inner'})  
    ]

    remove_tags = [
        dict(name='p', attrs={'class':'no-break'}),
        dict(name='div', attrs={'class':'related_posts'}),
        dict(name='div', attrs={'id':'respond'})
    ]

    def parse_index(self):      
        toc_page = self.index_to_soup('http://www.thefridaytimes.com/tft/') 
        toc = toc_page.find('div', attrs={'class':'sidebar_left_home_wrapper'})

        articles = []
        for story in toc.findAll('a'):
            # skip the links with an image, they are repeated further down
            if story.find('img') is not None:
                continue
            url = story['href']
            # If no title, use url as title
            title = story.get('title', url)
            self.log('Found article:', title)
            self.log('\t', url)
            articles.append({'title':title, 'url':url, 'date':'','description':''})

        return [('Current Issue', articles)]
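
For anyone else using this as a template: the link filtering in parse_index can be tried outside calibre. The following is only a rough sketch using Python's standard html.parser instead of calibre's index_to_soup. The LinkCollector class, the build_articles helper, and the sample HTML are all made up for illustration and are not part of the recipe. It skips links that wrap an image, falls back to the URL when a link has no title attribute, and produces the same list of (section title, article list) pairs that parse_index returns.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects <a> tags and notes whether each wraps an <img>."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished links: dicts with href, title, has_img
        self._open = None  # the <a> currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and 'href' in attrs:
            self._open = {'href': attrs['href'],
                          'title': attrs.get('title'),
                          'has_img': False}
        elif tag == 'img' and self._open is not None:
            self._open['has_img'] = True

    def handle_endtag(self, tag):
        if tag == 'a' and self._open is not None:
            self.links.append(self._open)
            self._open = None


def build_articles(html):
    """Apply the same filtering as the recipe's loop to a chunk of HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    articles = []
    for link in parser.links:
        # skip the links with an image, they are repeated further down
        if link['has_img']:
            continue
        url = link['href']
        # If no title, use url as title
        title = link['title'] if link['title'] is not None else url
        articles.append({'title': title, 'url': url,
                         'date': '', 'description': ''})
    return articles


# Made-up sample markup, not copied from the real site
SAMPLE = '''
<div class="sidebar_left_home_wrapper">
  <a href="http://example.com/a1"><img src="thumb.jpg"></a>
  <a href="http://example.com/a1" title="First story">First story</a>
  <a href="http://example.com/a2">Untitled</a>
</div>
'''

# Same shape that parse_index returns: a list of (section, articles) pairs
index = [('Current Issue', build_articles(SAMPLE))]
```

Here the image link is dropped, the second link keeps its title attribute, and the third falls back to its URL as the title.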

Last edited by ireadtheinternet; 11-25-2014 at 06:04 AM.
Old 11-28-2014, 05:33 PM   #2
ireadtheinternet
If the comments are not wanted, you can comment out the line

Code:
dict(name='div', attrs={'class':'comment_inner'})
The builtin currently has an issue where the comments display, but in a giant font, so I wasn't sure whether the original intent was to include or exclude them.
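
Instead of commenting the line in and out by hand, the list could be built from a switch. This is just a sketch: build_keep_only_tags and its include_comments parameter are names I made up, not anything calibre provides. The two dicts are the same ones used in the recipe.

```python
def build_keep_only_tags(include_comments):
    """Hypothetical helper: return the keep_only_tags list, with or without comments."""
    tags = [dict(name='div', attrs={'class': 'sidebar_content'})]
    if include_comments:
        # the div that holds reader comments on the article page
        tags.append(dict(name='div', attrs={'class': 'comment_inner'}))
    return tags


# Article body only, reader comments left out
keep_only_tags = build_keep_only_tags(include_comments=False)
```

In the recipe, the result would be assigned to keep_only_tags in the class body in place of the literal list.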

EDIT: I noticed this is now an official edit, thanks.

Last edited by ireadtheinternet; 12-23-2014 at 11:46 PM.