Old 10-16-2010, 08:51 PM   #1
PipSqueak
Junior Member
PipSqueak began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Oct 2010
Device: Kindle
Request for recipes of sites with no rss

http://www.malaysianmirror.com/index.html (English)
http://www.eunited.com.my/ (Mandarin)
http://www.sinchew.com.my/ (Mandarin)

Can anyone help make recipes for the above sites? Really appreciate it.

Last edited by PipSqueak; 10-16-2010 at 08:57 PM.
Old 10-16-2010, 10:05 PM   #2
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by PipSqueak View Post
http://www.malaysianmirror.com/index.html (English)
http://www.eunited.com.my/ (Mandarin)
http://www.sinchew.com.my/ (Mandarin)

Can anyone help make recipes for the above sites? Really appreciate it.
How about this, since you requested "help": read through the conversations I had with Starson17 about Field and Stream, and look up make_links and parse_index. Once you have done that, write some code using them. If it doesn't work, post it here using the spoiler (the eye with the x) and code (the # icon) tags, and when I get time I will help, or I'm sure someone else will. The only way you're going to learn is by doing it. I will do one for you, and you can follow my lead and do the rest....
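For anyone new to parse_index, here is a rough standalone sketch of the shape it is expected to hand back: a list of (section title, list of article dicts) pairs. It runs without calibre. The names build_feeds and fake_fetch and the sample article are made up for illustration and are not part of calibre; in a real recipe the fetch step would scrape the section page instead.

```python
def build_feeds(sections, fetch_articles):
    """Return [(section_title, [article_dict, ...]), ...], skipping empty sections."""
    feeds = []
    for title, url in sections:
        articles = fetch_articles(url)
        if articles:
            feeds.append((title, articles))
    return feeds


def fake_fetch(url):
    # Hypothetical stand-in for a scraper: pretend one section has no articles.
    if url.endswith('featurefront'):
        return []
    return [{'title': 'Example story', 'url': url + '/1',
             'description': '', 'date': ''}]


sections = [
    ('Media Buzz', 'http://www.malaysianmirror.com/media-buzz-front'),
    ('Features', 'http://www.malaysianmirror.com/featurefront'),
]
feeds = build_feeds(sections, fake_fetch)
print(feeds)
```

Each article is a plain dict with 'title' and 'url', plus optional 'description' and 'date'. Sections that come back empty are left out so they don't show up as blank feeds.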
Spoiler:
Code:
#!/usr/bin/env  python
__license__   = 'GPL v3'
__author__    = 'Tony Stegall' 
__copyright__ = '2010, Tony Stegall or Tonythebookworm on mobileread.com'
__version__   = '1'
__date__      = '16, October 2010'
__docformat__ = 'English'

from calibre.web.feeds.news import BasicNewsRecipe


class MalaysianMirror(BasicNewsRecipe):
    title                = 'MalaysianMirror'
    __author__           = 'Tonythebookworm'
    description          = 'The Pulse of the Nation'
    language             = 'en'
    publisher            = 'Tonythebookworm'
    category             = 'news'
    use_embedded_content = False   # articles are fetched from their own pages
    no_stylesheets       = True    # ignore the site's own CSS
    oldest_article       = 24      # in days
    remove_javascript    = True
    remove_empty_feeds   = True
    conversion_options   = {'linearize_tables': True}
    extra_css = '''
                    #content_heading{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    
                    td{text-align:right; font-size:small;margin-top:0px;margin-bottom: 0px;}
                    
                    #content_body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
                '''
    
    keep_only_tags     = [dict(name='table', attrs={'class':['contentpaneopen']})
                          ]
    remove_tags = [dict(name='table', attrs={'class':['buttonheading']})]
    #######################################################################################################################

    max_articles_per_feed = 10

    # Base URL of the site. The article links on the section pages do not
    # include it, so it is prepended to each href to build the full URL.
    INDEX = 'http://www.malaysianmirror.com'

    def parse_index(self):
        # Build one feed per section page; skip sections with no articles.
        feeds = []
        for title, url in [
            (u"Media Buzz", u"http://www.malaysianmirror.com/media-buzz-front"),
            (u"Life Style", u"http://www.malaysianmirror.com/lifestylefront"),
            (u"Features", u"http://www.malaysianmirror.com/featurefront"),
        ]:
            articles = self.make_links(url)
            if articles:
                feeds.append((title, articles))
        return feeds

    def make_links(self, url):
        # Collect the article links from one section page.
        current_articles = []
        soup = self.index_to_soup(url)
        for item in soup.findAll('div', attrs={'class': 'contentheading'}):
            link = item.find('a')
            if link and link.get('href'):
                article_url = self.INDEX + link['href']
                title = self.tag_to_string(link)
                # Log through calibre's logger rather than print
                self.log('Found article:', title, article_url)
                current_articles.append({'title': title, 'url': article_url,
                                         'description': '', 'date': ''})
        return current_articles

    def preprocess_html(self, soup):
        for item in soup.findAll(attrs={'style':True}):
            del item['style']
        return soup
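
One thing to watch when adapting this to the other sites: the recipe builds each article URL by gluing INDEX onto the href, which only works if every href is site-relative and starts with a slash. Here is a small standalone sketch, assuming a site might mix relative and absolute links (I have not checked whether these sites do). It uses urljoin from the Python 3 standard library; absolutize is a made-up helper name and the sample paths are invented.

```python
from urllib.parse import urljoin

INDEX = 'http://www.malaysianmirror.com'


def absolutize(href, base=INDEX):
    """Resolve href against base; hrefs that are already absolute pass through."""
    return urljoin(base + '/', href)


# Hypothetical hrefs, for illustration only
print(absolutize('/media-buzz-front/123-story'))
print(absolutize('featurefront/9-story'))
print(absolutize('http://www.malaysianmirror.com/lifestylefront/5'))
```

Inside make_links this would stand in for the self.INDEX + link['href'] line.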