View Single Post
Old 06-22-2012, 07:34 AM   #12
NotTaken
Connoisseur
NotTaken is fluent in JavaScript as well as Klingon.NotTaken is fluent in JavaScript as well as Klingon.NotTaken is fluent in JavaScript as well as Klingon.NotTaken is fluent in JavaScript as well as Klingon.NotTaken is fluent in JavaScript as well as Klingon.NotTaken is fluent in JavaScript as well as Klingon.NotTaken is fluent in JavaScript as well as Klingon.NotTaken is fluent in JavaScript as well as Klingon.NotTaken is fluent in JavaScript as well as Klingon.NotTaken is fluent in JavaScript as well as Klingon.NotTaken is fluent in JavaScript as well as Klingon.
 
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
Thanks Kovid. So it will look something like this:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.utils.ipc.simple_worker import *

dummy_module = '''

import calibre.web.jsbrowser.browser as jsbrowser

def grab(url):
    browser = jsbrowser.Browser()
    #10 second timeout
    browser.visit(url, 10)
    browser.run_for_a_time(10)
    html = browser.html
    browser.close()
    return html

    '''
class MarketingSensoriale(BasicNewsRecipe):

    title                 = u'Marketing sensoriale'
    description           = 'Marketing Sensoriale, il Blog'
    category              = 'Blog'
    oldest_article        = 7
    max_articles_per_feed = 200
    no_stylesheets        = True
    encoding              = 'utf8'
    use_embedded_content  = False
    language              = 'it'
    remove_empty_feeds    = True
    recursions = 0
    requires_version = (0, 8, 58)
    auto_cleanup = False
    simultaneous_downloads = 1

    remove_tags_after    = [dict(name='div', attrs={'class':['article-footer']})]
    

    def get_article_url(self, article):
        return article.get('feedburner_origlink',  None)

    def preprocess_raw_html(self, raw, url):

        result = fork_job(dummy_module,'grab',(url,),module_is_source_code=True)
           
        html = result['result']
        return html


    feeds          = [(u'Marketing sensoriale', u'http://feeds.feedburner.com/MarketingSensoriale?format=xml')]

Last edited by NotTaken; 06-22-2012 at 08:08 AM. Reason: updated with Kovid's fix
NotTaken is offline   Reply With Quote