Old 06-14-2012, 05:30 AM   #1
faber1971
Enthusiast
faber1971 began at the beginning.
 
Posts: 46
Karma: 10
Join Date: Dec 2011
Device: Kindle 3
protected contents: can anyone help?

Have a look at this interesting website: http://aimse.blogspot.it/
The RSS feed is: http://feeds.feedburner.com/MarketingSensoriale
Its behaviour is strange: a calibre recipe cannot download the contents, as they seem to be copy protected and there is no print-friendly version, so calibre produces empty ebooks. I have never come across anything like this before.
Can anyone help? I have attached the temporary recipe I wrote; anyone is free to work on it.
Attached Files
File Type: zip Marketing sensoriale_1085.zip (341 Bytes, 153 views)
Old 06-18-2012, 07:31 PM   #2
NotTaken
Connoisseur
NotTaken is fluent in JavaScript as well as Klingon.
 
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
It looks like all the content is loaded via JavaScript. You could try using spynner to get the HTML output after the JavaScript rendering has taken place.
Old 06-19-2012, 08:02 AM   #3
NotTaken
Connoisseur
NotTaken is fluent in JavaScript as well as Klingon.
 
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
Here is an example using spynner:

Code:
import spynner
from multiprocessing import Process, Queue
from calibre.web.feeds.news import BasicNewsRecipe


class MarketingSensoriale(BasicNewsRecipe):

    title                 = u'Marketing sensoriale'
    description           = 'Marketing Sensoriale, il Blog'
    category              = 'Blog'
    oldest_article        = 7
    max_articles_per_feed = 200
    no_stylesheets        = True
    encoding              = 'utf8'
    use_embedded_content  = False
    language              = 'it'
    remove_empty_feeds    = True
    recursions = 0
    auto_cleanup = False

    remove_tags_after    = [dict(name='div', attrs={'class':['article-footer']})]
    

    def get_article_url(self, article):
        return article.get('feedburner_origlink',  None)


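    def grab(self, q, url):
        # Runs in a child process: render the page (JavaScript included)
        # and send the resulting HTML back through the queue
        try:
            browser = spynner.Browser()
            browser.load(url)
            # 10 second timeout
            browser.wait_load(10)
            q.put(browser.html)
            browser.close()
        except Exception:
            q.put(None)

    def preprocess_raw_html(self, raw, url):
        q = Queue()
        p = Process(target=self.grab, args=(q, url))
        p.start()
        html = q.get()
        # Wait for the child process to exit before continuing
        p.join()
        # Fall back to the unrendered page if the fetch failed
        return html if html is not None else raw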


    feeds          = [(u'Marketing sensoriale', u'http://feeds.feedburner.com/MarketingSensoriale?format=xml')]

You need to install spynner before this will work.
Old 06-19-2012, 12:18 PM   #4
faber1971
Enthusiast
faber1971 began at the beginning.
 
Posts: 46
Karma: 10
Join Date: Dec 2011
Device: Kindle 3
Thank you, but I'm running Vista and I can't manage to install it. Is there a way out?
Old 06-19-2012, 04:34 PM   #5
camiller
Addict
camiller ought to be getting tired of karma fortunes by now.
 
Posts: 285
Karma: 1387630
Join Date: Aug 2011
Device: Kobo Wireless
Just throwing this out there; I don't have time to try it myself, but will Google Reader fetch the feed correctly? If so, you could use the Google Reader recipe to get it from your Google Reader account.

Granted, not an ideal solution.
Old 06-20-2012, 09:14 AM   #6
NotTaken
Connoisseur
NotTaken is fluent in JavaScript as well as Klingon.
 
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
Quote:
Originally Posted by faber1971
Thank you, but I'm running Vista and I can't manage to install it. Is there a way out?
It could probably be packaged as a plugin, as calibre shares a lot of the dependencies. I'll attach an initial attempt.

You can import into a recipe with:

Code:
import calibre_plugins.webkit_browser.spynner
spynner = calibre_plugins.webkit_browser.spynner
I had a quick go on Windows but couldn't get the multiprocessing to work (pickling errors, even for a top-level function), but then again my Python skills aren't too good!
Attached Files
File Type: zip webkit_browser.zip (129.3 KB, 134 views)
Old 06-20-2012, 07:46 PM   #7
NotTaken
Connoisseur
NotTaken is fluent in JavaScript as well as Klingon.
 
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
I've come up with a cross-platform solution:

Windows instructions:

Extract the contents of the attached zip to C:\ (or change the path in the recipe).

Use this recipe:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
import subprocess
import tempfile
import os

class MarketingSensoriale(BasicNewsRecipe):

    title                 = u'Marketing sensoriale'
    description           = 'Marketing Sensoriale, il Blog'
    category              = 'Blog'
    oldest_article        = 7
    max_articles_per_feed = 200
    no_stylesheets        = True
    encoding              = 'utf8'
    use_embedded_content  = False
    language              = 'it'
    remove_empty_feeds    = True
    recursions = 0
    delay = 0.0001
    auto_cleanup = False

    remove_tags_after    = [dict(name='div', attrs={'class':['article-footer']})]
    
    def get_article_url(self, article):
        return article.get('feedburner_origlink',  None)


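    def preprocess_raw_html(self, raw, url):
        # grabber.py (from the attached zip) is expected to write the rendered HTML to temp_path
        temp_handle, temp_path = tempfile.mkstemp()
        html = None

        try:
            # nix style path
            #subprocess.check_call(["calibre-debug", "-e","/home/will/Downloads/spynner_calibre/grabber.py",url,temp_path], shell=False)
            # windows style path (raw string, so the backslashes stay literal)
            subprocess.check_call(["calibre-debug", "-e", r"C:\spynner_calibre\grabber.py", url, temp_path], shell=False)
        except Exception:
            print('spynner fetch failed')

        try:
            # The with block closes the file even if the read fails
            with os.fdopen(temp_handle, 'r') as f:
                html = f.read()
        finally:
            try:
                os.remove(temp_path)
            except OSError:
                print('could not delete temp file:' + temp_path)

        # Fall back to the unrendered page if nothing was fetched
        return html or raw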


    feeds          = [(u'Marketing sensoriale', u'http://feeds.feedburner.com/MarketingSensoriale?format=xml')]
I have two questions:

Can someone experienced verify that the subprocess.check_call in the above recipe is safe from shell injection?

Is it possible to import stuff from a calibre plugin into a script executed by calibre-debug? For example, I can't use
Code:
 import calibre_plugins.webkit_browser.spynner
.

Edit: got the answer to the 2nd question. You can add: from calibre.customize.ui import * , before the plugin import
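
On the first question, here is a minimal sketch rather than a security review. When subprocess is given a list of arguments with shell=False, Python starts the program directly and no shell parses the arguments, so shell metacharacters in the URL should arrive as literal text. The example uses a stand-in child process (the Python interpreter itself, not calibre-debug) and a made-up hostile URL; both are assumptions for illustration only.

```python
import subprocess
import sys


def echo_argument(arg):
    """Run a child Python process that writes its first argument back."""
    # argv is a list and shell=False, so no shell ever parses `arg`
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", arg],
        shell=False,
    )
    return out.decode("utf-8")


# Hypothetical hostile "URL" full of shell metacharacters
hostile = "http://example.com/; echo INJECTED && $(whoami) | cat"
received = echo_argument(hostile)

# The child sees the string unchanged; nothing after ';' was executed
print(received == hostile)
```

This only shows that the call itself does not go through a shell. What grabber.py then does with the URL is a separate matter.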
Attached Files
File Type: zip spynner_calibre.zip (147.6 KB, 129 views)

Last edited by NotTaken; 06-20-2012 at 09:27 PM.
Old 06-21-2012, 12:10 AM   #8
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You should use calibre's bundled facilities for this kind of thing. Instead of subprocess use

fork_job() from calibre.utils.ipc.simple_worker

and instead of spynner use the jsbrowser module from calibre. Note that jsbrowser isn't quite complete, but it should probably work for this basic task.
Old 06-21-2012, 06:40 AM   #9
NotTaken
Connoisseur
NotTaken is fluent in JavaScript as well as Klingon.
 
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
Quote:
Originally Posted by kovidgoyal
You should use calibre's bundled facilities for this kind of thing. Instead of subprocess use

fork_job() from calibre.utils.ipc.simple_worker

and instead of spynner use the jsbrowser module from calibre. Note that jsbrowser isn't quite complete, but it should probably work for this basic task.
Thanks. I'll look into that.
Old 06-21-2012, 04:04 PM   #10
NotTaken
Connoisseur
NotTaken is fluent in JavaScript as well as Klingon.
 
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
Changed to jsbrowser and calibre forking (this still requires a plugin, attached):

Code:
from calibre.web.feeds.news import BasicNewsRecipe
import os
from calibre.utils.ipc.simple_worker import fork_job
from calibre_plugins.recipe_fork_helper import wrapper
import tempfile

dummy_module = '''

import calibre.web.jsbrowser.browser as jsbrowser

def grab(url):
    browser = jsbrowser.Browser()
    #10 second timeout
    browser.visit(url, 10)
    browser.run_for_a_time(10)
    html = browser.html
    browser.close()
    return html

    '''
class MarketingSensoriale(BasicNewsRecipe):

    title                 = u'Marketing sensoriale'
    description           = 'Marketing Sensoriale, il Blog'
    category              = 'Blog'
    oldest_article        = 7
    max_articles_per_feed = 200
    no_stylesheets        = True
    encoding              = 'utf8'
    use_embedded_content  = False
    language              = 'it'
    remove_empty_feeds    = True
    recursions = 0
    auto_cleanup = False
    delay = 0.00000001

    remove_tags_after    = [dict(name='div', attrs={'class':['article-footer']})]
    

    def get_article_url(self, article):
        return article.get('feedburner_origlink',  None)

    def preprocess_raw_html(self, raw, url):
        # Write the helper source to a temp file so the plugin can load it in the child process
        temp_handle, temp_path = tempfile.mkstemp()
        with os.fdopen(temp_handle, 'w') as f:
            f.write(dummy_module)

        try:
            result = fork_job('calibre_plugins.recipe_fork_helper','wrapper',(temp_path, 'grab',(url)))
        finally:
            # Remove the temp file whether or not the forked job succeeded
            try:
                os.remove(temp_path)
            except OSError:
                print('could not delete temp file:' + temp_path)

        html = result['result']
        return html


    feeds          = [(u'Marketing sensoriale', u'http://feeds.feedburner.com/MarketingSensoriale?format=xml')]
The plugin loads a module from a file containing Python source and calls the function named in the second argument. I couldn't find a way to import the recipe over on the dark side (the child process). Maybe someone knows how?

Edit: i.e. can you provide a well-defined module name for the recipe directly to fork_job (so that all the code is contained within the recipe)?
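
For readers without the attachment, here is a rough sketch of what a helper of this kind could look like. It is not the code from fork_helper.zip: the function body, the module name "dummy_module" and the assumption that the arguments are unpacked with *args are all hypothetical. The idea is only to show how Python source read from a file can be executed as a throwaway module and one of its functions called by name.

```python
import os
import tempfile
import types


def wrapper(path, func_name, args):
    """Hypothetical stand-in for the plugin's wrapper: load Python source
    from `path` as a throwaway module and call `func_name` with `args`."""
    with open(path, "r") as f:
        source = f.read()
    module = types.ModuleType("dummy_module")
    # Execute the source inside the new module's namespace
    exec(compile(source, path, "exec"), module.__dict__)
    return getattr(module, func_name)(*args)


# Usage with a harmless stand-in for the recipe's grab() function
source = "def grab(url):\n    return '<html>' + url + '</html>'\n"
handle, temp_path = tempfile.mkstemp(suffix=".py")
with os.fdopen(handle, "w") as f:
    f.write(source)
try:
    html = wrapper(temp_path, "grab", ("http://example.com/",))
finally:
    os.remove(temp_path)
```

The temp file exists only because the child process needs something it can load by path; the recipe module itself is not importable there.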
Attached Files
File Type: zip fork_helper.zip (992 Bytes, 131 views)

Last edited by NotTaken; 06-21-2012 at 04:22 PM.
Old 06-22-2012, 05:04 AM   #11
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I added a module_is_source_code parameter to fork_job() that will allow you to pass python source code as mod instead of a module name. Will be available in next week's release.
Old 06-22-2012, 07:34 AM   #12
NotTaken
Connoisseur
NotTaken is fluent in JavaScript as well as Klingon.
 
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
Thanks Kovid. So it will look something like this:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.utils.ipc.simple_worker import fork_job

dummy_module = '''

import calibre.web.jsbrowser.browser as jsbrowser

def grab(url):
    browser = jsbrowser.Browser()
    #10 second timeout
    browser.visit(url, 10)
    browser.run_for_a_time(10)
    html = browser.html
    browser.close()
    return html

    '''
class MarketingSensoriale(BasicNewsRecipe):

    title                 = u'Marketing sensoriale'
    description           = 'Marketing Sensoriale, il Blog'
    category              = 'Blog'
    oldest_article        = 7
    max_articles_per_feed = 200
    no_stylesheets        = True
    encoding              = 'utf8'
    use_embedded_content  = False
    language              = 'it'
    remove_empty_feeds    = True
    recursions = 0
    requires_version = (0, 8, 58)
    auto_cleanup = False
    simultaneous_downloads = 1

    remove_tags_after    = [dict(name='div', attrs={'class':['article-footer']})]
    

    def get_article_url(self, article):
        return article.get('feedburner_origlink',  None)

    def preprocess_raw_html(self, raw, url):

        result = fork_job(dummy_module,'grab',(url,),module_is_source_code=True)
           
        html = result['result']
        return html


    feeds          = [(u'Marketing sensoriale', u'http://feeds.feedburner.com/MarketingSensoriale?format=xml')]

Last edited by NotTaken; 06-22-2012 at 08:08 AM. Reason: updated with Kovid's fix
Old 06-22-2012, 07:49 AM   #13
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
yes, should be (url,) (note the trailing comma)
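
A small plain-Python illustration of why the comma matters. The call_with() helper below is a made-up stand-in, assuming the worker unpacks its argument tuple into the target function; it is not calibre's fork_job.

```python
def grab(url):
    return "fetched " + url


def call_with(args):
    # Mimics a worker that unpacks its argument tuple into the function
    return grab(*args)


url = "http://example.com/"
not_a_tuple = (url)    # parentheses only group: this is still a str
one_tuple = (url,)     # the trailing comma makes a 1-element tuple

result = call_with(one_tuple)

try:
    # Unpacking a string passes one argument per character
    call_with(not_a_tuple)
    failed = False
except TypeError:
    failed = True
```

With (url) the string is unpacked character by character, so grab() receives far too many arguments and raises TypeError; with (url,) it receives the single URL.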
Old 06-22-2012, 08:09 AM   #14
NotTaken
Connoisseur
NotTaken is fluent in JavaScript as well as Klingon.
 
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
Quote:
Originally Posted by kovidgoyal
yes, should be (url,) (note the trailing comma)
Thanks again. I've updated the recipe above.
Old 06-23-2012, 04:46 AM   #15
faber1971
Enthusiast
faber1971 began at the beginning.
 
Posts: 46
Karma: 10
Join Date: Dec 2011
Device: Kindle 3
Quote:
Originally Posted by NotTaken
Thanks again. I've updated the recipe above.
Thanks to you and God Kovid.
It's incredible how, however much we try to improve our skills, there is always someone flying at a higher level. Thank you, guys.
Anyhow, let's try to sum up the outcome of this interesting thread:
1) What can we do about the Marketing Sensoriale recipe? Will it be updated automatically in the next calibre release?
2) Will the new browser feature be included automatically in the next calibre release, or should we download it separately as a plugin?