06-14-2012, 05:30 AM | #1 |
Enthusiast
Posts: 46
Karma: 10
Join Date: Dec 2011
Device: Kindle 3
protected contents: can anyone help?
Cast an eye on this interesting website: http://aimse.blogspot.it/
The RSS feed is http://feeds.feedburner.com/MarketingSensoriale and its behaviour is strange: the calibre recipe can't download the article contents, which seem to be copy-protected, and there is no print-friendly version. As a result, calibre produces empty ebooks. I've never come across anything like this before. Can anyone help? Attached is the temporary recipe I wrote, which anyone is free to modify.
06-19-2012, 08:02 AM | #3 |
Connoisseur
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
Here is an example using spynner:
Code:
import spynner
from multiprocessing import Process, Queue
from calibre.web.feeds.news import BasicNewsRecipe

class MarketingSensoriale(BasicNewsRecipe):
    title = u'Marketing sensoriale'
    description = 'Marketing Sensoriale, il Blog'
    category = 'Blog'
    oldest_article = 7
    max_articles_per_feed = 200
    no_stylesheets = True
    encoding = 'utf8'
    use_embedded_content = False
    language = 'it'
    remove_empty_feeds = True
    recursions = 0
    auto_cleanup = False
    remove_tags_after = [dict(name='div', attrs={'class':['article-footer']})]

    def get_article_url(self, article):
        # Use the original article link instead of the Feedburner redirect
        return article.get('feedburner_origlink', None)

    def grab(self, q, url):
        # Runs in a child process: load the page in spynner's WebKit browser
        # so the HTML reflects what the page's scripts produce
        try:
            browser = spynner.Browser()
            browser.load(url)
            # 10 second timeout
            browser.wait_load(10)
            q.put(browser.html)
            browser.close()
        except Exception:
            q.put(None)

    def preprocess_raw_html(self, raw, url):
        q = Queue()
        p = Process(target=self.grab, args=(q, url))
        p.start()
        html = q.get()
        p.join()
        # Fall back to the page as downloaded if the browser fetch failed
        return html if html is not None else raw

    feeds = [(u'Marketing sensoriale', u'http://feeds.feedburner.com/MarketingSensoriale?format=xml')]

You need to install spynner before this will work.
06-19-2012, 12:18 PM | #4 |
Enthusiast
Posts: 46
Karma: 10
Join Date: Dec 2011
Device: Kindle 3
Thank you, but I'm running Vista and can't manage to install it. Is there a workaround?
06-19-2012, 04:34 PM | #5 |
Addict
Posts: 285
Karma: 1387630
Join Date: Aug 2011
Device: Kobo Wireless
Just throwing this out (I don't have time to try it myself): will Google Reader fetch the feed correctly? If so, you could use the Google Reader recipe to get it from your Google Reader account.
Granted, not an ideal solution.
06-20-2012, 09:14 AM | #6 |
Connoisseur
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
Quote:
You can import it into a recipe with:
Code:
import calibre_plugins.webkit_browser.spynner
spynner = calibre_plugins.webkit_browser.spynner
06-20-2012, 07:46 PM | #7 |
Connoisseur
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
I've come up with a cross-platform solution.
Windows instructions: extract the contents of the attached zip to C:\ (or change the path in the recipe), then use this recipe:
Code:
import os
import subprocess
import tempfile
from calibre.web.feeds.news import BasicNewsRecipe

class MarketingSensoriale(BasicNewsRecipe):
    title = u'Marketing sensoriale'
    description = 'Marketing Sensoriale, il Blog'
    category = 'Blog'
    oldest_article = 7
    max_articles_per_feed = 200
    no_stylesheets = True
    encoding = 'utf8'
    use_embedded_content = False
    language = 'it'
    remove_empty_feeds = True
    recursions = 0
    delay = 0.0001
    auto_cleanup = False
    remove_tags_after = [dict(name='div', attrs={'class':['article-footer']})]

    def get_article_url(self, article):
        return article.get('feedburner_origlink', None)

    def preprocess_raw_html(self, raw, url):
        # grabber.py (from the zip) writes the rendered page to temp_path
        temp_handle, temp_path = tempfile.mkstemp()
        try:
            # *nix style path
            # subprocess.check_call(['calibre-debug', '-e', '/home/will/Downloads/spynner_calibre/grabber.py', url, temp_path], shell=False)
            # Windows style path (raw string so the backslashes stay literal)
            subprocess.check_call(['calibre-debug', '-e', r'C:\spynner_calibre\grabber.py', url, temp_path], shell=False)
        except Exception:
            print('spynner fetch failed')
        with os.fdopen(temp_handle, 'r') as f:
            html = f.read()
        try:
            os.remove(temp_path)
        except OSError:
            print('could not delete temp file: ' + temp_path)
        # An empty temp file means the fetch failed; fall back to the raw page
        return html or raw

    feeds = [(u'Marketing sensoriale', u'http://feeds.feedburner.com/MarketingSensoriale?format=xml')]

Can someone experienced verify that the subprocess.check_call in the above recipe is safe from shell injection?

Is it possible to import things from a calibre plugin into a script executed by calibre-debug? For example, I can't use:
Code:
import calibre_plugins.webkit_browser.spynner

Edit: I got the answer to the second question. You can add
from calibre.customize.ui import *
before the plugin import.

Last edited by NotTaken; 06-20-2012 at 09:27 PM.
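On the shell-injection question: when the command is given as a list and shell=False (the default), subprocess starts the program directly and each list element becomes exactly one argument, so characters such as ;, & or $( ) inside the URL are never interpreted by a shell. The sketch below is a small self-contained stand-in for the recipe's temp-file handoff, not the attached grabber.py: the child is an inline Python one-liner instead of calibre-debug and spynner, and the fetch_via_child name is hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for grabber.py (hypothetical): writes argv[1] into the file named by argv[2].
# A real grabber would write the rendered HTML of the page instead.
CHILD_SCRIPT = "import sys; f = open(sys.argv[2], 'w'); f.write(sys.argv[1]); f.close()"


def fetch_via_child(url):
    """Run a child process with list arguments and no shell, then read its output file."""
    handle, path = tempfile.mkstemp()
    os.close(handle)  # the child reopens the file by name
    try:
        # Each list element is passed as one argument; no shell parses the url
        subprocess.check_call([sys.executable, "-c", CHILD_SCRIPT, url, path], shell=False)
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)  # clean up even if the child fails


if __name__ == "__main__":
    tricky = "http://example.com/?a=1&b=2; echo injected"
    print(fetch_via_child(tricky) == tricky)  # True: the string arrived unchanged
```

The string with shell metacharacters comes back byte-for-byte, which is what you would expect if no shell ever saw it.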
06-21-2012, 12:10 AM | #8 |
creator of calibre
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You should use calibre's bundled facilities for this kind of thing. Instead of subprocess, use fork_job() from calibre.utils.ipc.simple_worker, and instead of spynner, use calibre's jsbrowser module (calibre.web.jsbrowser). Note that jsbrowser isn't quite complete, but it should probably work for this basic task.
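To illustrate the general idea of delegating a call to a separate process by module name and function name, here is a minimal stdlib-only sketch. It is not calibre's fork_job(), which is built on calibre's own worker machinery; run_in_worker is a hypothetical name, and this version only handles JSON-serializable arguments and results. It returns a dict with the value under 'result', mirroring the result['result'] access used in the recipes later in this thread.

```python
import json
import subprocess
import sys

# Child-side code: import a module by name, call one function, print the result as JSON
WORKER = (
    "import importlib, json, sys; "
    "mod = importlib.import_module(sys.argv[1]); "
    "args = json.loads(sys.argv[3]); "
    "json.dump({'result': getattr(mod, sys.argv[2])(*args)}, sys.stdout)"
)


def run_in_worker(module_name, func_name, args=()):
    """Call module_name.func_name(*args) in a fresh Python process (hypothetical helper)."""
    out = subprocess.check_output(
        [sys.executable, "-c", WORKER, module_name, func_name, json.dumps(list(args))]
    )
    return json.loads(out.decode("utf-8"))


if __name__ == "__main__":
    print(run_in_worker("math", "sqrt", (16,)))  # {'result': 4.0}
```

Running the work in a separate process keeps a crash or hang in the browser engine from taking the main calibre process down with it.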
06-21-2012, 06:40 AM | #9 |
Connoisseur
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
Quote:
06-21-2012, 04:04 PM | #10 |
Connoisseur
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
Changed to jsbrowser and calibre's job forking (this still requires a plugin, attached):
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.utils.ipc.simple_worker import fork_job
from calibre_plugins.recipe_fork_helper import wrapper
import os
import tempfile

# Source for a throwaway module; its file path is handed to the helper plugin
dummy_module = '''
import calibre.web.jsbrowser.browser as jsbrowser

def grab(url):
    browser = jsbrowser.Browser()
    # 10 second timeout
    browser.visit(url, 10)
    browser.run_for_a_time(10)
    html = browser.html
    browser.close()
    return html
'''

class MarketingSensoriale(BasicNewsRecipe):
    title = u'Marketing sensoriale'
    description = 'Marketing Sensoriale, il Blog'
    category = 'Blog'
    oldest_article = 7
    max_articles_per_feed = 200
    no_stylesheets = True
    encoding = 'utf8'
    use_embedded_content = False
    language = 'it'
    remove_empty_feeds = True
    recursions = 0
    auto_cleanup = False
    delay = 0.00000001
    remove_tags_after = [dict(name='div', attrs={'class':['article-footer']})]

    def get_article_url(self, article):
        return article.get('feedburner_origlink', None)

    def preprocess_raw_html(self, raw, url):
        # Write the module source to a temporary file
        temp_handle, temp_path = tempfile.mkstemp()
        with os.fdopen(temp_handle, 'w') as f:
            f.write(dummy_module)
        try:
            result = fork_job('calibre_plugins.recipe_fork_helper', 'wrapper', (temp_path, 'grab', (url)))
        finally:
            # Remove the temp file even if the job fails
            try:
                os.remove(temp_path)
            except OSError:
                print('could not delete temp file: ' + temp_path)
        html = result['result']
        return html

    feeds = [(u'Marketing sensoriale', u'http://feeds.feedburner.com/MarketingSensoriale?format=xml')]

Edit: i.e., can you pass a well-defined module name for the recipe directly to fork_job (so that all the code is contained within the recipe)?

Last edited by NotTaken; 06-21-2012 at 04:22 PM.
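The helper plugin itself is not shown in the thread, but the recipe hands it a temporary file path and a function name. A hypothetical sketch of how such a helper could load a module from an arbitrary file and call one function in it, using only importlib (call_from_file and its parameters are illustrative names, not the plugin's API):

```python
import importlib.machinery
import importlib.util
import os
import tempfile


def call_from_file(path, func_name, args=(), module_name="dummy_module"):
    """Load a module from any file path and call one function in it (hypothetical helper)."""
    # SourceFileLoader is given the path explicitly, so suffix-less names
    # such as those produced by tempfile.mkstemp() work too
    loader = importlib.machinery.SourceFileLoader(module_name, path)
    spec = importlib.util.spec_from_loader(module_name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)  # executes the file's top-level code
    return getattr(module, func_name)(*args)


if __name__ == "__main__":
    source = "def grab(url):\n    return '<p>' + url + '</p>'\n"
    handle, path = tempfile.mkstemp()
    with os.fdopen(handle, "w") as f:
        f.write(source)
    try:
        print(call_from_file(path, "grab", ("http://example.com/",)))
    finally:
        os.remove(path)
```

Note that CPython may also cache compiled bytecode in a __pycache__ folder next to the loaded file unless bytecode writing is disabled.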
06-22-2012, 05:04 AM | #11 |
creator of calibre
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I added a module_is_source_code parameter to fork_job() that lets you pass Python source code as mod instead of a module name. It will be available in next week's release.
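A hypothetical, in-process sketch of the idea behind passing source code instead of a module name: create an empty module object, execute the source in its namespace, then look up the function by name. This is not calibre's implementation; with fork_job the equivalent step would presumably happen inside the worker process. call_from_source and DUMMY_MODULE are illustrative names.

```python
import types


def call_from_source(source, func_name, args=()):
    """Build a throwaway module from source text and call one function in it (hypothetical)."""
    module = types.ModuleType("recipe_source_module")
    code = compile(source, "<recipe source>", "exec")
    exec(code, module.__dict__)  # defines the module's functions
    return getattr(module, func_name)(*args)


# Stand-in for the recipe's dummy_module; the real one drives jsbrowser
DUMMY_MODULE = '''
def grab(url):
    return "<html>" + url + "</html>"
'''

if __name__ == "__main__":
    print(call_from_source(DUMMY_MODULE, "grab", ("http://example.com/",)))
```

This removes the need for the temporary file and the helper plugin, since everything travels as a string.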
06-22-2012, 07:34 AM | #12 |
Connoisseur
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
Thanks Kovid. So it will look something like this:
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.utils.ipc.simple_worker import fork_job

# Source code for the function that runs in the worker process
dummy_module = '''
import calibre.web.jsbrowser.browser as jsbrowser

def grab(url):
    browser = jsbrowser.Browser()
    # 10 second timeout
    browser.visit(url, 10)
    browser.run_for_a_time(10)
    html = browser.html
    browser.close()
    return html
'''

class MarketingSensoriale(BasicNewsRecipe):
    title = u'Marketing sensoriale'
    description = 'Marketing Sensoriale, il Blog'
    category = 'Blog'
    oldest_article = 7
    max_articles_per_feed = 200
    no_stylesheets = True
    encoding = 'utf8'
    use_embedded_content = False
    language = 'it'
    remove_empty_feeds = True
    recursions = 0
    requires_version = (0, 8, 58)
    auto_cleanup = False
    simultaneous_downloads = 1
    remove_tags_after = [dict(name='div', attrs={'class':['article-footer']})]

    def get_article_url(self, article):
        return article.get('feedburner_origlink', None)

    def preprocess_raw_html(self, raw, url):
        # module_is_source_code=True: dummy_module is source code, not a module name
        result = fork_job(dummy_module, 'grab', (url,), module_is_source_code=True)
        html = result['result']
        return html

    feeds = [(u'Marketing sensoriale', u'http://feeds.feedburner.com/MarketingSensoriale?format=xml')]

Last edited by NotTaken; 06-22-2012 at 08:08 AM. Reason: updated with Kovid's fix
06-22-2012, 07:49 AM | #13 |
creator of calibre
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Yes, it should be (url,) (note the trailing comma).
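Why the trailing comma matters: parentheses alone do not create a tuple, so (url) is just the string. The sketch below assumes the worker calls the target as func(*args), a common convention that is not confirmed here; the call helper is hypothetical. Under that assumption, a bare string gets unpacked into one argument per character.

```python
def grab(url):
    return "fetched " + url


def call(func, args):
    # Hypothetical stand-in for a worker that invokes the target as func(*args)
    return func(*args)


url = "http://aimse.blogspot.it/"

print(type((url)).__name__)   # str: parentheses alone do not make a tuple
print(type((url,)).__name__)  # tuple: the trailing comma does
print(call(grab, (url,)))     # fetched http://aimse.blogspot.it/

try:
    call(grab, (url))  # unpacks the string into one argument per character
except TypeError as exc:
    print("TypeError:", exc)
```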
06-22-2012, 08:09 AM | #14 |
Connoisseur
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
06-23-2012, 04:46 AM | #15 |
Enthusiast
Posts: 46
Karma: 10
Join Date: Dec 2011
Device: Kindle 3
Thanks to you and God Kovid.
It's incredible how, however hard we try to improve our skills, there is always someone flying in a higher sky. Thank you, guys. Anyhow, let's try to sum up the outcome of this interesting thread:
1) What can we do about the Marketing Sensoriale recipe? Will it be updated automatically in the next calibre release?
2) Will the new browser feature be included automatically in the next calibre release, or should we download it separately, as a plugin?