MobileRead Forums - View Single Post

JayKindle · 09-15-2011, 03:32 AM

Okay, a little more info. The site I am trying to fetch has a Print feature, which has a cleaner layout, but the print pages still have ad banners.
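For the ad banners, calibre recipes usually strip unwanted elements declaratively with the `remove_tags` attribute of `BasicNewsRecipe` (alongside options such as `no_stylesheets`). The sketch below assumes the banners sit in `<div>` elements whose class contains "banner" and in `<iframe>` elements. Those selectors are hypothetical placeholders, so check the print page's HTML source for the tags and classes the banners really use.

```python
import re

# These attributes belong inside your BasicNewsRecipe subclass.
# HYPOTHETICAL selectors: replace them with whatever the ad banners on the
# print page actually use (inspect the page's HTML to find out).
no_stylesheets = True
remove_tags = [
    # Any <div> whose class attribute contains "banner", case-insensitively.
    dict(name='div', attrs={'class': re.compile('banner', re.I)}),
    # Every <iframe>, a common container for ads.
    dict(name='iframe'),
]
```

Each dict is handed to BeautifulSoup as a search spec, and every matching tag is removed before the article is converted.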

I was trying to follow the recipe that makes the links fetch their data from a Print page instead, but I am having trouble knowing where in the code to add what, or rather, what I need to put in to make this work.

Here is a normal link to that site:
http://www.mixingonbeat.com/phpbb/viewtopic.php?t=6452

Here is a print link to that site:
http://www.mixingonbeat.com/phpbb/vi...ote=viewresult

Here is their RSS feed for that page:
http://www.mixingonbeat.com/phpbb/rss.php?t=6452
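If the print link can be built from the normal topic link just by adding query parameters, calibre's `print_version(self, url)` hook is usually simpler than `articles_are_obfuscated`: calibre calls it for each article URL from the feed and downloads the URL it returns. A minimal, calibre-free sketch of that mapping follows. The helper name `make_print_url` and the parameter `printview=1` are hypothetical placeholders, because the print link above is truncated; copy the real parameters from the site's Print link.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# HYPOTHETICAL placeholder: replace with the query parameters the site's
# real Print link uses (the print URL in this post is truncated).
PRINT_PARAMS = {'printview': '1'}


def make_print_url(url, extra_params=PRINT_PARAMS):
    """Return url with extra_params merged into its query string."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))  # e.g. {'t': '6452'}
    query.update(extra_params)            # add or override the print params
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Inside the recipe this would be wired up as `def print_version(self, url): return make_print_url(url)`.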

Here is the code I am trying to work with:

Spoiler:

    # Imports the snippet needs (calibre recipe API).
    from calibre.ptempfile import PersistentTemporaryFile
    from calibre.web.feeds.news import BasicNewsRecipe

    class MyRecipe(BasicNewsRecipe):  # placeholder class name
        '''
        We need to find every link that contains /content/printVersion/.
        To do this we:
          - set up a list to hold temporary files;
          - turn on the flag that tells calibre the articles are obfuscated;
          - fetch each obfuscated article (in our case, the print version).
        get_obfuscated_article() creates a browser and lets calibre do the work:
        it opens the article in an internal browser, follows the first link that
        matches the regular expression .?(\/)(content)(\/)(printVersion)(\/)
        (i.e. any link containing /content/printVersion/), and writes the page
        to a temporary HTML file that calibre then parses.
        That is all this recipe needs.
        '''
        temp_files = []
        articles_are_obfuscated = True

        def get_obfuscated_article(self, url):
            br = self.get_browser()
            self.log('THE CURRENT URL IS:', url)  # recipe logger instead of print
            br.open(url)
            # try/except: if following the print-version link fails, catch the
            # error instead of crashing and fall back to the original URL.
            try:
                # nr=0 follows the first matching link on the page.
                response = br.follow_link(url_regex='.?(\\/)(content)(\\/)(printVersion)(\\/)', nr=0)
                html = response.read()
            except Exception:
                response = br.open(url)
                html = response.read()

            # Save the page to a temp file and return its path for calibre to parse.
            self.temp_files.append(PersistentTemporaryFile('_fa.html'))
            self.temp_files[-1].write(html)
            self.temp_files[-1].close()
            return self.temp_files[-1].name

I really hope someone can help. Thanks.
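To see what `get_obfuscated_article` does with this forum's links, here is a calibre-free sketch of the same logic: find the first link matching the regex (roughly what `follow_link(url_regex=..., nr=0)` does), fall back to the original URL when nothing matches, and save the HTML to a temp file whose path is returned. `fetch` is a hypothetical stand-in for calibre's browser, and `tempfile.NamedTemporaryFile(delete=False)` stands in for `PersistentTemporaryFile`.

```python
import re
import tempfile

# Regex copied from the recipe: matches links containing /content/printVersion/.
PRINT_LINK_RE = r'.?(\/)(content)(\/)(printVersion)(\/)'


def find_print_link(links, pattern=PRINT_LINK_RE):
    """Return the first link matching pattern, or None if none matches."""
    for link in links:
        if re.search(pattern, link):
            return link
    return None


def get_article_file(url, links, fetch, pattern=PRINT_LINK_RE):
    """Fetch the print version if a matching link exists, else url itself.

    fetch is a HYPOTHETICAL stand-in for the calibre browser: it takes a URL
    and returns the page as bytes. Returns the path of a temp .html file.
    """
    target = find_print_link(links, pattern) or url  # fall back to original URL
    html = fetch(target)
    # delete=False keeps the file after closing, like PersistentTemporaryFile.
    with tempfile.NamedTemporaryFile(suffix='_fa.html', delete=False) as f:
        f.write(html)
    return f.name
```

With the forum's links (`viewtopic.php?t=6452`, `rss.php?t=6452`) the copied regex never matches, so the recipe silently falls back to the normal page. For this site the pattern would have to match the forum's own Print link as it appears in the page's HTML.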