10-07-2010, 10:18 AM, post #2
TonytheBookworm
The original URL is: http://www.laweekly.com/2010-10-07/f...ervation-road/

The print version is a bit tough to get at, but we can fix that.

The print URL is: http://www.laweekly.com/content/printVersion/1080621/
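Judging by these two URLs, the print page is addressed by a numeric id (1080621 here) that doesn't appear to be part of the article's own URL, so you can't simply rewrite one into the other. A minimal sketch of that URL layout, assuming every print page follows the same `/content/printVersion/<number>/` pattern as this one example (`print_id` and `print_url` are hypothetical helpers, not part of calibre):

```python
import re

# Assumption: all print pages look like the single example URL above.
PRINT_RE = re.compile(r"/content/printVersion/([0-9]+)")


def print_id(url):
    """Return the numeric id from a print-version URL, or None if it isn't one."""
    m = PRINT_RE.search(url)
    return m.group(1) if m else None


def print_url(content_id):
    """Build a print-version URL from a numeric id (hypothetical helper)."""
    return f"http://www.laweekly.com/content/printVersion/{content_id}/"


if __name__ == "__main__":
    print(print_id("http://www.laweekly.com/content/printVersion/1080621/"))
    print(print_url("1080621"))
```

Since the id has to come from somewhere, the practical route is to open the article page and follow its print link, which is what the recipe code does.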

Just use something along these lines in your calibre recipe:

Code:
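# At the top of the recipe file:
from calibre.ptempfile import PersistentTemporaryFile

    # Inside your BasicNewsRecipe subclass:
    temp_files = []
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        # Load the normal article page with calibre's browser
        br = self.get_browser()
        br.open(url)

        # Follow the first link (nr=0) whose URL contains /content/printVersion/<digits>
        response = br.follow_link(url_regex=r'/content/printVersion/[0-9]+', nr=0)
        html = response.read()

        # Save the print page to a temporary file; calibre uses that file as the article
        self.temp_files.append(PersistentTemporaryFile('_temparse.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()

        return self.temp_files[-1].name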
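If you want to check which link `br.follow_link(url_regex=..., nr=0)` will pick before running the whole recipe, here is a rough stdlib-only approximation. It assumes the browser resolves each link against the page URL and keeps those where the regex matches somewhere in the URL (`re.search` semantics), then takes the `nr`-th match counting from zero. `LinkCollector`, `find_link` and the sample HTML are hypothetical, for illustration only; this is not mechanize's actual code.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_link(html, base_url, url_regex, nr=0):
    """Return the absolute URL of the nr-th (0-based) link matching url_regex, or None."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # Resolve relative hrefs against the page URL before matching
    urls = [urljoin(base_url, href) for href in parser.hrefs]
    matches = [u for u in urls if re.search(url_regex, u)]
    return matches[nr] if nr < len(matches) else None


if __name__ == "__main__":
    # Hypothetical article page containing a relative print link
    page = '<a href="/about/">About</a> <a href="/content/printVersion/1080621/">Print</a>'
    print(find_link(page, "http://www.laweekly.com/", r"/content/printVersion/[0-9]+"))
    # http://www.laweekly.com/content/printVersion/1080621/
```

If this returns None for a real article page, the site's print link has probably changed and the regex in the recipe needs updating too.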