12-01-2011, 01:17 PM   #2
Barty
I don't know if there's a better way to do this, but this seems to work:

Code:
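    def print_version(self, url):
        # Fetch the regular article page and parse it
        soup = self.index_to_soup(url)
        # The print link looks like href="javascript:printPage(<article id>)"
        regex = re.compile(r'javascript:printPage\((\d+?)\)', re.I)
        atag = soup.find('a', attrs={'href': regex})
        if atag is not None:
            m = regex.search(atag['href'])
            if m:
                # Build the print-version URL from the captured article id
                url = 'http://www.christianitytoday.com/ct/article_print.html?id=' + m.group(1)
        # Fall back to the original URL if no print link was found
        return url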
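This loads the original article page, finds the link whose href is javascript:printPage(...), and pulls the article id out of it to build the print-version URL. If no such link is found, the original URL is returned unchanged.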
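If you want to check the regex part on its own before running the whole recipe, something like the following should do. It's only a rough sketch: the helper name print_url_from_href and the sample href/URL values are made up for illustration and are not part of calibre or the site, and it only exercises the pattern matching, not the page download.

```python
import re

# Same pattern as in the recipe: captures the numeric id inside printPage(...)
PRINT_RE = re.compile(r'javascript:printPage\((\d+?)\)', re.I)
PRINT_BASE = 'http://www.christianitytoday.com/ct/article_print.html?id='

def print_url_from_href(href, fallback):
    """Hypothetical helper: return the print URL for href, or fallback if no id is found."""
    m = PRINT_RE.search(href)
    if m:
        return PRINT_BASE + m.group(1)
    return fallback

# Made-up sample values, for illustration only
sample_href = 'javascript:printPage(12345)'
original = 'http://www.christianitytoday.com/ct/some-article.html'
print(print_url_from_href(sample_href, original))  # print URL built from the id
print(print_url_from_href('#', original))          # no match, so the original URL
```

Because of re.I, the match doesn't care about the capitalisation of "javascript:printPage", so it should still work if the site changes the case in the link.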

Note: add

Code:
import re
to the start of your recipe