07-13-2011, 09:51 PM #7
joseelsegundo
I'm still struggling a bit here: I can't tease the actual URL of the comic image out of the HTML that comes back.

The URL I've been working with is for the current day's Zits comic:
http://www.washingtonpost.com/wp-srv...html?name=Zits

The source HTML of this page includes the following where the image will go:
Code:
<div id="comic_full"> <script>document.writeln(img)</script> </div>
When I use "Inspect element" in Chrome, I see that this gets changed to:
Code:
<div id="comic_full">
<script>document.writeln(img)</script>
<img src="http://est.rbma.com/content/Zits">
</div>
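The two snippets differ because "Inspect element" shows the DOM after the page's JavaScript has run, while `index_to_soup()` and the mechanize browser only return the HTML the server sent; mechanize does not execute JavaScript. So the `<img>` tag never appears in the fetched HTML, but the `img` variable passed to `document.writeln()` has to be defined somewhere in the page's scripts (inline, or in an external .js file the page loads). The sketch below is Python 3 and assumes, hypothetically, that the page assigns the markup inline as a string literal such as `var img = '<img src="...">';`. The function name `find_img_src` is illustrative, not part of calibre. You would feed it the raw page text (decoded to `str`), e.g. what the `get_browser()` code in this post prints.

```python
import re

# Hypothetical pattern: assumes the page contains something like
#   var img = '<img src="http://...">';
# (single or double quotes, optionally with escaped \" around the URL).
IMG_VAR_RE = re.compile(
    r"""\bimg\s*=\s*['"]\s*<img[^>]*?src\s*=\s*\\?["']?([^"'\\\s>]+)""",
    re.IGNORECASE,
)

def find_img_src(raw_html):
    """Return the src URL from an inline `img = '<img ...>'` assignment, or None."""
    match = IMG_VAR_RE.search(raw_html)
    return match.group(1) if match else None

# Usage on a made-up snippet (not the real Washington Post markup):
sample = "<script>var img = '<img src=\"http://est.rbma.com/content/Zits\">';</script>"
print(find_img_src(sample))  # http://est.rbma.com/content/Zits
```

If the pattern finds nothing in the real page, the assignment probably lives in an external script or is built by string concatenation, and the regular expression would need to follow that instead.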
With my recipe, I've tried getting the HTML via the index_to_soup() method, and I've tried grabbing the HTML using the mechanize browser:
Code:
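def get_browser(self):
    # Debug output: confirm the override is actually being called
    print("In get_browser")
    br = BasicNewsRecipe.get_browser()
    br.set_handle_refresh(False)
    url = 'http://www.washingtonpost.com/wp-srv/artsandliving/comics/king_zits.html?name=Zits'
    # Fetch the comic page and dump the raw HTML the server sends back
    raw = br.open(url).read()
    print(raw)
    return br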
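A different route avoids parsing the page at all. In the inspected DOM, the image URL `http://est.rbma.com/content/Zits` ends in the same `Zits` as the page's `name=Zits` query parameter. If, hypothetically, every comic's image follows the pattern `http://est.rbma.com/content/<name>` (inferred from this single example only, and unverified for other strips), a recipe could build the image URL directly from the page URL. The sketch below is Python 3 using only `urllib.parse`; `IMAGE_BASE` and `image_url_from_page_url` are illustrative names, not calibre APIs.

```python
from urllib.parse import urlparse, parse_qs, quote

# Hypothetical base URL, inferred from one observed example
# (name=Zits -> http://est.rbma.com/content/Zits); not confirmed for other comics.
IMAGE_BASE = 'http://est.rbma.com/content/'

def image_url_from_page_url(page_url):
    """Build the comic image URL from the page's ?name=... parameter, or return None."""
    query = parse_qs(urlparse(page_url).query)
    names = query.get('name')
    if not names:
        return None
    # Re-quote the decoded name so spaces or other characters stay URL-safe
    return IMAGE_BASE + quote(names[0])

page = 'http://www.washingtonpost.com/wp-srv/artsandliving/comics/king_zits.html?name=Zits'
print(image_url_from_page_url(page))  # http://est.rbma.com/content/Zits
```

This is only worth trying after checking a few other comics in a browser to see whether the pattern actually holds.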
In neither case do I get the actual URL of the image, so right now I have absolutely no idea where to go from here. I've pored through the documentation and APIs and see no way to make this work.

Any help is much appreciated.