I'm still struggling a bit here. I still can't tease the actual URL of the comic image out of the HTML returned.
The URL I've been working with is for the Zits comic of the current day:
http://www.washingtonpost.com/wp-srv...html?name=Zits
The source HTML of this page includes the following where the image will go:
Code:
<div id="comic_full"> <script>document.writeln(img)</script> </div>
When I use "Inspect element" from my Chrome browser I see that this gets changed to:
Code:
<div id="comic_full">
<script>document.writeln(img)</script>
<img src="http://est.rbma.com/content/Zits">
</div>
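As far as I can tell, the <img> tag only exists after the browser runs the script, so anything that fetches the page without executing JavaScript would only ever see the first form. One thing that might be worth checking is whether the raw HTML assigns the img variable somewhere in a script, and pulling the URL out of that string. Here is a rough sketch of that idea. Note that the sample HTML, the shape of the assignment (var img = '...';) and the helper name find_script_img_src are all assumptions for illustration; I have not confirmed how the real page defines img.

```python
import re

# Hypothetical sample page: the real page's script may define `img` differently.
SAMPLE_HTML = '''
<script>var img = '<img src="http://example.com/comics/zits.gif">';</script>
<div id="comic_full"> <script>document.writeln(img)</script> </div>
'''


def find_script_img_src(raw):
    """Return the src URL from a JavaScript assignment to `img`, or None."""
    # Look for: img = '...'; or img = "..."; (assumed form of the assignment).
    assign = re.search(r'''\bimg\s*=\s*(['"])(.*?)\1\s*;''', raw, re.S)
    if assign is None:
        return None
    # Pull the src attribute value out of the assigned string.
    src = re.search(r'''src\s*=\s*\\?["']?([^"'\\\s>]+)''', assign.group(2))
    return src.group(1) if src else None


print(find_script_img_src(SAMPLE_HTML))
```

If the variable is instead set in an external .js file, the same search would have to be run against that file's contents rather than the page itself.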
In my recipe, I've tried getting the HTML via the index_to_soup() method, and I've also tried grabbing the HTML directly using the mechanize browser:
Code:
def get_browser(self):
    print('In get_browser')
    # Pass self explicitly when calling the base class method.
    br = BasicNewsRecipe.get_browser(self)
    br.set_handle_refresh(False)
    url = 'http://www.washingtonpost.com/wp-srv/artsandliving/comics/king_zits.html?name=Zits'
    # Fetch the page and dump the raw HTML for inspection.
    raw = br.open(url).read()
    print(raw)
    return br
In each case, I never get the actual URL of the image. So right now I have absolutely no idea where to go from here. I've pored through the documentation and APIs and I see no way to make this work.
Any help is much appreciated.