MobileRead Forums - View Single Post

joseelsegundo · 07-13-2011, 10:51 PM

I'm still struggling a bit here. I still can't tease the actual URL of the comic image out of the HTML returned.

The URL I've been working with is for the Zits comic of the current day:
http://www.washingtonpost.com/wp-srv...html?name=Zits

The source HTML of this page includes the following where the image will go:

Code:

<div id="comic_full"> <script>document.writeln(img)</script> </div>

When I use "Inspect element" from my Chrome browser I see that this gets changed to:

Code:

<div id="comic_full">
<script>document.writeln(img)</script>
<img src="http://est.rbma.com/content/Zits">
</div>
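
To make the difference between the two snippets concrete, here is a minimal sketch in plain standard-library Python 3 (not calibre code). It parses both snippets without running any JavaScript, which is my assumption about what a non-browser fetch effectively sees. The names ImgSrcCollector and find_img_sources are made up for illustration.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Script contents arrive as data, not tags, so nothing inside
        # <script> is ever evaluated or reported here.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_img_sources(html):
    """Return the list of <img> src values found in the given markup."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


# The page source as served (first snippet above).
RAW_SOURCE = '<div id="comic_full"> <script>document.writeln(img)</script> </div>'

# The DOM as shown by "Inspect element" (second snippet above).
RENDERED_DOM = (
    '<div id="comic_full">\n'
    '<script>document.writeln(img)</script>\n'
    '<img src="http://est.rbma.com/content/Zits">\n'
    '</div>'
)

raw_sources = find_img_sources(RAW_SOURCE)
rendered_sources = find_img_sources(RENDERED_DOM)
print(raw_sources)
print(rendered_sources)
```

The served source contains no img tag at all, only the script call, so a parser that does not execute the script has nothing to find; the tag only exists once a browser has run document.writeln(img).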

With my recipe, I've tried getting the HTML via the index_to_soup() method, and I've also tried grabbing the HTML using the mechanize browser:

Code:

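def get_browser(self):
    print("In get_browser")
    # Call the base-class method on this instance (self must be passed).
    br = BasicNewsRecipe.get_browser(self)
    br.set_handle_refresh(False)
    url = 'http://www.washingtonpost.com/wp-srv/artsandliving/comics/king_zits.html?name=Zits'
    # Fetch the raw HTML of the comic page and dump it for inspection.
    raw = br.open(url).read()
    print(raw)
    return br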

In neither case do I get the actual URL of the image, so right now I have absolutely no idea where to go from here. I've pored through the documentation and APIs and I see no way to make this work.

Any help is much appreciated.
