Quote:
Originally Posted by zhixiangpan
Dear Starson17:
Thank you for the help. I have checked some posts about the append_pages code, but I don't know how to write the code to fetch the hyperlink. The link in my page looks like this:
Code:
<center>
<table border="0" align="center">
<tbody>
<tr>
<td>
<a href="/GB/14562/15549575.html">
<img src="/img/next_b.gif" border="0"/>
</a>
</td>
</tr>
</tbody>
</table>
</center>
Can you help me?
Without looking closely at your page, I can't be sure, but something like this may work:
Code:
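# Take the first <a> tag; it is the "next" link only if it wraps the next_b.gif image
pager = soup.find('a')
if pager is not None and pager.img is not None and pager.img.get('src') == "/img/next_b.gif":
    # pager is already the <a> tag, so read href from it directly
    nexturl = self.INDEX + pager['href']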
Find the <a> tag, see if it has an <img> tag that points to the "next" image (whatever that is), and if so, grab the href and append it to the INDEX.
If you don't know what pager is, see the various recipes that use append_page.
I hate posting code without testing it, so that part is up to you.
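If you want to try the matching logic outside calibre first, here is a rough standalone sketch. It is not recipe code: it uses Python's standard-library html.parser instead of BeautifulSoup so it runs anywhere, and it joins the URL with urljoin rather than plain string concatenation. INDEX here is a made-up placeholder for your recipe's self.INDEX, and NextLinkFinder and find_next_url are names I invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical base URL, standing in for the recipe's self.INDEX
INDEX = "http://example.com"

# The snippet from the question, slightly condensed
PAGE = '''
<center>
<table border="0" align="center">
<tbody><tr><td>
<a href="/GB/14562/15549575.html">
<img src="/img/next_b.gif" border="0"/>
</a>
</td></tr></tbody>
</table>
</center>
'''


class NextLinkFinder(HTMLParser):
    """Remember the href of the first <a> that wraps the 'next' image."""

    def __init__(self, next_img):
        super().__init__()
        self.next_img = next_img
        self.current_href = None  # href of the <a> we are currently inside
        self.next_href = None     # href of the "next" link, once found

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self.current_href = attrs.get("href")
        elif (tag == "img"
              and self.current_href is not None
              and self.next_href is None
              and attrs.get("src") == self.next_img):
            self.next_href = self.current_href

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None


def find_next_url(html, index=INDEX, next_img="/img/next_b.gif"):
    """Return the absolute URL of the next page, or None if there is none."""
    finder = NextLinkFinder(next_img)
    finder.feed(html)
    if finder.next_href is None:
        return None
    return urljoin(index, finder.next_href)


print(find_next_url(PAGE))
```

Unlike soup.find('a'), this checks every <a> on the page rather than only the first one, which may matter if your real page has other links before the pager. When it returns None, you have reached the last page and can stop appending.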