03-22-2011, 12:10 AM   #2
forceps
It may be related to BeautifulSoup. The section that contains the cover is:

<div class=bigimg><em><img height=590 alt=财经杂志 src="http://img.caijing.com.cn/2011-03-14/110664866.jpg" width=459 border=0></em></div>

Feed it to BeautifulSoup and the result is:

<div class="bigimg"><em><img height="590" alt="" /></em></div>

The src attribute is lost in parsing, so you may need to bypass the soup and grab the cover URL directly from the current front page.

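I would suggest adding something like this to the recipe:
import re
from contextlib import closing
# Assumes this runs inside a recipe method, so self is the BasicNewsRecipe instance
br = self.get_browser()
with closing(br.open(current_issue_url)) as f:
    frontpage_html = f.read()
# Take the contents of the cover div, then the first src attribute inside it
cover_divs = re.findall(r'<div class=bigimg(.*?)</div>', frontpage_html)
div_cover = re.findall(r'src="(.*?)"', cover_divs[0])[0]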
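To try the regex approach without fetching the page, here is a minimal standalone sketch that runs on the cover markup quoted earlier in this post. The helper name extract_cover_url is hypothetical (it is not part of calibre), and it assumes the front page has already been decoded to text.

```python
import re

# The cover markup quoted earlier in this post.
SAMPLE_HTML = (
    '<div class=bigimg><em><img height=590 alt=财经杂志 '
    'src="http://img.caijing.com.cn/2011-03-14/110664866.jpg" '
    'width=459 border=0></em></div>'
)


def extract_cover_url(html):
    """Return the first img src inside the bigimg div, or None if absent.

    Hypothetical helper for illustration; not a calibre API.
    """
    # re.S lets '.' match newlines in case the div spans several lines.
    div = re.search(r'<div class=bigimg(.*?)</div>', html, re.S)
    if div is None:
        return None
    src = re.search(r'src="(.*?)"', div.group(1))
    return src.group(1) if src else None


print(extract_cover_url(SAMPLE_HTML))
```

Returning None when the div is missing lets the recipe fall back to having no cover instead of failing with an IndexError if the site changes its layout.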