MobileRead Forums - Recipe for Caijing Magazine (zh-CN)

forceps · 03-22-2011, 01:10 AM

It may be related to BeautifulSoup. The section containing the cover is:

<div class=bigimg><em><img height=590 alt=财经杂志 src="http://img.caijing.com.cn/2011-03-14/110664866.jpg" width=459 border=0></em></div>

When it is fed to BeautifulSoup, the result is:

<div class="bigimg"><em><img height="590" alt="" /></em></div>
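As a quick cross-check, the same snippet can be run through Python 3's standard-library html.parser. This is a different parser from the BeautifulSoup the recipe uses, swapped in here only for comparison. It is a minimal sketch assuming Python 3, and the class name CoverSrcFinder is hypothetical. With this parser the src attribute of the cover image survives:

```python
from html.parser import HTMLParser

# The cover snippet quoted above, copied from the front page
SNIPPET = ('<div class=bigimg><em><img height=590 alt=财经杂志 '
           'src="http://img.caijing.com.cn/2011-03-14/110664866.jpg" '
           'width=459 border=0></em></div>')


class CoverSrcFinder(HTMLParser):
    """Hypothetical helper: collect the src of each <img> inside <div class=bigimg>."""

    def __init__(self):
        super().__init__()
        self.in_bigimg = False  # True while we are inside the bigimg div
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # list of (name, value) pairs -> dict
        if tag == 'div' and attrs.get('class') == 'bigimg':
            self.in_bigimg = True
        elif tag == 'img' and self.in_bigimg and attrs.get('src'):
            self.srcs.append(attrs['src'])

    def handle_endtag(self, tag):
        # Assumes the bigimg div contains no nested <div> of its own
        if tag == 'div':
            self.in_bigimg = False


finder = CoverSrcFinder()
finder.feed(SNIPPET)
finder.close()
print(finder.srcs)  # ['http://img.caijing.com.cn/2011-03-14/110664866.jpg']
```

Tracking a single in_bigimg flag keeps the sketch short, but it only holds as long as the cover div has no nested div inside it.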

BeautifulSoup empties alt and drops src (along with width and border) entirely, so you may need to bypass the soup and grab the cover URL directly from the current front page.

I would suggest adding something like:
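import re
from contextlib import closing

# Runs inside the recipe's own method, so self is the recipe and
# current_issue_url is the front-page URL it has already found.
br = self.get_browser()
with closing(br.open(current_issue_url)) as f:
    frontpage_html = f.read()

# Grab the contents of every <div class=bigimg>...</div> block (non-greedy)
cover_urls = re.findall(r'<div class=bigimg(.*?)</div>', frontpage_html)
# The src attribute of the first block is the cover image URL
div_cover = re.findall(r'src="(.*?)"', cover_urls[0])[0]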
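The same two-step regex can be tried offline before wiring it into a recipe. The sketch below assumes Python 3; the helper name find_cover_url and the sample page string are hypothetical, the latter built from the snippet quoted at the top of this post. It uses re.search instead of findall so a missing block returns None rather than raising IndexError, and adds re.S in case the live page splits the block across lines:

```python
import re

# Everything between '<div class=bigimg' and the first '</div>' (non-greedy).
# re.S lets '.' match newlines too, in case the block spans several lines.
BIGIMG_RE = re.compile(r'<div class=bigimg(.*?)</div>', re.S)
# First double-quoted src attribute inside that block
SRC_RE = re.compile(r'src="(.*?)"')


def find_cover_url(frontpage_html):
    """Hypothetical helper: return the cover URL, or None if it is not found."""
    div = BIGIMG_RE.search(frontpage_html)
    if div is None:
        return None
    src = SRC_RE.search(div.group(1))
    return src.group(1) if src else None


sample = ('<html><body><div class=bigimg><em><img height=590 alt=财经杂志 '
          'src="http://img.caijing.com.cn/2011-03-14/110664866.jpg" '
          'width=459 border=0></em></div></body></html>')
print(find_cover_url(sample))  # http://img.caijing.com.cn/2011-03-14/110664866.jpg
print(find_cover_url('<html><body>no cover here</body></html>'))  # None
```

Under Python 3 the browser's read() returns bytes, so the page would need decoding to str before being passed in (the site's encoding is not stated in this thread, so any codec you pick is an assumption). Like the snippet it mirrors, the pattern relies on the site keeping class=bigimg unquoted and src double-quoted.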
