I am working on a recipe for
www.nikkei.com, a Japanese economic news site.
After several attempts I have run into an issue. Any suggestions?
The recipe builds an index that works well, but the site returns every article as an HTML page that contains an automatic POST form, which it uses to process the login state.
The essence of the recipe is as follows:
Code:
# -*- coding: utf-8 -*-
import string, re, sys
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe

class NikkeiNet_subscription(BasicNewsRecipe):
    title = u'\u65e5\u7d4c\u65b0\u805e\u96fb\u5b50\u7248'
    __author__ = 'Hiroshi Miura'
    description = 'News and current market affairs from Japan'
    needs_subscription = True
    oldest_article = 2
    max_articles_per_feed = 20
    language = 'ja'
    recursions = 3
    remove_javascript = False
    feeds = [
        (u'\u65e5\u7d4c\u4f01\u696d', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=sangyo')
    ]

    def get_browser(self):
        # get_browser is an instance method, so self must be passed explicitly
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None and self.password is not None:
            br.open('https://id.nikkei.com/lounge/nl/base/LA0010.seam')
            # Hide the markup from input j_id48 up to gm_home_on.gif inside an
            # HTML comment (presumably markup mechanize cannot parse)
            response = br.response()
            response.set_data(response.get_data().replace("<input id=\"j_id48\"", "<!-- "))
            response.set_data(response.get_data().replace("gm_home_on.gif\" />", " -->"))
            br.set_response(response)
            # Fill in and submit the login form
            br.select_form(name='LA0010Form01')
            br['LA0010Form01:LA0010Email'] = self.username
            br['LA0010Form01:LA0010Password'] = self.password
            res = br.submit()
            raw = res.read()
            # After a successful login the page links to the Nikkei ID service list
            if '日経IDのサービス一覧へ' not in raw:
                raise ValueError('Failed to log in to nikkei.net, check your username (email address) and password')
            # The front page also answers with an auto-post form; submit it once
            br.open('http://www.nikkei.com/')
            br.select_form(nr=0)
            res = br.submit()
            print res.read()  # debug output
        return br
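The two `replace()` calls in get_browser() wrap part of the login page in an HTML comment before mechanize parses the form. One fragility worth noting: `str.replace` rewrites every occurrence, so if the page changes and only the start marker still matches, the result is an unclosed comment that can swallow the login form. A slightly more defensive sketch, in plain Python (the helper name `comment_out_span` is hypothetical), only comments out the span when both markers are found in order and otherwise leaves the page untouched:

```python
def comment_out_span(html, start_marker, end_marker):
    """Wrap the first start_marker ... end_marker span in an HTML comment.

    Returns html unchanged if either marker is missing, so a changed login
    page never ends up with an unterminated '<!--'.
    """
    start = html.find(start_marker)
    if start == -1:
        return html
    end = html.find(end_marker, start)
    if end == -1:
        return html
    end += len(end_marker)  # include the end marker itself in the comment
    return html[:start] + "<!-- " + html[start:end] + " -->" + html[end:]


if __name__ == "__main__":
    page = 'a<input id="j_id48" type="x"/><img src="gm_home_on.gif" />b'
    print(comment_out_span(page, '<input id="j_id48"', 'gm_home_on.gif" />'))
```

Inside get_browser() this would replace the two `set_data` calls with a single `response.set_data(comment_out_span(response.get_data(), '<input id="j_id48"', 'gm_home_on.gif" />'))`; unlike the original, the marker text stays inside the comment instead of being deleted.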
Each article comes back like this (taken from the debug output):
Code:
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="ja">
<head>
<meta http-equiv="Content-Style-Type" content="text/css"/>
<meta http-equiv="Content-Script-Type" content="text/javascript"/>
<meta http-equiv="Pragma" content="no-cache"/>
<meta http-equiv="Cache-Control" content="no-cache"/>
<meta http-equiv="Expires" content="0"/>
<title/>
<meta content="http://www.w3.org/1999/xhtml; charset=utf-8" http-equiv="Content-Type"/><link href="../../stylesheet.css" type="text/css" rel="stylesheet"/><style type="text/css">@page { margin-bottom: 5.000000pt; margin-top: 5.000000pt; }</style></head>
<body onload="document.autoPostForm.submit()" class="calibre">
<div class="calibrenavbar">| <a href="../article_1/index.html" class="calibre5">Next</a>
| <a href="../index.html#article_0" class="calibre5">Section Menu</a> | <a href="../../index.html#feed_0" class="calibre5">Main Menu</a> | <hr class="calibre6"/></div>
<form action="https://id.nikkei.com/lounge/ep/authonly" method="post" name="autoPostForm" class="calibre7">
<div class="calibre7">
<input type="hidden" name="rpid" value="DS"/>
<input type="hidden" name="pxep" value="https://regist.nikkei.com/ds/etc/accounts/auth?url=http%3A%2F%2Fwww.nikkei.com%2Fnews%2Fcategory%2Farticle%2Fg%3D96958A9C93819594E2EAE2E79C8DE2E4E3E3E0E2E3E29F9FE2E2E2E2%3Bat%3DDGXZZO0195165008122009000000"/>
<input type="hidden" name="rtur" value=""/>
<input type="hidden" name="clg" value="715319105982499111506319898"/>
<input type="hidden" name="dps" value="3"/>
<input type="hidden" name="xp0" value=""/>
</div>
<input type="submit" class="calibre8"/>
</form>
<div class="calibrenavbar">
<hr class="calibre6"/>
<p class="calibre9">This article was downloaded by <strong class="calibre10">calibre</strong> from <a href="http://www.nikkei.com/news/category/article/g=96958A9C93819594E2EAE2E79C8DE2E4E3E3E0E2E3E29F9FE2E2E2E2;at=DGXZZO0195165008122009000000" class="calibre5">http://www.nikkei.com/news/category/article/g=96958A9C93819594E2EAE2E79C8DE2E4E3E3E0E2E3E29F9FE2E2E2E2;at=DGXZZO0195165008122009000000</a></p>
<br class="calibre7"/><br class="calibre7"/> | <a href="../index.html#article_0"
class="calibre5">Section Menu</a> | <a href="../../index.html#feed_0" class="calibre5">Main Menu</a> | </div></body>
</html>
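The output above shows what the `onload` handler would do in a real browser: POST the hidden fields to https://id.nikkei.com/lounge/ep/authonly. The `pxep` field also carries the real article URL, URL-encoded in its `url=` parameter, and it matches the "downloaded from" link calibre printed. A minimal Python 3 sketch using only urllib rebuilds that POST from the field values shown above without sending it; the helper names are hypothetical, and whether Nikkei accepts a replayed `clg` value outside the original session is untested:

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Form action and hidden fields copied from the autoPostForm in the debug output.
AUTO_POST_ACTION = "https://id.nikkei.com/lounge/ep/authonly"
AUTO_POST_FIELDS = {
    "rpid": "DS",
    "pxep": ("https://regist.nikkei.com/ds/etc/accounts/auth?url=http%3A%2F%2F"
             "www.nikkei.com%2Fnews%2Fcategory%2Farticle%2Fg%3D96958A9C93819594"
             "E2EAE2E79C8DE2E4E3E3E0E2E3E29F9FE2E2E2E2%3Bat%3DDGXZZO0195165008122009000000"),
    "rtur": "",
    "clg": "715319105982499111506319898",
    "dps": "3",
    "xp0": "",
}


def build_auto_post_request(action, fields):
    """Build (but do not send) the POST that document.autoPostForm.submit() makes."""
    data = urlencode(fields).encode("ascii")  # application/x-www-form-urlencoded body
    return Request(action, data=data, method="POST")


def article_url_from_pxep(pxep):
    """Decode the article URL carried in the url= parameter of the pxep field."""
    query = parse_qs(urlsplit(pxep).query)
    return query["url"][0]


if __name__ == "__main__":
    req = build_auto_post_request(AUTO_POST_ACTION, AUTO_POST_FIELDS)
    print(req.get_method(), req.full_url)
    print(article_url_from_pxep(AUTO_POST_FIELDS["pxep"]))
```

Decoding `pxep` is mainly a diagnostic: it confirms that the saved page is a login trampoline for the article, not the article itself.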
The non-subscriber version of this recipe works fine.
It seems there is no suitable method or function in the Calibre API for handling this situation.
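As the debug output shows, calibre saved the auto-post page itself, since it does not run the `onload` script. One workaround to try is doing that extra round trip in the recipe: whenever a page comes back with `autoPostForm`, submit it and keep the answer instead. A Python 3 standard-library sketch of that loop follows. `fetch` is a hypothetical callable you would supply (for example a thin wrapper around the logged-in browser returned by get_browser()), and `max_hops` is an arbitrary guard against an endless loop. Inside a recipe, calibre's `articles_are_obfuscated` / `get_obfuscated_article(url)` hook may be a place to call it, but check that your calibre version provides it.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class _AutoPostForm(HTMLParser):
    """Find <form name="autoPostForm"> and collect its action and hidden inputs."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._inside = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and a.get("name") == "autoPostForm":
            self._inside = True
            self.action = a.get("action")
        elif self._inside and tag == "input" and a.get("type") == "hidden":
            self.fields[a.get("name")] = a.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._inside = False


def _find_form(html):
    """Return (action, fields) for an autoPostForm in html, or None if absent."""
    parser = _AutoPostForm()
    parser.feed(html)
    parser.close()
    if parser.action is None:
        return None
    return parser.action, parser.fields


def follow_auto_post(html, fetch, max_hops=3):
    """Resubmit autoPostForm until a page without one comes back.

    fetch(url, data) is a hypothetical caller-supplied function that POSTs the
    url-encoded bytes in data to url and returns the response body as str.
    """
    for _ in range(max_hops):
        form = _find_form(html)
        if form is None:
            return html  # no auto-post form: treat this as the real article
        action, fields = form
        html = fetch(action, urlencode(fields).encode("ascii"))
    if _find_form(html) is None:
        return html
    raise RuntimeError("still getting autoPostForm after %d hops" % max_hops)
```

Injecting `fetch` keeps the loop independent of mechanize or calibre, so it can be tried against saved pages before wiring it into the recipe; the session cookies set during login still have to live in whatever object `fetch` wraps.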
Hiroshi