Irish Times - Problems Entering Subscription

leo738 · 07-15-2016, 06:50 AM

Hello all,

I'm looking for help entering email & password details into the following page:

http://www.irishtimes.com/signin

I've been trying to use code from other recipes with subscription models but not having much success. So far I've come up with the following modified recipe:

Code:

__license__  = 'GPL v3'
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns, 2013 Tom Scholl"
'''
irishtimes.com
'''
import urlparse, re

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile


class IrishTimes(BasicNewsRecipe):
    title          = u'The Irish Times'
    __author__    = "Derry FitzGerald, Ray Kinsella, David O'Callaghan and Phil Burns, Tom Scholl"
    description = 'Daily news from The Irish Times'
    needs_subscription = True

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None and self.password is not None:
            br.open('http://www.irishtimes.com/signin')
            br.form = br.forms().next()
            br['email']   = self.username
            br['password'] = self.password
            raw = br.submit().read()
            if 'Please try again' in raw:
                raise Exception('Your username and password are incorrect')
        return br

    language = 'en_IE'

    masthead_url = 'http://www.irishtimes.com/assets/images/generic/website/logo_theirishtimes.png'

    encoding = 'utf-8'
    oldest_article = 1.0
    max_articles_per_feed = 100
    remove_empty_feeds = True
    no_stylesheets = True
    temp_files = []
    articles_are_obfuscated = True

    feeds          = [
                      ('News', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
                      ('World', 'http://www.irishtimes.com/cmlink/irishtimesworldfeed-1.1321046'),
                      ('Politics', 'http://www.irishtimes.com/cmlink/irish-times-politics-rss-1.1315953'),
                      ('Business', 'http://www.irishtimes.com/cmlink/the-irish-times-business-1.1319195'),
                      ('Culture', 'http://www.irishtimes.com/cmlink/the-irish-times-culture-1.1319213'),
                      # Not interested in sport so commented out..
                      # ('Sport', 'http://www.irishtimes.com/cmlink/the-irish-times-sport-1.1319194'),
                      ('Debate', 'http://www.irishtimes.com/cmlink/debate-1.1319211'),
                      ('Life & Style', 'http://www.irishtimes.com/cmlink/the-irish-times-life-style-1.1319214'),
                    ]


    def get_obfuscated_article(self, url):
        # Insert a pic from the original url, but use content from the print url
        pic = None
        pics = self.index_to_soup(url)
        div = pics.find('div', {'class' : re.compile('image-carousel')})
        if div:
            pic = div.img
            if pic:
                try:
                    pic['src'] = urlparse.urljoin(url, pic['src'])
                    pic.extract()
                except:
                    pic = None

        content = self.index_to_soup(url + '?mode=print&ot=example.AjaxPageLayout.ot')
        if pic:
            content.p.insert(0, pic)

        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(content.prettify())
        self.temp_files[-1].close()
        return self.temp_files[-1].name

I've been deliberately entering the wrong password to verify that a login attempt is actually happening, but without success. Perhaps the form or field names are incorrect.

Can anyone point me in the right direction?

Thanks,

Leo

leo738 · 07-16-2016, 06:17 AM

Some progress: I'm now getting a response from the website. However, it says the username or password is invalid (even when the correct ones are used), probably because the fields aren't being filled in correctly.

Perhaps I'm not selecting the correct form (I think it is 'itPaywall').

Code:

__license__  = 'GPL v3'
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns, 2013 Tom Scholl"
'''
irishtimes.com
'''
import urlparse, re

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile


class IrishTimes(BasicNewsRecipe):
    title          = u'The Irish Times'
    __author__    = "Derry FitzGerald, Ray Kinsella, David O'Callaghan and Phil Burns, Tom Scholl"
    description = 'Daily news from The Irish Times'
    needs_subscription = True

    language = 'en_IE'

    masthead_url = 'http://www.irishtimes.com/assets/images/generic/website/logo_theirishtimes.png'

    encoding = 'utf-8'
    oldest_article = 1.0
    max_articles_per_feed = 100
    remove_empty_feeds = True
    no_stylesheets = True
    temp_files = []
    articles_are_obfuscated = True

    feeds          = [
                      ('News', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
                      ('World', 'http://www.irishtimes.com/cmlink/irishtimesworldfeed-1.1321046'),
                      ('Politics', 'http://www.irishtimes.com/cmlink/irish-times-politics-rss-1.1315953'),
                      ('Business', 'http://www.irishtimes.com/cmlink/the-irish-times-business-1.1319195'),
                      ('Culture', 'http://www.irishtimes.com/cmlink/the-irish-times-culture-1.1319213'),
                      # Not interested in sport so commented out..
                      # ('Sport', 'http://www.irishtimes.com/cmlink/the-irish-times-sport-1.1319194'),
                      ('Debate', 'http://www.irishtimes.com/cmlink/debate-1.1319211'),
                      ('Life & Style', 'http://www.irishtimes.com/cmlink/the-irish-times-life-style-1.1319214'),
                    ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None and self.password is not None:
            br.open('http://www.irishtimes.com/signin')
            # is the correct form being selected below????
            br.form = br.forms().next()
            br['email']   = self.username
            br['password'] = self.password
            raw = br.submit().read()
            # print raw
            if 'Invalid email or password' in raw:
                raise Exception('Your username and password are incorrect')
        return br


    def get_obfuscated_article(self, url):
        # Insert a pic from the original url, but use content from the print url
        pic = None
        pics = self.index_to_soup(url)
        div = pics.find('div', {'class' : re.compile('image-carousel')})
        if div:
            pic = div.img
            if pic:
                try:
                    pic['src'] = urlparse.urljoin(url, pic['src'])
                    pic.extract()
                except:
                    pic = None

        content = self.index_to_soup(url + '?mode=print&ot=example.AjaxPageLayout.ot')
        if pic:
            content.p.insert(0, pic)

        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(content.prettify())
        self.temp_files[-1].close()
        return self.temp_files[-1].name

Aimylios · 07-16-2016, 07:45 AM

Yes, I think that should select the right form (the first one). Although you could also try this command if you are in doubt:

Code:

            br.select_form(nr=0)

I just had a brief look at the source code of the page and didn't try it out, but I think the string "Invalid email or password" is always included (even if it is not shown). You should remove that to see what happens or find another way to check the login status.

Code:

            if 'Invalid email or password' in raw:
                raise Exception('Your username and password are incorrect')

leo738 · 07-16-2016, 03:35 PM

Hello,

Many thanks for your reply!

You're correct: the string 'Invalid email or password' is present in the page before the submit button is pushed, so I removed that check. Is there a simple way to verify the login properly?

As suggested, I added the snippet of code for the form and verified that the correct form was selected (by printing it to screen). The output was:

Code:

<POST https://www.irishtimes.com/signin# application/x-www-form-urlencoded
  <TextControl(email=)>
  <PasswordControl(password=)>
  <SubmitButtonControl(<None>=) (readonly)>>

However, it's still failing. Almost all the articles after the first five or so are full of sign-in prompts.

Do I need to handle reading the username and password from the command-line arguments myself? I haven't added anything in that regard. Anything else you can think of?

Thanks again for looking,

Leo

leo738 · 07-18-2016, 03:09 PM

Reading the username and password from the command line is not the issue.

leo738 · 07-18-2016, 03:35 PM

After running with the -vv option, it looks like it may be a recipe issue rather than a login problem.
I'm seeing lots of occurrences of:

Code:

13% Article download failed: UK’s Trident nuclear programme splits Labour three ways
Failed to download article: Nice attack: ‘No words describe hell of bringing one’s child to the cemetery’ from http://www.irishtimes.com/news/world...tery-1.2725455
Traceback (most recent call last):
  File "site-packages/calibre/utils/threadpool.py", line 95, in run
  File "site-packages/calibre/web/feeds/news.py", line 1125, in fetch_obfuscated_article
  File "<string>", line 89, in get_obfuscated_article
ValueError: I/O operation on closed file

and

Code:

Could not fetch image  file:///polopoly_fs/1.2723622.1468599710!/image/image.jpg_gen/derivatives/landscape_140/image.jpg
Traceback (most recent call last):
  File "site-packages/calibre/web/fetch/simple.py", line 377, in process_images
  File "site-packages/calibre/web/fetch/simple.py", line 229, in fetch_url
IOError: [Errno 2] No such file or directory: u'/polopoly_fs/1.2723622.1468599710!/image/image.jpg_gen/derivatives/landscape_140/image.jpg'

Fetching file:///assets/images/icons/apps/app-store.png

and

Code:

20% Article download failed: Half of Irish consumers using contactless payments
Failed to download article: EU re-introduces milk supply controls barely a year after quotas from http://www.irishtimes.com/business/a...otas-1.2726088
Traceback (most recent call last):
  File "site-packages/calibre/utils/threadpool.py", line 95, in run
  File "site-packages/calibre/web/feeds/news.py", line 1125, in fetch_obfuscated_article
  File "<string>", line 89, in get_obfuscated_article
ValueError: I/O operation on closed file

Any ideas?
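
One possible cause of the "I/O operation on closed file" error, not confirmed anywhere in this thread: get_obfuscated_article stores each temporary file in the shared list temp_files and then addresses it as temp_files[-1]. Articles are downloaded by several threads at once, so another thread can append its own file between the append and the write, and two threads end up writing to and closing the same file. Below is a minimal sketch of the safer pattern, which holds the file in a local variable. It uses the standard library's tempfile module instead of calibre's PersistentTemporaryFile so that it runs on its own; save_article and worker are hypothetical names.

```python
import tempfile
import threading

# Shared across threads; kept only so the files can be cleaned up later.
temp_files = []


def save_article(html):
    """Write html to a new temporary file and return its path.

    The file object lives in a local variable, so no other thread can
    write to it or close it, unlike the temp_files[-1] pattern.
    """
    tf = tempfile.NamedTemporaryFile(suffix='_fa.html', delete=False)
    temp_files.append(tf)
    tf.write(html.encode('utf-8'))
    tf.close()
    return tf.name


def worker(i, results):
    # Each thread saves its own article and records the resulting path.
    results[i] = save_article(u'<p>article %d</p>' % i)


results = {}
threads = [threading.Thread(target=worker, args=(i, results)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In the recipe, the equivalent change would be to assign PersistentTemporaryFile('_fa.html') to a local variable, write to and close that variable, and return its name.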

leo738 · 12-03-2016, 04:14 PM

Just getting back to this after a break. It looks like there is some issue around the submit button.

I've read up a little on the br.submit() command. Could it be that some JavaScript needs to be executed to verify the login details after the button press, which mechanize is unable to handle? Should I try to use a POST instead?

Any help appreciated.

Leo

kovidgoyal · 12-03-2016, 10:23 PM

Yes, generally when a plain submit() does not work, it means there is JavaScript behind the scenes. What you do then is use the developer tools in a regular browser to see the requests generated by the login page when you click submit, and clone them in the recipe. An example of doing that is in the WSJ recipe.

leo738 · 12-05-2016, 04:37 PM

Many thanks,

I managed to capture the headers of the request the JavaScript sends:

Code:

Host: www.irishtimes.com
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Referer: https://www.irishtimes.com/signin
Content-Length: 106
Cookie: IT_cookiepopup=1; pw_meter_news=14815732..................8edbe; pw_cache=0....1480968432.IE.0.0...0xd12fffc3543.........6bb793bc2d38; IT_UUID=69164............b0758e
DNT: 1
Connection: keep-alive

Will try & work out the POST for it.

Leo

leo738 · 12-07-2016, 07:24 AM

Managed to get something going:

Code:

__license__  = 'GPL v3'
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns, 2013 Tom Scholl"
'''
irishtimes.com
'''
import urlparse, re
import json
from mechanize import Request

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile

USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0'

class IrishTimes(BasicNewsRecipe):
    title          = u'The Irish Times'
    __author__    = "Derry FitzGerald, Ray Kinsella, David O'Callaghan and Phil Burns, Tom Scholl"
    description = 'Daily news from The Irish Times'
    needs_subscription = True

    language = 'en_IE'

    masthead_url = 'http://www.irishtimes.com/assets/images/generic/website/logo_theirishtimes.png'

    encoding = 'utf-8'
    oldest_article = 1.0
    max_articles_per_feed = 100
    simultaneous_downloads = 5
    remove_empty_feeds = True
    no_stylesheets = True
    temp_files = []
    articles_are_obfuscated = True

    feeds          = [
                      ('News', 'https://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
                      ('World', 'https://www.irishtimes.com/cmlink/irishtimesworldfeed-1.1321046'),
                      ('Politics', 'https://www.irishtimes.com/cmlink/irish-times-politics-rss-1.1315953'),
                      ('Business', 'https://www.irishtimes.com/cmlink/the-irish-times-business-1.1319195'),
                      ('Culture', 'https://www.irishtimes.com/cmlink/the-irish-times-culture-1.1319213'),
                      # Not interested in sport so commented out..
                      # ('Sport', 'https://www.irishtimes.com/cmlink/the-irish-times-sport-1.1319194'),
                      ('Debate', 'https://www.irishtimes.com/cmlink/debate-1.1319211'),
                      ('Life & Style', 'https://www.irishtimes.com/cmlink/the-irish-times-life-style-1.1319214'),
                    ]

    def get_browser(self):
        # To understand the signin logic read signin javascript from submit button from
        # https://www.irishtimes.com/signin

        br = BasicNewsRecipe.get_browser(self, user_agent=USER_AGENT)

        url = 'https://www.irishtimes.com/signin'
        br.set_debug_http(True)
        br.open(url).read()
        rurl = 'https://www.irishtimes.com/auth-rest-api/v1/paywall/login'
        rq = Request(rurl, headers={
            'Accept': '*/*',
            'Accept-Language': 'en-US,en;q=0.5',
            'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'Referer': url,
            'X-Requested-With': 'XMLHttpRequest',
        }, data=json.dumps({
            'username': self.username,
            'password': self.password,
            'deviceid': '53c835787f4d2406131985553c1842d0',
            'persistent': 'on',
        }))
        r = br.open(rq)
        if r.code != 200:
            raise ValueError('Failed to login, check username and password')
        data = json.loads(r.read())
        print(data)
        #if data.get('result') != 'success':
        #    raise ValueError(
        #        'Failed to login (XHR failed), check username and password')
        #br.set_cookie('m', data['username'], '.wsj.com')
        #r = br.open(data['url'])
        #self.wsj_itp_page = raw = r.read()
        #if b'>Sign Out<' not in raw:
        #    raise ValueError(
        #        'Failed to login (auth URL failed), check username and password')
        # open('/t/raw.html', 'w').write(raw)
        return br

    def get_obfuscated_article(self, url):
        # Insert a pic from the original url, but use content from the print url
        pic = None
        pics = self.index_to_soup(url)
        div = pics.find('div', {'class' : re.compile('image-carousel')})
        if div:
            pic = div.img
            if pic:
                try:
                    pic['src'] = urlparse.urljoin(url, pic['src'])
                    pic.extract()
                except:
                    pic = None

        content = self.index_to_soup(url + '?mode=print&ot=example.AjaxPageLayout.ot')
        if pic:
            content.p.insert(0, pic)

        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(content.prettify())
        self.temp_files[-1].close()
        return self.temp_files[-1].name

But the request data contains a 'deviceid', which I can't find much information on.

Any pointers as to what it is?

Thanks,

Leo

kovidgoyal · 12-07-2016, 07:32 AM

It's likely an id that is generated using browser fingerprinting and helps track users. You can probably just use a random string for it in the same format as the one you got for your browser.
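
A minimal sketch of that suggestion, assuming the server checks only the format of the id (32 lowercase hexadecimal characters, as in the captured value) and not the value itself. random_device_id is a hypothetical helper name.

```python
import binascii
import os


def random_device_id():
    """Return a random string of 32 lowercase hexadecimal characters."""
    # 16 random bytes become 32 hex digits.
    return binascii.hexlify(os.urandom(16)).decode('ascii')


deviceid = random_device_id()
```

Whether the site accepts a fresh id on every run, or expects the same one each time, is not established in this thread; storing one generated value as a constant in the recipe would cover the second case.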

leo738 · 12-09-2016, 07:30 AM

Thanks for the reply, but I'm not getting very far with this.

On clicking the sign-in button, the following POST is sent to:

https://www.irishtimes.com/auth-rest-api/v1/paywall/login

Code:

Host: www.irishtimes.com
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Referer: https://www.irishtimes.com/signin
Content-Length: 106
Cookie: IT_UUID=1150a714-be0a-11e6-b6a8-005056b0758e; IT_cookiepopup=1
DNT: 1
Connection: keep-alive

with the request body:

Code:

username=ABCDEF%40gmail.com&password=123456&deviceid=53c835787f4d2406131985633c1942d0&persistent=on

So I came up with the following in the recipe (based on WSJ code & only including the login stuff):

Code:

    def get_browser(self):
        # To understand the signin logic read signin javascript from submit button from
        # https://www.irishtimes.com/signin

        br = BasicNewsRecipe.get_browser(self, user_agent=USER_AGENT)

        url = 'https://www.irishtimes.com/signin'
        br.set_debug_http(True)
        br.open(url).read()
        rurl = 'https://www.irishtimes.com/auth-rest-api/v1/paywall/login'
        rq = Request(rurl, headers={
            'Accept': '*/*',
            'Accept-Language': 'en-US,en;q=0.5',
            'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'Referer': url,
            'X-Requested-With': 'XMLHttpRequest',
        }, data=json.dumps({
            'username': self.username,
            'password': self.password,
            'deviceid': '53c835787f4d2406131985633c1842d0',
            'persistent': 'on',
        }))
        r = br.open(rq)
        if r.code != 200:
            raise ValueError('Failed to login, check username and password')
        data = json.loads(r.read())
        print(data)
        #if data.get('result') != 'success':
        #    raise ValueError(
        #        'Failed to login (XHR failed), check username and password')
        #br.set_cookie('m', data['username'], '.wsj.com')
        #r = br.open(data['url'])
        #self.wsj_itp_page = raw = r.read()
        #if b'>Sign Out<' not in raw:
        #    raise ValueError(
        #        'Failed to login (auth URL failed), check username and password')
        # open('/t/raw.html', 'w').write(raw)
        return br

However the response I get is:

Code:

send: 'GET /signin HTTP/1.1\r\nAccept-Encoding: identity\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0\r\nHost: www.irishtimes.com\r\nAccept: */*\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: Apache-Coyote/1.1
header: Content-Type: text/html;charset=utf-8
header: Last-Modified: Fri, 09 Dec 2016 12:26:34 GMT
header: X-Cacheable: YES
header: Content-Length: 72338
header: Accept-Ranges: bytes
header: Date: Fri, 09 Dec 2016 12:27:43 GMT
header: Connection: keep-alive
header: X-Pw-Hits: 1
header: Set-Cookie: IT_UUID=e23fb6da-be0a-11e6-bd74-005056a02a54; domain=.irishtimes.com; expires=Thu, 01 Jan 2099 00:00:01 GMT; path=/;
header: Pragma: no-cache
header: Cache-Control: no-cache, no-store, must-revalidate
header: Expires: Thu, 1 Jan 1970 00:00:00 GMT
send: 'POST /auth-rest-api/v1/paywall/login HTTP/1.1\r\nAccept-Encoding: identity\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0\r\nContent-Length: 126\r\nReferer: https://www.irishtimes.com/signin\r\nConnection: close\r\nX-Requested-With: XMLHttpRequest\r\nAccept: */*\r\nHost: www.irishtimes.com\r\nContent-Type: application/x-www-form-urlencoded; charset=UTF-8\r\nCookie: IT_UUID=e23fb6da-be0a-11e6-bd74-005056a02a54\r\nAccept-Language: en-US,en;q=0.5\r\n\r\n{"password": "123456", "deviceid": "53c835787f4d2406131955633c1842d0", "username": "ABCDEF@gmail.com", "persistent": "on"}'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: Apache/2.4.10 (Debian)
header: Cache-Control: max-age=300
header: Expires: Fri, 09 Dec 2016 12:32:43 GMT
header: Content-Type: application/json
header: Last-Modified: Fri, 09 Dec 2016 12:27:43 GMT
header: Content-Length: 51
header: Accept-Ranges: bytes
header: Date: Fri, 09 Dec 2016 12:27:43 GMT
header: Connection: keep-alive
header: X-Pw-Hits: 0
<response_seek_wrapper at 0x7f650b587f80 whose wrapped object = <closeable_response at 0x7f650b50c638 whose fp = <socket._fileobject object at 0x7f650e597cd0>>>
{u'error_number': u'1', u'error_message': u'Login failed'}

Any pointers??

Thanks,

Leo
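
The last line of the output above shows that the server answers with HTTP 200 even when the login is rejected, so checking r.code alone is not enough. Below is a minimal sketch of a check on the decoded JSON instead, assuming that a failed login always carries an error_number key as in that response. What a successful response looks like is not shown in this thread, so the sketch treats any response without that key as a success. check_login_response is a hypothetical helper name.

```python
import json


def check_login_response(raw):
    """Decode the JSON login response and raise ValueError if it reports an error."""
    data = json.loads(raw)
    if 'error_number' in data:
        raise ValueError(
            'Failed to login: %s' % data.get('error_message', 'unknown error'))
    return data


# The rejected-login response captured above, as a JSON string.
failed = '{"error_number": "1", "error_message": "Login failed"}'
try:
    check_login_response(failed)
except ValueError as err:
    message = str(err)
```

In get_browser this would replace the bare print(data), with r.read() passed in as raw.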

leo738 · 12-09-2016, 07:35 AM

Just noticed that the POST from the Irish Times is using:

Content-Type: application/x-www-form-urlencoded; charset=UTF-8

Whereas the WSJ uses:

Content-Type: application/json

So it looks like I shouldn't be using JSON for the request body!

How do I send the data form-encoded instead?

leo738 · 12-10-2016, 02:56 PM

Found an example of a similar login (available on github repo):

calibre/recipes/hbr.recipe

Code:

        # Needs "from urllib import urlencode" (Python 2) at the top of the recipe.
        # deviceid is a string in the same format as the one the browser sent.
        rq = Request(rurl, headers={
            'Accept': '*/*',
            'Accept-Language': 'en-US,en;q=0.5',
            'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'Referer': url,
            'X-Requested-With': 'XMLHttpRequest',
        }, data=urlencode({
            'username': self.username,
            'password': self.password,
            'deviceid': deviceid,
            'persistent': 'on',
        }))

Looks like it's working now, will get it tidied up & then submit it.

Regards,

Leo
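
The difference between the failing and the working request body can be seen without contacting the server. Below is a small sketch using the placeholder credentials from the captured request; the import fallback is there only so that it runs on both Python 2 (which calibre recipes used at the time) and Python 3.

```python
import json

try:
    from urllib import urlencode  # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

# A list of pairs keeps the field order fixed.
fields = [
    ('username', 'ABCDEF@gmail.com'),
    ('password', '123456'),
    ('deviceid', '53c835787f4d2406131985633c1942d0'),
    ('persistent', 'on'),
]

# Body matching Content-Type: application/x-www-form-urlencoded
form_body = urlencode(fields)

# Body the earlier attempt sent: a JSON document under a form content type
json_body = json.dumps(dict(fields))

print(form_body)
print(json_body)
```

The first line printed has the same key=value&key=value shape as the request body captured from the browser, with the @ in the email address escaped as %40; the second is the JSON text that the server rejected with "Login failed".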

leo738 · 12-11-2016, 03:43 PM

I've put together an improved recipe but am still having issues. It successfully handles the sign-in; however, when it starts downloading the articles (via RSS), the response includes:

Code:

header: X-Pw-Access: anonymous,subscribers.p_1_2901997.news.1..aac.1.1.5

No idea how to proceed from here!

I attach the recipe for anyone interested.

Leo
