#1 |
Newsbeamer dev
Posts: 123
Karma: 1000
Join Date: Dec 2011
Device: Kindle Voyage
|
Recipe request:
Hi all,
Would anybody be able to help me create a recipe? Ideally, it would parse the current issue of the magazine. You need to log in to see this (but I'll PM the login details if anyone is willing to help).
I'd be very grateful if anyone were willing and able to help!
Thanks
Jamie
Last edited by duluoz; 01-06-2012 at 07:14 PM. |
#2 |
doofus
Posts: 2,543
Karma: 13088847
Join Date: Sep 2010
Device: Kobo Libra 2, Kindle Voyage
|
I don't have a login so I'm doing this blind. This may even work.
Edit: there was an error in parse_index; fixed. I'm getting a Forbidden error when fetching the link, however. Maybe if you have a login it would work? Fingers crossed.
Last edited by Barty; 12-18-2011 at 10:39 AM. |
#3 |
Newsbeamer dev
Posts: 123
Karma: 1000
Join Date: Dec 2011
Device: Kindle Voyage
|
Thanks for having a go Barty - but I'm afraid it didn't work.
The result was just 3 pages: one with the date, one with the issue number, and one blank. I was wondering whether there was something in parsing links based on the sections in the current issue (Features, Opinion, Science & Tech, etc.).
Thanks again
Jamie |
#4 |
doofus
Posts: 2,543
Karma: 13088847
Join Date: Sep 2010
Device: Kobo Libra 2, Kindle Voyage
|
See my edited post above
|
#5 |
Newsbeamer dev
Posts: 123
Karma: 1000
Join Date: Dec 2011
Device: Kindle Voyage
|
Barty - I appreciate you trying to crack this. Still not working though, I'm afraid. It pulls the cover and the issue number, but no articles. I've posted the PDF it creates in case it helps:
https://docs.google.com/open?id=0B0O...FiYTJkNGI5NGZm
I'll also PM a logon, although I think it should work even without one and would just pull fewer articles. |
#6 |
doofus
Posts: 2,543
Karma: 13088847
Join Date: Sep 2010
Device: Kobo Libra 2, Kindle Voyage
|
Yeah, sorry, I'm stumped. It's parsing the index and getting the title and URL correctly, but fetching the URL gives a forbidden error. Maybe Kovid or someone else can take a look at it.
|
#7 |
Newsbeamer dev
Posts: 123
Karma: 1000
Join Date: Dec 2011
Device: Kindle Voyage
|
Thanks Barty. I'm looking at the output from running ebook-convert on the command line. As you say, it's correctly finding the article URLs, but then it can't download the articles.
I don't get Forbidden errors, just 'Failed to download article [article name]'.
EDIT: just checked the debug file, and I get the same 'Failed to download article' error there. I wonder if it's a 403 error somewhere? Seems very strange.
Last edited by duluoz; 12-19-2011 at 06:59 PM. |
#8 |
creator of calibre
Posts: 45,195
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That likely means that the login failed. Check the result of the submit() in get_browser() to ensure the login actually worked.
html = br.submit().read() |
#9 |
Newsbeamer dev
Posts: 123
Karma: 1000
Join Date: Dec 2011
Device: Kindle Voyage
|
Kovid - thanks for the reply, much appreciated. I'm not sure how to use the code snippet you provided though: where can I access the value of 'html'?
And I'm not convinced it's a login problem. I thought it might be something to do with the user agent? You can access the articles it failed to download whether logged in or not.
I posted this as a new thread, with the specific problem. Perhaps you might have a chance to take a look at the error message?
https://www.mobileread.com/forums/sho...d.php?t=161669
Thanks again |
#10 |
creator of calibre
Posts: 45,195
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The news download system automatically sets the user agent to mimic a browser. The error message is a generic HTTP 403, there's no way to know from it why permission is denied. You can always save the html to a file with
open('path_to_some_file.html', 'wb').write(html) and open it in a browser/text editor later. |
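For anyone unsure where `html` comes from, here is a minimal sketch of the two snippets above combined into one helper. It assumes `br` behaves like the browser object a recipe builds in get_browser(), with the login form already selected and filled in. The names `submit_and_save`, `FakeBrowser`, `FakeResponse` and `demo` are hypothetical; the fake classes exist only so the sketch runs without calibre or a real login.

```python
import os
import tempfile


def submit_and_save(br, path):
    """Submit the selected form on `br` and write the response body to `path`."""
    html = br.submit().read()  # bytes returned by the login request
    with open(path, 'wb') as f:
        f.write(html)
    return html


class FakeResponse:
    """Hypothetical stand-in for the response returned by br.submit()."""

    def __init__(self, body):
        self._body = body

    def read(self):
        return self._body


class FakeBrowser:
    """Hypothetical stand-in for the recipe's browser object."""

    def submit(self):
        return FakeResponse(b'<html><body>Logged in as example</body></html>')


def demo():
    # Save the fake login response to a temporary file and return both.
    path = os.path.join(tempfile.mkdtemp(), 'login_result.html')
    html = submit_and_save(FakeBrowser(), path)
    return path, html
```

In a real recipe the call would sit inside get_browser(), right where the login form is submitted, and the saved file can then be opened in a browser or text editor to see whether the site shows a logged-in page or a login error.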
#11 |
doofus
Posts: 2,543
Karma: 13088847
Join Date: Sep 2010
Device: Kobo Libra 2, Kindle Voyage
|
The login appears correct. Output attached.
However, even if I set need_subscription = False and remove get_browser(), I shouldn't get Forbidden but instead a stub page with a summary of the article and a prompt to subscribe or log in to see the full article. At least that is what I get using a regular browser.
So I looked at my browser's network log, and it looks like the page returns 403 even if you have logged in, but the response body contains the full article and looks normal if you are reading with your browser.
Code:
URL: http://www.prospectmagazine.co.uk/2011/12/time-travel/
Method: GET
Status: 403 Forbidden
Duration: 1751 ms

Request details
GET /2011/12/time-travel/ HTTP/1.1
User-Agent: Opera/9.80 (Windows NT 6.1; U; Edition United States Local; en) Presto/2.10.229 Version/11.60
Host: www.prospectmagazine.co.uk
Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/webp, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
Accept-Language: en-US,en;q
Accept-Encoding: gzip, deflate
Referer: http://www.prospectmagazine.co.uk/issue/190/
Cookie: wordpress_test_cookie=(snip)
Connection: Keep-Alive

Request body
No request data

Response details
HTTP/1.1 403 Forbidden
Date: Tue, 20 Dec 2011 16:58:08 GMT
Server: Apache
X-Pingback: http://www.prospectmagazine.co.uk/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Link: <http://www.prospectmagazine.co.uk/?p=103609>; rel=shortlink
Last-Modified: Tue, 20 Dec 2011 16:58:08 GMT
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8

Body
.... full body snipped ...

Last edited by Barty; 12-20-2011 at 03:56 PM. |
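To illustrate the behaviour in the log above outside of calibre, here is a small sketch using only Python's standard library. It is not part of the recipe and is not how the news download system fetches pages; it only shows that a response with status 403 can still carry a full page body, and that plain urllib lets you read it from the raised HTTPError. The names `fetch_body_even_on_error`, `ForbiddenWithBodyHandler` and `demo` are hypothetical, and the local test server is a stand-in for the real site.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_body_even_on_error(url):
    """Return (status, body) for url, keeping the body of error responses."""
    # An empty ProxyHandler ignores proxy settings from the environment,
    # so the local demo server below is always reached directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object, so the body is readable.
        return err.code, err.read()


class ForbiddenWithBodyHandler(BaseHTTPRequestHandler):
    """Local stand-in for the site: status 403, but a full page in the body."""

    def do_GET(self):
        body = b'<html><body>full article text</body></html>'
        self.send_response(403)
        self.send_header('Content-Type', 'text/html; charset=UTF-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    # Port 0 asks the OS for any free port.
    server = HTTPServer(('127.0.0.1', 0), ForbiddenWithBodyHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = 'http://127.0.0.1:%d/2011/12/time-travel/' % server.server_port
        return fetch_body_even_on_error(url)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the 403 status together with the page body, which matches what the browser's network log shows: the status says Forbidden while the content arrives anyway.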
#12 |
Newsbeamer dev
Posts: 123
Karma: 1000
Join Date: Dec 2011
Device: Kindle Voyage
|
Thanks both - it all seems pretty strange behaviour. Do you think there's no chance then that we could get a working recipe?
|
#13 |
creator of calibre
Posts: 45,195
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That's just weird. There's no way to have the news download system handle a website that returns incorrect HTTP codes.
|
#14 |
doofus
Posts: 2,543
Karma: 13088847
Join Date: Sep 2010
Device: Kobo Libra 2, Kindle Voyage
|
@duluoz: I think you can try contacting the site and letting them know they're sending back a 403 Forbidden response when fetching an article.
Maybe they're doing it on purpose, or maybe it's a bug. |
#15 |
Newsbeamer dev
Posts: 123
Karma: 1000
Join Date: Dec 2011
Device: Kindle Voyage
|
Thanks both - I sent a note to the webmaster, and got a response already.
Last edited by duluoz; 01-06-2012 at 07:10 PM. |