02-27-2018, 11:13 AM | #1 |
Member
Posts: 23
Karma: 10
Join Date: Aug 2012
Device: Kindle DX & Paperwhite
|
Perfecting the New York Times recipe
The NYT recipe works better than ever! Here are a few observations that might help to improve it even more:
Thanks for the continued progress! Dan Last edited by danhotchkiss; 03-06-2018 at 09:51 AM. |
02-27-2018, 12:39 PM | #2 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
1. They already have different descriptions. If you think the descriptions can be improved, feel free to suggest improvements.
2. Sure, already done.
3. That will be because the NYT servers return something different for those articles, either because they are flaky or because some sort of CAPTCHA/bot-detection algorithm fires occasionally, or similar. Somebody with more time/interest than me in the NYT will have to investigate. |
03-02-2018, 09:54 AM | #3 |
Member
Posts: 23
Karma: 10
Join Date: Aug 2012
Device: Kindle DX & Paperwhite
|
1. All I can suggest is that the description might say more about the difference between the Web and non-Web recipes, to help users know which to choose.
2. Thanks!
3. I hope someone will accept your challenge. It's frustrating to miss articles without knowing why. Dan |
03-07-2018, 06:08 PM | #4 |
Member
Posts: 23
Karma: 10
Join Date: Aug 2012
Device: Kindle DX & Paperwhite
|
I find that issues #2 and #3 above are both resolved in the current non-web version of the recipe. Thanks!
|
03-08-2018, 02:28 AM | #5 | |
Member
Posts: 16
Karma: 10
Join Date: Sep 2010
Device: Kindle
|
Quote:
Also, I thought there used to be a "front page" section, but I haven't seen it lately.
Edit: missing on Sundays only?
Edit: never mind. The web edition doesn't have it, but the other one does, I think.
Last edited by BillD; 03-09-2018 at 04:47 AM. |
|
03-08-2018, 08:09 AM | #6 |
Member
Posts: 23
Karma: 10
Join Date: Aug 2012
Device: Kindle DX & Paperwhite
|
I spoke too soon. The "missing article" issue persists in the current version of the recipe.
An interesting detail that may be diagnostic: the articles are missing, but "This article was downloaded by calibre from" [URL] appears anyway. |
03-14-2018, 11:09 AM | #7 |
Big Poppa
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
|
Same here. It seems the server is refusing to serve the page because of the paywall.
|
03-14-2018, 11:34 AM | #8 |
Big Poppa
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
|
Kovid, here's the response for a page that was requested but is skipped in the results. Each run has a few articles missing, and not the same ones each time.
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx
header: Content-Type: text/html; charset=utf-8
header: X-PageType: vi-story
header: X-VI-Compatibility: Compatible
header: Alt-Svc: clear
header: X-Origin-Time: 2018-03-14 11:14:23 EDT
header: Fastly-Restarts: 1
header: Content-Length: 216267
header: Accept-Ranges: bytes
header: Date: Wed, 14 Mar 2018 15:14:23 GMT
header: Age: 0
header: X-Frame-Options: DENY
header: Set-Cookie: vi_www_hp=z00; path=/; domain=.nytimes.com; expires=Wed, 01 Jan 2020 00:00:00 GMT
header: Set-Cookie: nyt-a=xxxxxxxxxxxxx; Expires=Thu, 14 Mar 2019 15:14:23 GMT; Path=/; Domain=.nytimes.com
header: Connection: close
header: X-API-Version: F-VI
header: Content-Security-Policy: default-src data: 'unsafe-inline' 'unsafe-eval' https:; script-src data: 'unsafe-inlin$
header: X-Served-By: cache-lcy19238-LCY
header: X-Cache: MISS
header: X-Cache-Hits: 0
header: X-Timer: S1521040463.958698,VS0,VE458
header: Vary: Accept-Encoding, Fastly-SSL

The next article looks like this by comparison:

reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx
header: Content-Type: text/html; charset=UTF-8
header: X-App-Name: article
header: Cache-Control: no-cache
header: X-ESI: 1
header: X-App-Response-Time: 1.77
header: X-XSS-Protection: 1; mode=block
header: Alt-Svc: clear
header: X-Origin-Time: 2018-03-14 11:14:24 EDT
header: Content-Length: 97530
header: Accept-Ranges: bytes
header: Date: Wed, 14 Mar 2018 15:14:25 GMT
header: Age: 0
header: X-Frame-Options: DENY
header: Set-Cookie: nyt-a=xxxxxxxxxxxxxxxxxxxxxxx; Expires=Thu, 14 Mar 2019 15:14:25 GMT; Path=/; Domain=.nytimes.com
header: Connection: close
header: X-API-Version: F-GA
header: X-PageType: article
header: Content-Security-Policy: default-src data: 'unsafe-inline' 'unsafe-eval' https:; script-src data: 'unsafe-inlin$
header: X-Served-By: cache-lcy19233-LCY
header: X-Cache: MISS
header: X-Cache-Hits: 0
header: X-Timer: S1521040464.699110,VS0,VE1333
header: Vary: Accept-Encoding, Fastly-SSL

Last edited by bobbysteel; 03-14-2018 at 11:48 AM. |
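For anyone comparing dumps like the two above, a small script can make the differences easier to spot. This is only a sketch: the function names `parse_headers` and `diff_headers` are made up for illustration, the sample input is a trimmed subset of the headers posted above, and repeated headers such as Set-Cookie are collapsed to the last value seen.

```python
def parse_headers(dump):
    """Turn 'header: Name: value' lines into a dict (last value wins)."""
    headers = {}
    for line in dump.splitlines():
        line = line.strip()
        if not line.startswith("header: "):
            continue  # skip the 'reply:' status line and blank lines
        name, _, value = line[len("header: "):].partition(": ")
        headers[name] = value
    return headers


def diff_headers(a, b):
    """Return {name: (value_in_a, value_in_b)} for headers that differ."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Trimmed subsets of the two responses posted above.
skipped = parse_headers(r"""
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx
header: X-PageType: vi-story
header: X-API-Version: F-VI
header: Content-Length: 216267
""")
kept = parse_headers(r"""
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx
header: X-PageType: article
header: X-API-Version: F-GA
header: Content-Length: 97530
""")

for name, (a, b) in diff_headers(skipped, kept).items():
    print(f"{name}: skipped={a!r} kept={b!r}")
```

On the full dumps, the skipped page reports X-PageType: vi-story and X-API-Version: F-VI, while the page that came through reports article and F-GA. Both return 200 OK, so the status code alone does not distinguish them.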
03-14-2018, 11:37 AM | #9 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If so, there is not much we can do about it, since the NYT requires a captcha to login, so we cannot log in in the recipe. You could try using delay = 1 which might avoid it (though it will make downloads very slow). Or if you want to get more sophisticated you can detect the paywall markup in preprocess_raw_html() and re-request the article.
Last edited by kovidgoyal; 03-14-2018 at 11:41 AM. |
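The "detect and re-request" idea can be sketched in plain Python, independent of calibre. Everything here is an assumption for illustration: `fetch_with_retry`, the `fetch` and `looks_blocked` callables, and the "paywall-stub" marker are hypothetical, not part of calibre or of the NYT markup. In a real recipe, something like this could be called from preprocess_raw_html(), with `fetch` being whatever the recipe already uses to download a page.

```python
import time


def fetch_with_retry(fetch, url, looks_blocked, retries=3, delay=1.0,
                     sleep=time.sleep):
    """Fetch url, re-requesting while the page looks like a paywall stub.

    fetch(url) -> str and looks_blocked(html) -> bool are supplied by the
    caller; both are hypothetical helpers, not calibre functions.
    """
    html = fetch(url)
    for _ in range(retries):
        if not looks_blocked(html):
            break
        sleep(delay)  # wait a little before asking again
        html = fetch(url)
    return html  # may still be blocked if every retry failed


# Demo with a fake fetcher: the first response is a stub, the second is real.
responses = iter(["<div id='paywall-stub'></div>",
                  "<article>Full text</article>"])
calls = []


def fake_fetch(url):
    calls.append(url)
    return next(responses)


html = fetch_with_retry(
    fake_fetch,
    "https://example.com/story",
    looks_blocked=lambda page: "paywall-stub" in page,  # made-up marker
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(html)
```

The retry count is capped so that a page that is always blocked cannot stall the whole download. The simpler option mentioned above, setting delay = 1 in the recipe, needs no code like this but slows every request.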
03-15-2018, 12:40 AM | #10 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I took a quick look, and basically the NYT seems to be A/B testing a new page layout: some pages come with layout A and others with layout B (actually, I have seen three different layouts). This should improve the situation: https://github.com/kovidgoyal/calibr...93e8905077fa4b
|
03-15-2018, 08:21 AM | #11 |
Big Poppa
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
|
Thank you, Kovid!
|
03-17-2018, 12:16 PM | #12 |
Member
Posts: 23
Karma: 10
Join Date: Aug 2012
Device: Kindle DX & Paperwhite
|
The New York Times recipe is now officially perfect! (At least till the site changes again.) Thanks to all who helped, and to Kovid most of all.
|
03-25-2018, 02:40 PM | #13 |
Junior Member
Posts: 5
Karma: 10
Join Date: Mar 2012
Device: Kobo Aura H2O2, Kobo Aura
|
Thank you, Kovid et al.! I didn't realize there were both web and non-web versions. Very happy to have the front page back as the cover image. Also, the link-only pages seem to have diminished, and they were rarely articles I really wanted to see anyway.
|