Old 02-27-2018, 11:13 AM   #1
danhotchkiss
Member
danhotchkiss began at the beginning.
 
Posts: 23
Karma: 10
Join Date: Aug 2012
Device: Kindle DX & Paperwhite
Perfecting the New York Times recipe

The NYT recipe works better than ever! Here are a few observations that might help to improve it even more:
  • There are two standard recipes, both called "The New York Times". One is Web Version, the other not. It might be helpful to give them different names in the list, and explain the difference. I use the non-web version, which gives me much smaller files.
  • Bylines run into the dates, giving Times reporters interesting names like O'LOUGHLINFEB. Surely there's a way to insert a space.
  • Some articles appear in the section table of contents, but consist entirely of a URL, with no headline, byline, or content. Example from today's paper: "Rewrite Iran Deal? Europeans Offer a Different Solution: A New Chapter". I don't see any obvious reason why this article, and not others, would be blank.
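For the run-together byline in the second point, here is a minimal sketch of one way a space could be inserted. It is not the change that went into the recipe; the function name and the assumed date format ("FEB. 27, 2018" directly after an upper-case name) are illustrative guesses.

```python
import re

# Assumed month spellings; the "FEB. 27, 2018" date format is a guess
# at what follows the byline in the downloaded article.
_MONTH = r"(?:JAN|FEB|MAR(?:CH)?|APR(?:IL)?|MAY|JUNE?|JULY?|AUG|SEPT?|OCT|NOV|DEC)"

# Zero-width match: right after a capital letter, right before "<MONTH>. <day>".
_BYLINE_DATE = re.compile(r"(?<=[A-Z])(?=" + _MONTH + r"\.? \d{1,2}\b)")


def space_byline_date(text):
    """Insert a space where an upper-case byline runs straight into a date."""
    return _BYLINE_DATE.sub(" ", text)


print(space_byline_date("O'LOUGHLINFEB. 27, 2018"))  # O'LOUGHLIN FEB. 27, 2018
```

Text that already has the space is left alone, because the lookbehind requires a capital letter immediately before the month.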

Thanks for the continued progress!

Dan

Last edited by danhotchkiss; 03-06-2018 at 09:51 AM.
Old 02-27-2018, 12:39 PM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
1. They already have different descriptions. If you think the descriptions can be improved, feel free to suggest improvements.

2. Sure, already done.

3. That will be because the NYT servers return something different for those articles, either because they are flaky or because some sort of CAPTCHA/bot-detection algorithm fires occasionally, or similar. Somebody with more time/interest than me in the NYT will have to investigate.
Old 03-02-2018, 09:54 AM   #3
danhotchkiss
Member
danhotchkiss began at the beginning.
 
Posts: 23
Karma: 10
Join Date: Aug 2012
Device: Kindle DX & Paperwhite
1. All I can suggest is that the description might say more about the difference between the Web and non-Web recipes, to help users know which to choose.

2. Thanks!

3. I hope someone will accept your challenge. It's frustrating to miss articles without knowing why.

Dan
Old 03-07-2018, 06:08 PM   #4
danhotchkiss
Member
danhotchkiss began at the beginning.
 
Posts: 23
Karma: 10
Join Date: Aug 2012
Device: Kindle DX & Paperwhite
I find that issues #2 and 3 above are both resolved in the current non-web version of the recipe. Thanks!
Old 03-08-2018, 02:28 AM   #5
BillD
Member
BillD began at the beginning.
 
BillD's Avatar
 
Posts: 16
Karma: 10
Join Date: Sep 2010
Device: Kindle
Quote:
Originally Posted by danhotchkiss
I find that issues #2 and 3 above are both resolved in the current non-web version of the recipe. Thanks!
I am using the non-web NYT version (I only do this on Sundays). I also find that some articles appear in the Table of Contents, but I can't find the actual article in the download. It seems random, too.

Also, I thought there used to be a "front page" section, but I don't see it lately. Edit: is it missing on Sundays only? Edit: never mind, the web edition doesn't have it, but I think the other one does.

Last edited by BillD; 03-09-2018 at 04:47 AM.
Old 03-08-2018, 08:09 AM   #6
danhotchkiss
Member
danhotchkiss began at the beginning.
 
Posts: 23
Karma: 10
Join Date: Aug 2012
Device: Kindle DX & Paperwhite
I spoke too soon. The "missing article" issue persists in the current version of the recipe.

An interesting detail that may be diagnostic: the articles are missing, but "This article was downloaded by calibre from" [URL] appears anyway.
Old 03-14-2018, 11:09 AM   #7
bobbysteel
Big Poppa
bobbysteel began at the beginning.
 
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
Same here. It seems to be a case of the page being denied on the server from viewing via paywall.
Old 03-14-2018, 11:34 AM   #8
bobbysteel
Big Poppa
bobbysteel began at the beginning.
 
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
Kovid, here's the response for a page that was requested but is skipped in the results. Each run has a few missing, and not the same ones each time.

reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx
header: Content-Type: text/html; charset=utf-8
header: X-PageType: vi-story
header: X-VI-Compatibility: Compatible
header: Alt-Svc: clear
header: X-Origin-Time: 2018-03-14 11:14:23 EDT
header: Fastly-Restarts: 1
header: Content-Length: 216267
header: Accept-Ranges: bytes
header: Date: Wed, 14 Mar 2018 15:14:23 GMT
header: Age: 0
header: X-Frame-Options: DENY
header: Set-Cookie: vi_www_hp=z00; path=/; domain=.nytimes.com; expires=Wed, 01 Jan 2020 00:00:00 GMT
header: Set-Cookie: nyt-a=xxxxxxxxxxxxx; Expires=Thu, 14 Mar 2019 15:14:23 GMT; Path=/; Domain=.nytimes.com
header: Connection: close
header: X-API-Version: F-VI
header: Content-Security-Policy: default-src data: 'unsafe-inline' 'unsafe-eval' https:; script-src data: 'unsafe-inlin$
header: X-Served-By: cache-lcy19238-LCY
header: X-Cache: MISS
header: X-Cache-Hits: 0
header: X-Timer: S1521040463.958698,VS0,VE458
header: Vary: Accept-Encoding, Fastly-SSL

The next article looks like this by comparison:
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx
header: Content-Type: text/html; charset=UTF-8
header: X-App-Name: article
header: Cache-Control: no-cache
header: X-ESI: 1
header: X-App-Response-Time: 1.77
header: X-XSS-Protection: 1; mode=block
header: Alt-Svc: clear
header: X-Origin-Time: 2018-03-14 11:14:24 EDT
header: Content-Length: 97530
header: Accept-Ranges: bytes
header: Date: Wed, 14 Mar 2018 15:14:25 GMT
header: Age: 0
header: X-Frame-Options: DENY
header: Set-Cookie: nyt-a=xxxxxxxxxxxxxxxxxxxxxxx; Expires=Thu, 14 Mar 2019 15:14:25 GMT; Path=/; Domain=.nytimes.com
header: Connection: close
header: X-API-Version: F-GA
header: X-PageType: article
header: Content-Security-Policy: default-src data: 'unsafe-inline' 'unsafe-eval' https:; script-src data: 'unsafe-inlin$
header: X-Served-By: cache-lcy19233-LCY
header: X-Cache: MISS
header: X-Cache-Hits: 0
header: X-Timer: S1521040464.699110,VS0,VE1333
header: Vary: Accept-Encoding, Fastly-SSL
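
To compare two dumps like these without reading them line by line, a small script can parse the "header: Name: value" lines and report the headers that differ. This is only a sketch: the function names are made up for illustration, and the sample strings contain just a few of the header lines posted above.

```python
def parse_headers(dump):
    """Parse 'header: Name: value' lines into a dict (a repeated name keeps its last value)."""
    headers = {}
    for line in dump.splitlines():
        if not line.startswith("header: "):
            continue  # skip the 'reply: ...' status line and anything else
        name, sep, value = line[len("header: "):].partition(": ")
        if sep:
            headers[name] = value
    return headers


def differing(a, b, names):
    """Return {name: (value_in_a, value_in_b)} for the named headers that differ."""
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# A few lines copied from the skipped article's response.
skipped = parse_headers("""\
reply: 'HTTP/1.1 200 OK\\r\\n'
header: Server: nginx
header: X-PageType: vi-story
header: X-API-Version: F-VI
""")

# The same headers from the article that downloaded normally.
normal = parse_headers("""\
reply: 'HTTP/1.1 200 OK\\r\\n'
header: Server: nginx
header: X-PageType: article
header: X-API-Version: F-GA
""")

print(differing(skipped, normal, ["Server", "X-PageType", "X-API-Version"]))
```

For the two responses above, the headers that stand out are X-PageType (vi-story vs. article) and X-API-Version (F-VI vs. F-GA), which suggests the skipped page is served by a different page type rather than rejected outright.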

Last edited by bobbysteel; 03-14-2018 at 11:48 AM.
Old 03-14-2018, 11:37 AM   #9
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by bobbysteel
Same here. It seems to be a case of the page being denied on the server from viewing via paywall.
If so, there is not much we can do about it: the NYT requires a CAPTCHA to log in, so the recipe cannot log in. You could try using delay = 1, which might avoid it (though it will make downloads very slow). Or, if you want to get more sophisticated, you can detect the paywall markup in preprocess_raw_html() and re-request the article.
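
A rough sketch of the detect-and-re-request idea, kept independent of calibre so it can be run on its own. The marker string is hypothetical (the actual paywall markup is not shown in this thread), and `fetch` is a stand-in for whatever the recipe uses to download a page. In a real recipe, `delay = 1` would be a class attribute and the check would be called from `preprocess_raw_html(self, raw_html, url)`.

```python
import time

# Hypothetical marker: the real paywall markup would have to be found
# by inspecting one of the blank articles.
PAYWALL_MARKER = 'id="paywall-placeholder"'


def looks_like_paywall(raw_html, marker=PAYWALL_MARKER):
    """Return True if the page contains the (assumed) paywall marker."""
    return marker in raw_html


def fetch_with_retry(fetch, url, retries=2, delay=1.0, sleep=time.sleep):
    """Call fetch(url); while the result looks like a paywall page, wait and re-request.

    Gives up after `retries` extra attempts and returns the last response as-is.
    """
    raw_html = fetch(url)
    for _ in range(retries):
        if not looks_like_paywall(raw_html):
            break
        sleep(delay)
        raw_html = fetch(url)
    return raw_html
```

Passing `sleep` as a parameter is only there to make the function easy to test without actually waiting.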

Last edited by kovidgoyal; 03-14-2018 at 11:41 AM.
Old 03-15-2018, 12:40 AM   #10
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I took a quick look, and basically the NYT seems to be A/B testing a new page layout: some pages come with layout A and others with layout B (actually, I have seen three different layouts). This should improve the situation: https://github.com/kovidgoyal/calibr...93e8905077fa4b
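
The linked commit is not reproduced here. As a general illustration of coping with several layouts, one approach is to try one extractor per layout and use the first that yields content. The patterns below are hypothetical placeholders, not the real NYT markup, and a real recipe would more likely work on the parsed soup than on regular expressions.

```python
import re

# Hypothetical container patterns, one per layout; the actual markup
# for each NYT layout is not given in this thread.
LAYOUT_PATTERNS = {
    "layout_a": re.compile(r'<div class="story-body">(.*?)</div>', re.S),
    "layout_b": re.compile(r'<section name="articleBody">(.*?)</section>', re.S),
}


def extract_body(raw_html):
    """Return (layout_name, body_html) for the first layout that matches, else (None, '')."""
    for name, pattern in LAYOUT_PATTERNS.items():
        match = pattern.search(raw_html)
        if match and match.group(1).strip():
            return name, match.group(1).strip()
    return None, ""


print(extract_body('<section name="articleBody"><p>Hi</p></section>'))
```

A page that matches none of the known layouts comes back as `(None, '')`, which makes it easy to log the URL so a new layout gets noticed instead of silently producing a blank article.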
Old 03-15-2018, 08:21 AM   #11
bobbysteel
Big Poppa
bobbysteel began at the beginning.
 
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
Thank you, Kovid!
Old 03-17-2018, 12:16 PM   #12
danhotchkiss
Member
danhotchkiss began at the beginning.
 
Posts: 23
Karma: 10
Join Date: Aug 2012
Device: Kindle DX & Paperwhite
The New York Times recipe is now officially perfect! (At least till the site changes again.) Thanks to all who helped, and to Kovid most of all.
Old 03-25-2018, 02:40 PM   #13
EMSBoys
Junior Member
EMSBoys began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Mar 2012
Device: Kobo Aura H2O2, Kobo Aura
Thank you, Kovid et al.! I didn't realize there were web and non-web versions. Very happy to have the front page back as the cover image. Also, the link-only pages seem to have diminished, and they were rarely articles I really wanted to see anyway.