02-07-2018, 08:37 AM | #1 |
Enthusiast
Posts: 31
Karma: 32
Join Date: Jan 2012
Device: Kindle Paperwhite
|
New York Times recipe broken
Looks like the "Today's Paper" webpage moved from
http://www.nytimes.com/pages/todayspaper/index.html to https://www.nytimes.com/section/todayspaper . The page layout has changed as well. I don't think I can fix it myself, but if anyone here knows how, the GitHub URL for the recipe is https://github.com/kovidgoyal/calibr...nytimes.recipe . |
02-07-2018, 11:02 AM | #2 |
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The existing recipes were such a mess that I just ended up rewriting them. Note that I am not a NYT reader, so let me know if there are problems in the new recipes. https://github.com/kovidgoyal/calibr...451c71cd94b3b8
Decreased the size of the recipes from 1500 lines to 150 lines. |
02-09-2018, 07:45 AM | #3 |
Enthusiast
Posts: 31
Karma: 32
Join Date: Jan 2012
Device: Kindle Paperwhite
|
Thank you, this recipe works very well! It's fantastic that it could be rewritten in a fraction of the code.
There are a couple of differences from before, but these are cosmetic and non-essential:
- There's some cruft at the end of each article. It starts with "A version of this article appears in print on..." The old script did not have this. However, this is pretty easy to ignore.
- The article text contains the same hyperlinks as the website. When tapped, accidentally or not, on a Kindle, they open the slow-as-molasses Kindle browser. The old script seemed to erase the hyperlinks, and I never found them useful (can't speak for others though). Again, non-essential and easy to ignore.
- The resulting files seem larger than before (10 MB vs 3-4 MB for a weekday paper, 75 minutes vs 15 minutes to process on a Raspberry Pi 2, both with the setting compress_news_images_auto_size = 16). I will tool around with compress_news_images_max_size to see if I can get this back down to the same file size / processing time as before.
Thank you again! |
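For anyone who does want the links gone in a personal copy of the recipe, here is a minimal sketch of the idea in plain Python. It only shows the tag-stripping step on an HTML string; `strip_links` and the sample markup are hypothetical names made up for illustration, and wiring it into a recipe (for example through a hook such as `preprocess_html`) is left as an assumption rather than shown.

```python
import re

# Matches opening and closing <a> tags only; the text between them is kept.
# The \b keeps tags such as <abbr> or <article> from matching.
ANCHOR_TAG = re.compile(r"</?a\b[^>]*>", re.IGNORECASE)


def strip_links(html: str) -> str:
    """Remove <a ...> and </a> tags while keeping the link text."""
    return ANCHOR_TAG.sub("", html)


# Hypothetical article fragment, not taken from the NYT site.
sample = '<p>Read the <a href="https://www.nytimes.com/section/todayspaper">full story</a> here.</p>'
print(strip_links(sample))
```

A regular expression is enough for a sketch like this; on real article HTML it is probably safer to unwrap the anchors on the parsed tree instead.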
02-09-2018, 09:02 AM | #4 |
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
1) Easily fixed: https://github.com/kovidgoyal/calibr...8bff364d981ac3
2) Personally, I like to keep the links, as people who read them on devices with useful browsers might like to click them once in a while, plus it is good for the NYT to get visits.
3) No idea why that might be -- I never used the old recipe, so it's hard to say what's different. |
02-09-2018, 01:18 PM | #5 |
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsbury, MA
Device: Lenovo Android Tablet
|
Just downloaded Calibre 3.17 (64-bit). I saw in the change log that the New York Times recipe is improved. However, it is no longer listed under 'schedule news download' in 'Fetch news.' Where did it go?
Thanks. |
02-09-2018, 01:47 PM | #6 |
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsbury, MA
Device: Lenovo Android Tablet
|
Found it by searching, but I'm not sure which section of the 'schedule news download' operation in Calibre it's in now...
|
02-09-2018, 03:29 PM | #7 |
Enthusiast
Posts: 31
Karma: 32
Join Date: Jan 2012
Device: Kindle Paperwhite
|
1. Thanks for the fix!
2. Makes sense.
3. Embarrassingly, this was user error: I forgot to change the webedition from true to false. I'm used to using false and had forgotten I'd changed it a while ago. |
02-09-2018, 08:27 PM | #8 |
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@NSILMike: it is in the English section, where I think it always was.
|
02-09-2018, 08:34 PM | #9 |
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsbury, MA
Device: Lenovo Android Tablet
|
|
02-09-2018, 08:44 PM | #10 |
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
It is for me:
|
02-09-2018, 08:58 PM | #11 |
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsury, MA
Device: Lenovo Android Tablet
|
|
02-10-2018, 06:17 AM | #12 |
Enthusiast
Posts: 31
Karma: 32
Join Date: Jan 2012
Device: Kindle Paperwhite
|
Sorry to keep posting, but the non-web_edition scraping mechanism isn't reading today's edition webpage correctly -- it correctly puts the first four articles in the "Front Page" section, but then it seems to skip over the rest of the "Front Page" section and puts all of the remaining articles into the "International" section.
I'm not sure what it is in the HTML that is confusing the script between the top four articles and the rest -- they're obviously formatted differently visually, but there's no h1 section between Front Page and International that the script is reading. I don't know Python, but I've been staring at it for a little while trying to figure it out... Perhaps it's something about that "rank-template featured-rank-template template-2 issue-template" div, which contains only the first four "Front Page" articles, that's messing it up. Sorry I can't be more helpful. |
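To make the grouping idea concrete, here is a rough sketch of assigning each article link to the most recent h1 heading, so that a featured block and the plain list after it land in the same section. It is not the recipe's actual code: `SectionGrouper`, `group_by_section` and the sample markup are hypothetical, the real page structure is an assumption, and only the standard library's `html.parser` is used.

```python
from html.parser import HTMLParser


class SectionGrouper(HTMLParser):
    """Assign each article link to the most recent <h1> section heading."""

    def __init__(self):
        super().__init__()
        self.sections = {}      # section title -> list of article URLs
        self.current = None     # title of the section currently being filled
        self.in_heading = False
        self.heading_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_heading = True
            self.heading_text = []
        elif tag == "a" and self.current is not None:
            href = dict(attrs).get("href")
            if href:
                self.sections[self.current].append(href)

    def handle_data(self, data):
        if self.in_heading:
            self.heading_text.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self.in_heading:
            self.in_heading = False
            self.current = "".join(self.heading_text).strip()
            self.sections.setdefault(self.current, [])


def group_by_section(html):
    """Return a dict mapping section title to the article URLs under it."""
    parser = SectionGrouper()
    parser.feed(html)
    parser.close()
    return parser.sections


# Hypothetical markup: a featured block plus a plain list under one heading.
SAMPLE = """
<h1>Front Page</h1>
<div class="featured"><a href="/a1">A1</a><a href="/a2">A2</a></div>
<ul><li><a href="/a3">A3</a></li></ul>
<h1>International</h1>
<ul><li><a href="/b1">B1</a></li></ul>
"""
print(group_by_section(SAMPLE))
```

Because the section is decided by the last heading seen rather than by the enclosing div, the links outside the featured block still count as Front Page in this sketch.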
02-10-2018, 12:02 PM | #13 |
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
02-11-2018, 07:04 AM | #14 |
Enthusiast
Posts: 31
Karma: 32
Join Date: Jan 2012
Device: Kindle Paperwhite
|
Thank you -- this fix captures all of the Front Page articles correctly, but everything that comes after the Front Page section (National, Obituaries, New York, etc.) is still gathered into the International section, which seems to cut off after 90-100 articles (maybe a device or file limitation).
More precisely, looking at the log file, it seems that the script picks up every article after the Front Page as belonging to every individual section and then, since those sections are ostensibly the same, retains only the first section (International). https://pastebin.com/Vr6SYq8K
Someday I'd like to learn some Python so that I'm submitting pull requests and not just requests... |
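A quick way to check for the symptom described above is to look for sections whose article lists are identical. The sketch below assumes the parsed index has been reduced to (section title, list of article URLs) pairs; `find_duplicate_sections` and the sample data are hypothetical, and this is a diagnostic aid, not a description of what calibre does internally.

```python
from collections import defaultdict


def find_duplicate_sections(index):
    """Group section titles whose article URL lists are identical.

    `index` is a list of (section_title, [article_url, ...]) pairs.
    Returns a list of title groups that share exactly the same URLs.
    """
    by_urls = defaultdict(list)
    for title, urls in index:
        by_urls[tuple(urls)].append(title)
    return [titles for titles in by_urls.values() if len(titles) > 1]


# Hypothetical index showing the symptom: every later section has the same articles.
index = [
    ("Front Page", ["/a1", "/a2"]),
    ("International", ["/b1", "/c1", "/d1"]),
    ("National", ["/b1", "/c1", "/d1"]),
    ("Obituaries", ["/b1", "/c1", "/d1"]),
]
print(find_duplicate_sections(index))
```

If this returns a non-empty list for a real run, the section scoping is the likely culprit rather than a device or file-size limit.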
02-11-2018, 09:46 AM | #15 | |
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
https://github.com/kovidgoyal/calibr...067559c5615e48 |
|
|