Old 12-05-2012, 01:05 PM   #1
nickredding
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
New York Times recipe update

Changes to nytimes recipe:
  1. Fix 404 error and crash for non-existent index pages (Web edition). Non-existent sections are silently ignored.
  2. Fix crash when articles are preceded by ad pages (all editions). A five-second delay is inserted before re-requesting an article that served an ad page; otherwise the ad is frequently served again.

    The handling of the ad has been moved to preprocess_html since skip_ad_pages as implemented in the recipe didn't work (failing with an obscure XML decoding crash) and probably never did.

    Note: there is still an intermittent problem with this: sometimes a fragment of the ad page appears as the article, and the article itself is loaded as an inline link from the ad page. I'll work on this as time permits, but in the meantime, as long as recursions=1, you will get the article (it will follow the ad fragment).
  3. Include tech blog articles (all editions, turn this off using getTechBlogs=False)
  4. Include related articles and inline links to NYTimes articles (all editions, turn this off using recursions=0)
  5. Screen article age via URL instead of downloading the article and looking at the dateline (Web edition, ignore article age by setting oldest_web_article=None). This speeds up the Web edition recipe a lot since it no longer has to download articles just to discover they are too old.
  6. Remove login requirement; it is no longer necessary (all editions)
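
Item 5 above (screening article age from the URL) can be sketched roughly as follows. This is a minimal sketch and not the recipe's actual code: it assumes article URLs embed the publication date as /YYYY/MM/DD/, and the names URL_DATE, url_date and too_old are made up for illustration.

```python
import re
from datetime import date, timedelta

# Assumption: article URLs contain the publication date as /YYYY/MM/DD/.
URL_DATE = re.compile(r'/(\d{4})/(\d{2})/(\d{2})/')


def url_date(url):
    """Return the date embedded in the URL, or None if none is found."""
    match = URL_DATE.search(url)
    if match is None:
        return None
    try:
        return date(*(int(part) for part in match.groups()))
    except ValueError:
        # Digits matched the pattern but do not form a real calendar date.
        return None


def too_old(url, oldest_web_article, today=None):
    """True if the URL's date is more than oldest_web_article days old.

    oldest_web_article=None disables the check, as in the notes above.
    URLs without a recognizable date are kept (an assumption of this sketch).
    """
    if oldest_web_article is None:
        return False
    published = url_date(url)
    if published is None:
        return False
    today = today or date.today()
    return published < today - timedelta(days=oldest_web_article)
```

With oldest_web_article=1 this keeps articles dated today and yesterday and rejects anything older, without fetching the article itself.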
Customization Notes:
  1. The standard recipe is Today's Paper.
  2. For the Today's Headlines issue, set headlinesOnly=True
  3. For the Web version, set webEdition=True and set oldest_web_article to the age (in days) of the oldest article you want to download. If you set oldest_web_article=None you will get everything; otherwise set it to a number (e.g., 7 for a week, 1 for yesterday and today).
  4. The technology blogs are attached to each version unless you set getTechBlogs=False. You can control the oldest article (tech_oldest_article) and the maximum number of articles per feed (tech_max_articles_per_feed).
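
As an illustration of the notes above, the settings for a Web edition limited to the past week might look like this. The variable names are the ones mentioned in this post; the two tech_* values are examples only (not the recipe's defaults), and treating tech_oldest_article as a number of days is an assumption.

```python
# Illustrative settings only; in practice these are edited in the recipe itself.
headlinesOnly = False            # True selects the Today's Headlines issue
webEdition = True                # True selects the Web version
oldest_web_article = 7           # days; None downloads everything
getTechBlogs = True              # False leaves out the technology blogs
tech_oldest_article = 14         # example value, assumed to be in days
tech_max_articles_per_feed = 25  # example value
recursions = 1                   # 0 skips related articles and inline links
```

Leaving both headlinesOnly and webEdition at False corresponds to the standard Today's Paper recipe.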
Recipe performance:

Here are typical file sizes for various recipe options. Run time is proportional to file size, so for example the Web version with all articles downloaded can take several hours.

Headlines only: 6MB
Today's Paper: 9MB
Web, 1 day: 14MB
Web, 7 day: 27MB
Web, all: 40MB
Attached Files
File Type: zip nytimes.zip (11.7 KB, 319 views)
Old 12-06-2012, 09:12 AM   #2
BobbyVan
Enthusiast
Posts: 42
Karma: 20
Join Date: Jan 2012
Device: Kindle Paperwhite
Works great!

Gave this a shot this morning and it works beautifully. Thanks also for stripping the mostly unnecessary "multimedia" images from the articles.

Curious about the file size though. The non-Sunday paper seems about 2x-3x larger than the one created by the earlier recipe. So far it's working fine, but slightly concerned that the Sunday edition will choke my Kindle PW (as has happened in the past with the Sunday NYT).
Old 12-06-2012, 02:45 PM   #3
nickredding
onlinenewsreader.net
Quote:
Originally Posted by BobbyVan
Curious about the file size though. The non-Sunday paper seems about 2x-3x larger than the one created by the earlier recipe.
The increased file size is because of the inline links and related articles processing. If you look at the NYT on the web you will see a lot of stories have a sidebar called "Related" or similar, with links to related articles. Articles also have inline links to other, related articles. The new recipe collects these related and inline linked articles and downloads them behind the main article. When you are reading the main article on your device, if you click/tap on an inline link that was downloaded you will land on that article. Similarly, you will find a list of links under the title "Related" at the end of the main article, which are the articles from the "Related" sidebar. You can click/tap on these to proceed to the corresponding article.

The related articles and inline links are only processed for top-level articles. So if an inline linked or related article has inline links or related articles, they are not processed (otherwise it could go on forever).
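
The one-level behaviour described above can be sketched like this. It is only an illustration, not the recipe's code: collect_articles and links_of are hypothetical names, links_of stands in for whatever extracts the related and inline links from a page, and skipping duplicates is an assumption of the sketch.

```python
def collect_articles(top_level_urls, links_of, recursions=1):
    """Return the download order: each top-level article, then its links.

    Links found on linked articles are never followed, so the walk
    stops after one level regardless of how the pages link to each other.
    """
    ordered = []
    seen = set()
    for url in top_level_urls:
        if url not in seen:
            seen.add(url)
            ordered.append(url)
        if recursions < 1:
            continue  # recursions=0: top-level articles only
        for linked in links_of(url):
            if linked not in seen:
                seen.add(linked)
                ordered.append(linked)
    return ordered


# Toy link graph: 'b' links on to 'd', but 'b' is not top-level,
# so 'd' is never collected.
links = {'a': ['b', 'c'], 'b': ['d'], 'x': ['a', 'y']}
print(collect_articles(['a', 'x'], lambda u: links.get(u, [])))
```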

Of course, the inline linked and related articles increase the file size. You can prevent the inline links and related articles from being processed and downloaded by setting recursions=0.

Note however that setting recursions=0 may prevent some articles that are preceded by an ad from being included. As I noted in the original message, there is still an intermittent problem with these articles, where the ad sometimes sneaks in and the article is downloaded as a subsidiary link. Setting recursions=0 would stop the subsidiary link from being downloaded.

One final note: I regularly feed my Kindle Keyboard (K3) 25MB news download files with no problems, although I suppose the PW could be more limited. Note that you can use the includeSections and excludeSections variables to control what sections are processed, so for example if you don't care for the Sports section you could set excludeSections=['Sports'] and bypass all of that content.
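
To make the section filtering concrete, here is a small sketch of how the two lists could be applied. The variable names includeSections and excludeSections are from the recipe; the helper keep_section and the rule that an empty includeSections means "all sections" are assumptions made for the example.

```python
def keep_section(name, includeSections=(), excludeSections=()):
    """Decide whether a section should be processed.

    Assumed semantics: an empty includeSections allows every section,
    and excludeSections always wins over includeSections.
    """
    if includeSections and name not in includeSections:
        return False
    return name not in excludeSections


sections = ['World', 'Business', 'Sports', 'Arts']
kept = [s for s in sections if keep_section(s, excludeSections=['Sports'])]
print(kept)
```

Skipping a whole section this way avoids downloading its articles at all, which is where the file-size saving comes from.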
Old 12-07-2012, 01:58 PM   #4
nickredding
onlinenewsreader.net
Quote:
Note: there is still an intermittent problem with this in that sometimes a fragment of the ad page appears as the article, and the article itself is loaded as an inline link from the ad page.
I have concluded that this occurs when the NYTimes website stubbornly serves the ad page again even after a 5 second delay and a page reload.

Interestingly, this only seems to happen when there are multiple downloads taking place simultaneously.

The recipe I submitted has simultaneous_downloads=1 and this seems to prevent the problem from arising.

If anyone encounters the ads slipping in with simultaneous_downloads=1 please let me know. Otherwise I'll assume the slight increase in runtime from using simultaneous_downloads=1 is a reasonable solution to the problem.
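
For readers following along, the delay-and-reload idea discussed in this post amounts to something like the sketch below. It is not the recipe's implementation (which handles the ad in preprocess_html); fetch_article, fetch and is_ad_page are hypothetical names, and the fetch and ad-detection functions are passed in so the example stays self-contained.

```python
import time


def fetch_article(url, fetch, is_ad_page, delay=5, max_tries=2, sleep=time.sleep):
    """Fetch url; if an ad page comes back, wait and request it again.

    Returns whatever the last attempt produced, so the caller can still
    end up with an ad page if the site keeps serving it.
    """
    html = fetch(url)
    tries = 1
    while is_ad_page(html) and tries < max_tries:
        sleep(delay)  # give the site time to stop serving the ad
        html = fetch(url)
        tries += 1
    return html
```

As the post notes, a retry like this does not help when the site keeps serving the ad on reload, which is why limiting concurrent downloads was tried as a workaround.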
Old 12-29-2012, 05:01 PM   #5
nickredding
onlinenewsreader.net
simultaneous_downloads>1 now OK

Quote:
Originally Posted by nickredding
I have concluded that this occurs when the NYTimes website stubbornly serves the ad page again even after a 5 second delay and a page reload.

Interestingly, this only seems to happen when there are multiple downloads taking place simultaneously.

The recipe I submitted has simultaneous_downloads=1 and this seems to prevent the problem from arising.
I have resolved this and attached an updated recipe with simultaneous_downloads=1 commented out (leaving calibre to apply the default number of concurrent downloads). It turns out that the "skip this ad" link contains some URL magic that prevents the ad from being re-served, so I have left this link intact (the previous recipe was removing it).
Attached Files
File Type: zip nytimes.zip (11.5 KB, 202 views)
Old 12-31-2012, 04:37 PM   #6
Gyl
Junior Member
Posts: 2
Karma: 10
Join Date: Dec 2012
Device: nook hd+
Is it possible to add the date to the title, like:
New York Times [Mo, 31 Dez 2012]

Thanks in advance
Old 12-31-2012, 05:13 PM   #7
nickredding
onlinenewsreader.net
Quote:
Originally Posted by Gyl
Is it possible to add the date to the title, like:
New York Times [Mo, 31 Dez 2012]

Thanks in advance
You can customize the recipe to use your title format by setting

title = 'New York Times'+strftime(' [%a, %d %b %Y]')

Put it in right before decode_url_date but AFTER the other tests that are setting title according to other parameters.

I think a lot of people like the standard recipe to not include the date in the title because that way different issues of the same publication get stacked on their e-readers instead of appearing as distinct documents.
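
To preview what the format string above produces without running the recipe, here is a standalone sketch. It uses Python's standard time.strftime rather than the strftime available inside the recipe, and dated_title is a made-up helper name. The day and month abbreviations follow the locale in effect, so the exact text can differ between systems.

```python
import time


def dated_title(base, when):
    """Append a date such as ' [Mon, 31 Dec 2012]' to the base title."""
    return base + time.strftime(' [%a, %d %b %Y]', when)


# A fixed date is used so the result does not depend on when this is run.
issue_day = time.strptime('2012-12-31', '%Y-%m-%d')
print(dated_title('New York Times', issue_day))
```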
Old 01-12-2014, 12:27 AM   #8
benong
Enthusiast
Posts: 26
Karma: 10
Join Date: Aug 2007
Location: Petaling Jaya, Malaysia
Device: Kindle Fire HD 8.9, Kobo Aura HD, Sony PRS-950
I noticed that the NYTimes has changed its web layout recently, and because of this the current calibre nytimes recipe no longer works as nicely as before. Besides a great increase in file size, every article is now preceded by "1. Loading ..... " etc.

I would appreciate it if the original authors could look into this.
The NYTimes recipe has been great and I enjoy the downloaded articles tremendously.
All times are GMT -4.