MobileRead Forums - Nytimes web recipe intermittent 404 error

squigish · 09-20-2012, 06:05 PM

I have a cron job on an Ubuntu server that downloads the NYTimes web edition every hour and puts it on a web server, where I can download it to my Kindle.
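Because only some of the hourly runs succeed, it may help to make sure a failed run never replaces the last good file on the web server. Below is a minimal sketch in Python. It assumes `ebook-convert` exits with a non-zero status when it crashes with a traceback like the one in this post. The wrapper runs the command with a temporary output path and moves the result into the web directory only on success. `convert_and_publish` and the `"{output}"` placeholder are hypothetical names, not part of calibre.

```python
import os
import subprocess
import tempfile


def convert_and_publish(cmd, output_path):
    """Run cmd with a temporary output file; publish it only on success.

    cmd is a list of arguments in which the hypothetical placeholder
    "{output}" marks where the output path goes. Returns True if the
    published file was replaced, False if the previous copy was kept.
    """
    out_dir = os.path.dirname(os.path.abspath(output_path))
    # Keep the extension: ebook-convert picks the output format from it.
    suffix = os.path.splitext(output_path)[1]
    # Create the temp file next to the target so os.replace() is atomic.
    fd, tmp_path = tempfile.mkstemp(suffix=suffix, dir=out_dir)
    os.close(fd)
    try:
        args = [tmp_path if a == "{output}" else a for a in cmd]
        result = subprocess.run(args)
        # Assumption: a crash shows up as a non-zero exit or an empty file.
        if result.returncode != 0 or os.path.getsize(tmp_path) == 0:
            return False
        # mkstemp creates the file as 0600; make it readable by the web server.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, output_path)
        return True
    finally:
        # Remove the temp file if it was not moved into place.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

From cron, `cmd` would be the `ebook-convert` command in this post with the `.mobi` path replaced by `"{output}"`, and `output_path` would be `/u1/myhome/pub_http_internet/nytimes/nytimes-web-test.mobi`.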

Most runs fail: on about 20 to 23 of the 24 runs each day, calibre hits a 404 error and doesn't generate the file. Here's the command I'm running and the output I get:

Code:

$ /u1/myhome/bin/ebook-convert /u1/myhome/nytimes/nytimes_sub.web.recipe /u1/myhome/pub_http_internet/nytimes/nytimes-web-test.mobi --username myuser --password mypass --output-profile kindle
1% Converting input to HTML...
InputFormatPlugin: Recipe Input running
1% Fetching feeds...
Index URL: http://www.nytimes.com/pages/world/index.html
Index URL: http://www.nytimes.com/pages/national/index.html
Index URL: http://www.nytimes.com/pages/politics/index.html
Index URL: http://www.nytimes.com/pages/nyregion/index.html
Index URL: http://www.nytimes.com/pages/business/index.html
Index URL: http://www.nytimes.com/pages/technology/index.html
Index URL: http://www.nytimes.com/pages/science/index.html
Index URL: http://www.nytimes.com/pages/health/index.html
Index URL: http://www.nytimes.com/pages/opinion/index.html
Index URL: http://www.nytimes.com/pages/arts/index.html
Index URL: http://www.nytimes.com/pages/books/index.html
Index URL: http://www.nytimes.com/pages/movies/index.html
Index URL: http://www.nytimes.com/pages/arts/music/index.html
Index URL: http://www.nytimes.com/pages/arts/television/index.html
Index URL: http://www.nytimes.com/pages/dining/index.html
Index URL: http://www.nytimes.com/pages/travel/index.html
Index URL: http://www.nytimes.com/pages/education/index.html
Index URL: http://www.nytimes.com/pages/magazine/index.html
Index URL: http://www.nytimes.com/pages/weekinreview/index.html
Traceback (most recent call last):
  File "site.py", line 58, in main
  File "site-packages/calibre/ebooks/conversion/cli.py", line 325, in main
  File "site-packages/calibre/ebooks/conversion/plumber.py", line 979, in run
  File "site-packages/calibre/customize/conversion.py", line 208, in __call__
  File "site-packages/calibre/ebooks/conversion/plugins/recipe_input.py", line 105, in convert
  File "site-packages/calibre/web/feeds/news.py", line 881, in download
  File "site-packages/calibre/web/feeds/news.py", line 1025, in build_index
  File "<string>", line 582, in parse_index
  File "<string>", line 455, in parse_web_edition
  File "<string>", line 379, in index_to_soup
  File "<string>", line 362, in get_the_soup
  File "site-packages/mechanize/_mechanize.py", line 199, in open_novisit
  File "site-packages/mechanize/_mechanize.py", line 255, in _mech_open
httperror_seek_wrapper: HTTP Error 404: Not found
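The traceback shows the 404 being raised from `open_novisit` while the recipe fetches a section index page. The last URL logged is the weekinreview page, so that is probably the one that failed on this run. If the 404s are transient, one option is to retry the fetch a few times before giving up. If the URL is permanently gone, retrying will not help, and the section would have to be excluded instead. Below is a minimal sketch. It assumes the error object carries a `.code` attribute, as urllib's `HTTPError` does. `open_with_retry` is a hypothetical helper, not part of calibre or mechanize.

```python
import time


def open_with_retry(open_fn, url, attempts=3, delay=5.0,
                    retry_codes=(404, 500, 502, 503)):
    """Call open_fn(url), retrying on HTTP errors whose code is in retry_codes.

    Any other exception, or the error from the last attempt, is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return open_fn(url)
        except Exception as exc:
            # Assumption: HTTP errors expose the status code as exc.code.
            code = getattr(exc, "code", None)
            if code not in retry_codes or attempt == attempts:
                raise
            # Wait a little longer after each failure (linear backoff).
            time.sleep(delay * attempt)
```

Inside the recipe, the idea would be to pass the browser's `open_novisit` method (the call that raises in the traceback) as `open_fn`. How `get_the_soup` obtains that browser object isn't shown in this post.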

The only differences between my nytimes_sub.web.recipe and the nytimes_sub recipe I downloaded today from Launchpad are the following:

Code:

    webEdition = False					      |	    webEdition = True
    oldest_article = 7					      |	    oldest_article = 2
    useHighResImages = True				      |	    useHighResImages = False
    excludeSections = []				      |	    excludeSections = ['Sports']
                    (u'Sports',u'sports'),		      <
                    (u'Style',u'style'),		      <
                    (u'Fashion & Style',u'fashion'),	      <
                    (u'Home & Garden',u'garden'),	      <
                    ('Multimedia',u'multimedia'),	      <
                    (u'Obituaries',u'obituaries'),	      <

(For those who don't read diff: all I changed was to turn on webEdition and to exclude some sections, older articles, and high-res images.)
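The index URLs in the log all follow the pattern `http://www.nytimes.com/pages/<section>/index.html`, and the deleted lines in the diff are `(name, slug)` pairs. Below is a minimal sketch of how such a list and `excludeSections` could combine into the URLs the recipe fetches. It assumes `excludeSections` is matched against the display name. `index_urls` and the sample list are hypothetical, and the real recipe's logic may differ. Under that assumption, Sports is dropped twice over here: once by deleting its entry and once by `excludeSections = ['Sports']`.

```python
# URL pattern taken from the "Index URL:" lines in the log.
INDEX_URL = "http://www.nytimes.com/pages/{slug}/index.html"


def index_urls(sections, exclude_sections):
    """Return (name, index URL) pairs for sections not in exclude_sections."""
    excluded = set(exclude_sections)
    return [(name, INDEX_URL.format(slug=slug))
            for name, slug in sections
            if name not in excluded]


# Hypothetical sample list in the same (name, slug) form as the diff.
sections = [(u'World', u'world'), (u'Sports', u'sports'), (u'Science', u'science')]
for name, url in index_urls(sections, ['Sports']):
    print(name, url)
```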

Has anyone else been experiencing similar problems?