I have a cron job on an Ubuntu server that downloads the NYTimes web edition every hour and puts it on a web server, where I can download it to my Kindle.
Most of the time (about 20-23 of the 24 daily runs), calibre hits an HTTP 404 error and doesn't produce the file. Here's what I'm running, and the output I get:
Code:
$ /u1/myhome/bin/ebook-convert /u1/myhome/nytimes/nytimes_sub.web.recipe /u1/myhome/pub_http_internet/nytimes/nytimes-web-test.mobi --username myuser --password mypass --output-profile kindle
1% Converting input to HTML...
InputFormatPlugin: Recipe Input running
1% Fetching feeds...
Index URL: http://www.nytimes.com/pages/world/index.html
Index URL: http://www.nytimes.com/pages/national/index.html
Index URL: http://www.nytimes.com/pages/politics/index.html
Index URL: http://www.nytimes.com/pages/nyregion/index.html
Index URL: http://www.nytimes.com/pages/business/index.html
Index URL: http://www.nytimes.com/pages/technology/index.html
Index URL: http://www.nytimes.com/pages/science/index.html
Index URL: http://www.nytimes.com/pages/health/index.html
Index URL: http://www.nytimes.com/pages/opinion/index.html
Index URL: http://www.nytimes.com/pages/arts/index.html
Index URL: http://www.nytimes.com/pages/books/index.html
Index URL: http://www.nytimes.com/pages/movies/index.html
Index URL: http://www.nytimes.com/pages/arts/music/index.html
Index URL: http://www.nytimes.com/pages/arts/television/index.html
Index URL: http://www.nytimes.com/pages/dining/index.html
Index URL: http://www.nytimes.com/pages/travel/index.html
Index URL: http://www.nytimes.com/pages/education/index.html
Index URL: http://www.nytimes.com/pages/magazine/index.html
Index URL: http://www.nytimes.com/pages/weekinreview/index.html
Traceback (most recent call last):
File "site.py", line 58, in main
File "site-packages/calibre/ebooks/conversion/cli.py", line 325, in main
File "site-packages/calibre/ebooks/conversion/plumber.py", line 979, in run
File "site-packages/calibre/customize/conversion.py", line 208, in __call__
File "site-packages/calibre/ebooks/conversion/plugins/recipe_input.py", line 105, in convert
File "site-packages/calibre/web/feeds/news.py", line 881, in download
File "site-packages/calibre/web/feeds/news.py", line 1025, in build_index
File "<string>", line 582, in parse_index
File "<string>", line 455, in parse_web_edition
File "<string>", line 379, in index_to_soup
File "<string>", line 362, in get_the_soup
File "site-packages/mechanize/_mechanize.py", line 199, in open_novisit
File "site-packages/mechanize/_mechanize.py", line 255, in _mech_open
httperror_seek_wrapper: HTTP Error 404: Not found
The only differences between my nytimes_sub.web.recipe and the nytimes_sub recipe I downloaded today from Launchpad are the following (downloaded recipe on the left, my copy on the right):
Code:
webEdition = False | webEdition = True
oldest_article = 7 | oldest_article = 2
useHighResImages = True | useHighResImages = False
excludeSections = [] | excludeSections = ['Sports']
(u'Sports',u'sports'), <
(u'Style',u'style'), <
(u'Fashion & Style',u'fashion'), <
(u'Home & Garden',u'garden'), <
('Multimedia',u'multimedia'), <
(u'Obituaries',u'obituaries'), <
(For those of you who don't speak diff: all I changed was to turn on webEdition and to exclude some sections, older articles, and high-res images.)
Has anyone else been experiencing similar problems?
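As a stopgap while the cause is unknown, the hourly job could be wrapped in a retry loop. The following is only a rough sketch: `run_with_retries` and its parameters are made-up names, and it assumes ebook-convert exits with a non-zero status when the traceback above occurs, which is not confirmed here. With a failure rate this high, retries may not help much.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=3, delay=60):
    """Run cmd until it exits with status 0, trying at most `attempts` times.

    Hypothetical helper: assumes a failed conversion shows up as a
    non-zero exit status from the command.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        print(
            f"attempt {attempt} failed with exit status {result.returncode}",
            file=sys.stderr,
        )
        # Wait before the next try, but not after the last one.
        if attempt < attempts:
            time.sleep(delay)
    return False
```

The cron entry would then call a small script that passes the same ebook-convert command line shown above, as a list of arguments, to `run_with_retries`.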