Failed: Fetch News from The Guardian...

CaliWenger · 01-18-2014, 06:30 PM

Tried to fetch "the Guardian and The Observer" today (Jan 18, 2014) and kept getting "Failed" messages. Updated to the latest Calibre, but the messages continued. I don't know enough to tell if this is a glitch in the posting by The Guardian or a change in their procedure that requires modification of a Calibre recipe. Here's my failure notice:

Spoiler:

Any insight will be appreciated!

By the way, I use Calibre on an iMac, then transfer the files to my tablet using Calibre Companion...

kovidgoyal · 01-18-2014, 10:02 PM

This line:

httplib.BadStatusLine: ''

indicates that calibre received an invalid response (here, an empty status line) when contacting the Guardian servers. It may be a temporary problem, so just try the download again later. Or it may be caused by network issues.
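For anyone scripting their own downloads, "try again later" can be sketched as a small retry wrapper. This is only an illustration, not something calibre itself provides: `fetch_with_retry` and its parameters are hypothetical names, and the sketch uses the Python 3 spelling `http.client.BadStatusLine` rather than the Python 2 `httplib.BadStatusLine` seen in the traceback.

```python
import time
from http.client import BadStatusLine  # Python 3 name for httplib.BadStatusLine


def fetch_with_retry(fetch, attempts=3, delay=5.0, sleep=time.sleep):
    """Call fetch() until it succeeds or `attempts` tries are used up.

    `fetch` is any zero-argument callable that performs the download.
    Only errors that are plausibly transient are retried: an empty or
    invalid status line, or a network-level OSError.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except (BadStatusLine, OSError):
            if attempt == attempts:
                raise  # out of tries: let the caller see the real error
            sleep(delay * attempt)  # wait a little longer after each failure
```

If the error persists across all attempts, it is re-raised unchanged, which would point to a changed site rather than a passing network problem.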

alex.x · 01-20-2014, 06:30 PM

Since 1 Jan 2014 the Guardian has had a new web address. I have managed to put this together; it is not as good as the original, but OK:

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1388882568(BasicNewsRecipe):
    title = u"Alex's Guardian"

    base_url = "http://www.theguardian.com/theguardian"
    cover_pic = 'Guardian digital edition'
    masthead_url = 'http://static.guim.co.uk/static/3a21a6225712e7df59854c0749abc6cffcf00ef2/common/images/logos/the-guardian/titlepiece.gif'
    oldest_article = 1
    max_articles_per_feed = 100

    auto_cleanup = True
    auto_cleanup_keep = '//div[@id="main-content-picture"]'

    # Removes empty feeds
    remove_empty_feeds = True

    feeds = [
        (u'Top Stories', u'http://www.theguardian.com/theguardian/mainsection/topstories/rss'),
        (u'UK News', u'http://feeds.theguardian.com/theguardian/uk-news/rss'),
        (u'World', u'http://www.theguardian.com/world/rss'),
        (u'Politics', u'http://www.theguardian.com/politics'),
        (u'Comment', u'http://www.theguardian.com/uk/commentisfree'),
        (u'Science', u'http://www.theguardian.com/science'),
        (u'Education', u'http://www.theguardian.com/education'),
        (u'Culture', u'http://www.theguardian.com/uk/culture'),
        (u'Environment', u'http://www.theguardian.com/environment/rss'),
        (u'Technology', u'http://feeds.theguardian.com/theguardian/technology/rss'),
        (u'Saturday', u'http://www.theguardian.com/theguardian/2014/jan/04/mainsection/saturday'),
        (u'Money', u'http://www.theguardian.com/uk/money/rss'),
        (u'Editorials and Reply', u'http://www.theguardian.com/theguardian/mainsection/editorialsandreply'),
        (u'Obituaries', u'http://www.theguardian.com/tone/obituaries/rss'),
        (u'Reviews', u'http://www.theguardian.com/theguardian/guardianreview/rss'),
        (u'Travel', u'http://www.theguardian.com/travel'),
        (u'G2', u'http://www.theguardian.com/theguardian/g2/rss')
    ]

    timefmt = ' [%a, %d %b %Y]'

    remove_tags = [
        dict(name='div', attrs={'class': ["video-content", "videos-third-column"]}),
        dict(name='div', attrs={'id': ["article-toolbox", "subscribe-feeds"]}),
        dict(name='div', attrs={'class': ["guardian-tickets promo-component"]}),
        dict(name='ul', attrs={'class': ["pagination"]}),
        dict(name='ul', attrs={'id': ["content-actions"]}),
        # article history link
        dict(name='a', attrs={'class': ["rollover history-link"]}),
        # "a version of this article ..." spiel
        dict(name='div', attrs={'class': ['section']}),
        # "about this article" js dialog
        dict(name='div', attrs={'class': ["share-top"]}),
        # author picture
        dict(name='img', attrs={'class': ["contributor-pic-small"]}),
        # embedded videos/captions
        dict(name='span', attrs={'class': ['inline embed embed-media']}),
        # dict(name='img'),
    ]
    use_embedded_content = False

    #: Ignore duplicates of articles that are present in more than one section.
    #: A duplicate article is an article that has the same title and/or URL.
    #: To ignore articles with the same title, set this to:
    #: ignore_duplicate_articles = {'title'}
    #: To use URLs instead, set it to:
    #: ignore_duplicate_articles = {'url'}
    #: To match on title or URL, set it to:
    ignore_duplicate_articles = {'title', 'url'}

    #: Rescale images to fit in the device screen dimensions set by the output profile.
    #: Ignored if no output profile is set.
    scale_news_images_to_device = True

    #: Maximum dimensions (w,h) to scale images to. If scale_news_images_to_device is True
    #: this is set to the device screen dimensions set by the output profile unless
    #: there is no profile set, in which case it is left at whatever value it has been
    #: assigned (default None).
    scale_news_images = None

    #: The factor used when auto compressing jpeg images. If set to None,
    #: auto compression is disabled. Otherwise, the images will be reduced in size to
    #: (w * h)/compress_news_images_auto_size bytes if possible by reducing
    #: the quality level, where w x h are the image dimensions in pixels.
    #: The minimum jpeg quality will be 5/100 so it is possible this constraint
    #: will not be met. This parameter can be overridden by the parameter
    #: compress_news_images_max_size which provides a fixed maximum size for images.
    #: Note that if you enable scale_news_images_to_device then the image will
    #: first be scaled and then its quality lowered until its size is less than
    #: (w * h)/factor where w and h are now the *scaled* image dimensions. In
    #: other words, this compression happens after scaling.
    compress_news_images_auto_size = 16

    no_stylesheets = True
    extra_css = '''
        .article-attributes{font-size: x-small; font-family:Arial,Helvetica,sans-serif;}
        .h1{font-size: large ;font-family:georgia,serif; font-weight:bold;}
        .stand-first-alone{color:#040404; font-size:small; font-family:Arial,Helvetica,sans-serif;}
        .caption{color:#040404; font-size:x-small; font-family:Arial,Helvetica,sans-serif;}
        #article-wrapper{font-size:small; font-family:Arial,Helvetica,sans-serif;font-weight:normal;}
        .main-article-info{font-family:Arial,Helvetica,sans-serif;}
        #full-contents{font-size:small; font-family:Arial,Helvetica,sans-serif;font-weight:normal;}
        #match-stats-summary{font-size:small; font-family:Arial,Helvetica,sans-serif;font-weight:normal;}
    '''

    def get_article_url(self, article):
        # Use the feed entry's guid as the article URL.
        url = article.get('guid', None)
        # Skip entries with no guid, and non-article content
        # (videos, quizzes, galleries and so on).
        if url is None or '/video/' in url or '/flyer/' in url or '/quiz/' in url or \
                '/gallery/' in url or 'ivebeenthere' in url or \
                'pickthescore' in url or 'audioslideshow' in url:
            url = None
        return url

    def populate_article_metadata(self, article, soup, first):
        # Use the first image of the article as its table-of-contents thumbnail.
        if first and hasattr(self, 'add_toc_thumbnail'):
            picdiv = soup.find('img')
            if picdiv is not None:
                self.add_toc_thumbnail(article, picdiv['src'])
Failure notice from CaliWenger's first post (the contents of the spoiler):

calibre, version 1.20.0 (darwin, isfrozen: True)
Conversion Error: Failed: Fetch news from The Guardian and The Observer

Fetch news from The Guardian and The Observer
Resolved conversion options
calibre version: 1.20.0
{'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0, 'book_producer': None, 'change_justification': 'original', 'chapter': None, 'chapter_mark': 'pagebreak', 'comments': None, 'cover': None, 'debug_pipeline': None, 'dehyphenate': True, 'delete_blank_paragraphs': True, 'disable_font_rescaling': False, 'dont_compress': False, 'dont_download_recipe': False, 'duplicate_links_in_toc': False, 'embed_all_fonts': False, 'embed_font_family': None, 'enable_heuristics': False, 'expand_css': False, 'extra_css': None, 'extract_to': None, 'filter_css': None, 'fix_indents': True, 'font_size_mapping': None, 'format_scene_breaks': True, 'html_unwrap_factor': 0.4, 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x108ec4f50>, 'insert_blank_line': False, 'insert_blank_line_size': 0.5, 'insert_metadata': False, 'isbn': None, 'italicize_common_cases': True, 'keep_ligatures': False, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0, 'linearize_tables': False, 'lrf': False, 'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'markup_chapter_headings': True, 'max_toc_links': 50, 'minimum_line_height': 120.0, 'mobi_file_type': 'old', 'mobi_ignore_margins': False, 'mobi_keep_original_images': False, 'mobi_toc_at_start': False, 'no_chapters_in_toc': False, 'no_inline_navbars': True, 'no_inline_toc': False, 'output_profile': <calibre.customize.profiles.KindleOutput object at 0x108ec3610>, 'page_breaks_before': None, 'personal_doc': '[PDOC]', 'prefer_author_sort': False, 'prefer_metadata_cover': False, 'pretty_print': False, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': None, 'remove_fake_margins': True, 'remove_first_image': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'renumber_headings': True, 'replace_scene_breaks': '', 'search_replace': None, 'series': None, 'series_index': None, 'share_not_sync': False, 'smarten_punctuation': False, 'sr1_replace': '', 'sr1_search': '', 'sr2_replace': '', 'sr2_search': '', 'sr3_replace': '', 'sr3_search': '', 'start_reading_at': None, 'subset_embedded_fonts': False, 'tags': None, 'test': False, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'toc_title': None, 'unsmarten_punctuation': False, 'unwrap_lines': True, 'use_auto_toc': False, 'verbose': 2}
Python function terminated unexpectedly: ''
InputFormatPlugin: Recipe Input running
Using custom recipe
Traceback (most recent call last):
  File "/Applications/ Office Applications/calibre 1.20.0.app/Contents/Resources/Python/lib/python2.7/site.py", line 208, in main
    return run_entry_point()
  File "/Applications/ Office Applications/calibre 1.20.0.app/Contents/Resources/Python/lib/python2.7/site.py", line 114, in run_entry_point
    return getattr(pmod, func)()
  File "site-packages/calibre/utils/ipc/worker.py", line 192, in main
  File "site-packages/calibre/gui2/convert/gui_conversion.py", line 25, in gui_convert
  File "site-packages/calibre/ebooks/conversion/plumber.py", line 1035, in run
  File "site-packages/calibre/customize/conversion.py", line 241, in __call__
  File "site-packages/calibre/ebooks/conversion/plugins/recipe_input.py", line 117, in convert
  File "site-packages/calibre/web/feeds/news.py", line 982, in download
  File "site-packages/calibre/web/feeds/news.py", line 1147, in build_index
  File "<string>", line 162, in parse_index
  File "<string>", line 139, in find_articles
  File "site-packages/calibre/web/feeds/news.py", line 652, in index_to_soup
  File "site-packages/mechanize/_mechanize.py", line 199, in open_novisit
  File "site-packages/mechanize/_mechanize.py", line 230, in _mech_open
  File "site-packages/mechanize/_opener.py", line 193, in open
  File "site-packages/mechanize/_urllib2_fork.py", line 344, in _open
  File "site-packages/mechanize/_urllib2_fork.py", line 332, in _call_chain
  File "site-packages/mechanize/_urllib2_fork.py", line 1142, in http_open
  File "site-packages/mechanize/_urllib2_fork.py", line 1116, in do_open
  File "lib/python2.7/httplib.py", line 1045, in getresponse
  File "lib/python2.7/httplib.py", line 409, in begin
  File "lib/python2.7/httplib.py", line 373, in _read_status
httplib.BadStatusLine: ''

Last edited by CaliWenger; 01-18-2014 at 06:31 PM. Reason: further info


