10-26-2010, 08:33 AM | #1 |
Connoisseur
Posts: 98
Karma: 10
Join Date: Apr 2008
Device: sony prs 505
|
NY Times problem
I am using calibre v0.7.24 and Mac OS X 10.6.4.
I tried to fetch the NY Times today and received the following error message: Failed: Fetch news from The New York Times. Here is the report:
ERROR: Conversion Error: Failed: Fetch news from The New York Times

Fetch news from The New York Times
Resolved conversion options
calibre version: 0.7.24
{'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0, 'book_producer': None, 'change_justification': 'original', 'chapter': None, 'chapter_mark': 'pagebreak', 'comments': None, 'cover': None, 'debug_pipeline': None, 'disable_font_rescaling': False, 'dont_download_recipe': False, 'dont_split_on_page_breaks': True, 'extra_css': None, 'extract_to': None, 'flow_size': 260, 'font_size_mapping': None, 'footer_regex': '(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s * <a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>) ' , 'header_regex': '(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s * <a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>) ' , 'html_unwrap_factor': 0.40000000000000002, 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x690de90>, 'insert_blank_line': False, 'insert_metadata': False, 'isbn': None, 'keep_ligatures': False, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0, 'linearize_tables': False, 'lrf': False, 'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'max_toc_links': 50, 'no_chapters_in_toc': False, 'no_default_epub_cover': False, 'no_inline_navbars': False, 'no_svg_cover': False, 'output_profile': <calibre.customize.profiles.SonyReaderOutput object at 0x6916270>, 'page_breaks_before': None, 'password': 'scottsan', 'prefer_metadata_cover': False, 'preprocess_html': False, 'preserve_cover_aspect_ratio': False, 'pretty_print': True, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': None, 'remove_first_image': False, 'remove_footer': False, 'remove_header': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'series': None, 'series_index': None, 'smarten_punctuation': False, 'tags': None, 'test': False, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'use_auto_toc': False, 'username': 'ankaraaikikai@mac.com', 'verbose': 2}
Python function terminated unexpectedly: list index out of range
InputFormatPlugin: Recipe Input running
Queued 0 articles
Traceback (most recent call last):
  File "/Applications/Ebook Software/calibre.app/Contents/Resources/Python/lib/python2.6/site.py", line 147, in main
    return run_entry_point()
  File "/Applications/Ebook Software/calibre.app/Contents/Resources/Python/lib/python2.6/site.py", line 116, in run_entry_point
    return getattr(pmod, func)()
  File "site-packages/calibre/utils/ipc/worker.py", line 107, in main
  File "site-packages/calibre/gui2/convert/gui_conversion.py", line 24, in gui_convert
  File "site-packages/calibre/ebooks/conversion/plumber.py", line 832, in run
  File "site-packages/calibre/customize/conversion.py", line 216, in __call__
  File "site-packages/calibre/web/feeds/input.py", line 105, in convert
  File "site-packages/calibre/web/feeds/news.py", line 713, in download
  File "site-packages/calibre/web/feeds/news.py", line 876, in build_index
IndexError: list index out of range

What's wrong? Thanks |
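The log shows "Queued 0 articles" immediately before the IndexError in build_index. Below is a minimal, hypothetical sketch of how that combination can arise: if the class names a recipe searches for no longer appear on the page, the recipe finds nothing, and any code that then reads the first entry of the resulting empty list fails with exactly this message. The function names are invented for illustration and are not calibre's internals. The class names are borrowed from the updated recipe in post #5, and the assumption that an older recipe lacked them is only an assumption.

```python
# Hypothetical illustration: these functions are NOT calibre's code.

def find_articles(page_classes, wanted_classes):
    """Keep only the page elements whose class the recipe searches for."""
    return [cls for cls in page_classes if cls in wanted_classes]


def build_index(feeds):
    """Assumed behaviour: read the first feed without checking for an empty list."""
    return feeds[0]  # IndexError: list index out of range when feeds == []


def build_index_checked(feeds):
    """Defensive variant: fail with a message that points at the real cause."""
    if not feeds:
        raise ValueError("No articles found; the site layout may have changed")
    return feeds[0]


# Classes an older recipe might search for vs. classes on a redesigned page
# (illustrative values only).
old_recipe_classes = {'story', 'story headline'}
redesigned_page = ['sectionHeader', 'headlinesOnly multiline flush']

feeds = find_articles(redesigned_page, old_recipe_classes)
print("Queued %d articles" % len(feeds))

try:
    build_index(feeds)
except IndexError as exc:
    print("IndexError: %s" % exc)
```

The checked variant shows why a recipe (or the code calling it) benefits from testing for an empty result: the user then sees "no articles found" instead of a bare index error deep in the traceback.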
10-26-2010, 09:03 AM | #2 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
10-26-2010, 10:46 AM | #3 |
Connoisseur
Posts: 98
Karma: 10
Join Date: Apr 2008
Device: sony prs 505
|
NY Times problem
I'm curious to know why this recipe worked with the previous version of Calibre but doesn't now. When a new version of Calibre comes out, are the recipes changed or modified in some way? And if a recipe is working just fine, why would it be modified?
|
10-26-2010, 11:07 AM | #4 | |||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
|||
10-26-2010, 11:42 AM | #5 |
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
NYT recipe update
The NYT web site's format has changed. Here is an updated recipe:
Code:
#!/usr/bin/env python

__license__   = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
'''
nytimes.com
'''
import string, re, time
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

def decode(self, src):
    enc = 'utf-8'
    if 'iso-8859-1' in src:
        enc = 'cp1252'
    return src.decode(enc, 'ignore')

class NYTimes(BasicNewsRecipe):

    title       = u'New York Times'
    __author__  = 'Kovid Goyal/Nick Redding'
    language = 'en'
    requires_version = (0, 6, 36)

    description = 'Daily news from the New York Times (subscription version)'
    timefmt = ' [%b %d]'
    needs_subscription = True
    remove_tags_before = dict(id='article')
    remove_tags_after  = dict(id='article')
    remove_tags = [dict(attrs={'class':['articleTools', 'post-tools', 'side_tool','nextArticleLink',
                'nextArticleLink clearfix','columnGroup doubleRule','doubleRule','entry-meta',
                'icon enlargeThis','columnGroup last','relatedSearchesModule']}),
                dict({'class':re.compile('^subNavigation')}),
                dict({'class':re.compile('^leaderboard')}),
                dict({'class':re.compile('^module')}),
                dict({'class':'metaFootnote'}),
                dict(id=['inlineBox','footer', 'toolsRight', 'articleInline','login','masthead',
                    'navigation', 'archive', 'side_search', 'blog_sidebar','cCol','portfolioInline',
                    'side_tool', 'side_index','header','readerReviewsCount','readerReviews',
                    'relatedArticles', 'relatedTopics', 'adxSponLink']),
                dict(name=['script', 'noscript', 'style','form','hr'])]
    encoding = decode
    no_stylesheets = True
    extra_css = '''
                .articleHeadline { margin-top:0.5em; margin-bottom:0.25em; }
                .credit { font-size: small; font-style:italic; line-height:1em; margin-top:5px; margin-left:0; margin-right:0; margin-bottom: 0; }
                .byline { font-size: small; font-style:italic; line-height:1em; margin-top:10px; margin-left:0; margin-right:0; margin-bottom: 0; }
                .dateline { font-size: small; line-height:1em;margin-top:5px; margin-left:0; margin-right:0; margin-bottom: 0; }
                .kicker { font-size: small; line-height:1em;margin-top:5px; margin-left:0; margin-right:0; margin-bottom: 0; }
                .timestamp { font-size: small; }
                .caption { font-size: small; line-height:1em; margin-top:5px; margin-left:0; margin-right:0; margin-bottom: 0; }
                a:link {text-decoration: none; }'''

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open('http://www.nytimes.com/auth/login')
            br.select_form(name='login')
            br['USERID']   = self.username
            br['PASSWORD'] = self.password
            raw = br.submit().read()
            if 'Sorry, we could not find the combination you entered. Please try again.' in raw:
                raise Exception('Your username and password are incorrect')
            #open('/t/log.html', 'wb').write(raw)
        return br

    def get_masthead_url(self):
        masthead = 'http://graphics8.nytimes.com/images/misc/nytlogo379x64.gif'
        #masthead = 'http://members.cox.net/nickredding/nytlogo.gif'
        br = BasicNewsRecipe.get_browser()
        try:
            br.open(masthead)
        except:
            self.log("\nMasthead unavailable")
            masthead = None
        return masthead

    def get_cover_url(self):
        cover = None
        st = time.localtime()
        year = str(st.tm_year)
        month = "%.2d" % st.tm_mon
        day = "%.2d" % st.tm_mday
        cover = 'http://graphics8.nytimes.com/images/' + year + '/' + month +'/' + day +'/nytfrontpage/scan.jpg'
        br = BasicNewsRecipe.get_browser()
        try:
            br.open(cover)
        except:
            self.log("\nCover unavailable")
            cover = None
        return cover

    def short_title(self):
        return 'New York Times'

    def parse_index(self):
        self.encoding = 'cp1252'
        soup = self.index_to_soup('http://www.nytimes.com/pages/todayspaper/index.html')
        self.encoding = decode

        def feed_title(div):
            return ''.join(div.findAll(text=True, recursive=True)).strip()

        articles = {}
        key = None
        ans = []
        url_list = []

        def handle_article(div):
            a = div.find('a', href=True)
            if not a:
                return
            url = re.sub(r'\?.*', '', a['href'])
            if not url.startswith("http"):
                return
            if not url.endswith(".html"):
                return
            if 'podcast' in url:
                return
            url += '?pagewanted=all'
            if url in url_list:
                return
            url_list.append(url)
            title = self.tag_to_string(a, use_alt=True).strip()
            #self.log("Title: %s" % title)
            description = ''
            pubdate = strftime('%a, %d %b')
            summary = div.find(True, attrs={'class':'summary'})
            if summary:
                description = self.tag_to_string(summary, use_alt=False)
            author = ''
            authorAttribution = div.find(True, attrs={'class':'byline'})
            if authorAttribution:
                author = self.tag_to_string(authorAttribution, use_alt=False)
            else:
                authorAttribution = div.find(True, attrs={'class':'byline'})
                if authorAttribution:
                    author = self.tag_to_string(authorAttribution, use_alt=False)
            feed = key if key is not None else 'Uncategorized'
            if not articles.has_key(feed):
                articles[feed] = []
            articles[feed].append(
                            dict(title=title, url=url, date=pubdate,
                                description=description, author=author,
                                content=''))

        # Find each instance of class="section-headline", class="story", class="story headline"
        for div in soup.findAll(True,
            attrs={'class':['section-headline', 'story', 'story headline','sectionHeader','headlinesOnly multiline flush']}):

            if div['class'] in ['section-headline','sectionHeader']:
                key = string.capwords(feed_title(div))
                articles[key] = []
                ans.append(key)
                #self.log('Section: %s' % key)

            elif div['class'] in ['story', 'story headline'] :
                handle_article(div)
            elif div['class'] == 'headlinesOnly multiline flush':
                for lidiv in div.findAll('li'):
                    handle_article(lidiv)

#        ans = self.sort_index_by(ans, {'The Front Page':-1,
#                                       'Dining In, Dining Out':1,
#                                       'Obituaries':2})
        ans = [(key, articles[key]) for key in ans if articles.has_key(key)]

        return ans

    def preprocess_html(self, soup):
        kicker_tag = soup.find(attrs={'class':'kicker'})
        if kicker_tag:
            tagline = self.tag_to_string(kicker_tag)
            #self.log("FOUND KICKER %s" % tagline)
            if tagline=='Op-Ed Columnist':
                img_div = soup.find('div','inlineImage module')
                #self.log("Searching for photo")
                if img_div:
                    img_div.extract()
                    #self.log("Photo deleted")
        refresh = soup.find('meta', {'http-equiv':'refresh'})
        if refresh is None:
            return soup
        content = refresh.get('content').partition('=')[2]
        raw = self.browser.open_novisit('http://www.nytimes.com'+content).read()
        return BeautifulSoup(raw.decode('cp1252', 'replace'))
|
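For anyone adapting the recipe, the core of parse_index is two steps: handle_article filters and normalises each link, and the main loop files each article under the most recent section header. The sketch below reproduces that logic on plain tuples so it runs without calibre or network access. The tuple input format, the function names, the sample URLs, and the use of a set instead of url_list are all assumptions for illustration, and it is written for Python 3 rather than the Python 2 the recipe targets.

```python
import re
import string


def normalize_url(href, seen):
    """Apply handle_article's link filters; return None if the link is skipped."""
    url = re.sub(r'\?.*', '', href)            # drop any query string
    if not url.startswith('http') or not url.endswith('.html'):
        return None                            # relative or non-article links
    if 'podcast' in url:
        return None
    url += '?pagewanted=all'                   # ask for the single-page version
    if url in seen:                            # skip articles already listed
        return None
    seen.add(url)
    return url


def group_articles(items):
    """Group ('section', name) / ('story', title, href) tuples by section.

    Returns [(section, [article dicts])] in page order, mirroring parse_index.
    """
    articles = {}
    sections = []
    seen = set()
    key = None
    for item in items:
        if item[0] == 'section':
            key = string.capwords(item[1])     # 'THE FRONT PAGE' -> 'The Front Page'
            articles[key] = []
            sections.append(key)
        else:
            _, title, href = item
            url = normalize_url(href, seen)
            if url is None:
                continue
            feed = key if key is not None else 'Uncategorized'
            articles.setdefault(feed, []).append({'title': title, 'url': url})
    return [(k, articles[k]) for k in sections if k in articles]


# Invented sample input standing in for the parsed "Today's Paper" page.
page = [
    ('section', 'THE FRONT PAGE'),
    ('story', 'Story A', 'http://www.nytimes.com/2010/10/26/us/a.html?ref=todayspaper'),
    ('story', 'Story A (repeat)', 'http://www.nytimes.com/2010/10/26/us/a.html'),
    ('story', 'A podcast', 'http://www.nytimes.com/podcasts/x.html'),
    ('section', 'world'),
    ('story', 'Story B', 'http://www.nytimes.com/2010/10/26/world/b.html'),
]
for section, entries in group_articles(page):
    print(section, [entry['url'] for entry in entries])
```

As in the recipe, articles that appear before any section header are filed under 'Uncategorized' but left out of the returned index, because that key never enters the section list.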
10-26-2010, 12:41 PM | #6 |
Connoisseur
Posts: 98
Karma: 10
Join Date: Apr 2008
Device: sony prs 505
|
NY Times problem
thanks,
it worked like a charm |
10-30-2010, 03:25 AM | #7 |
Addict
Posts: 288
Karma: 1094000
Join Date: Mar 2010
Location: Essonne, France
Device: Kobo Forma; Sony PRS600B; Sony 350; Sony T-2
|
Say, many thanks for that. I had noticed the change in format in the "Top Stories" e-mail I get, but since I only download the Times a couple of times a week, I hadn't gotten around to "engaging with" the issue. You saved me lots of time and effort!
|
10-31-2010, 06:26 AM | #8 | |
Connoisseur
Posts: 82
Karma: 10
Join Date: Oct 2010
Device: Kindle
|
Any hope to have the "Top Stories" recipe updated as well?
|
|
10-31-2010, 09:24 PM | #9 |
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
I'm working on it--check back on Tuesday!
|
11-28-2010, 08:02 PM | #10 |
Member
Posts: 11
Karma: 10
Join Date: Nov 2010
Device: Kindle
|
Hi, I'm new to Calibre and still find it a little confusing. I've been having difficulty getting it to download from the NYT too, so this seems to be the solution. Do I just copy and paste the entire recipe to replace the existing one, click Add Recipe, and save? Thanks
|
11-28-2010, 11:42 PM | #11 |
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
Kovid has updated the standard recipes--all you have to do is select "New York Times" or "New York Times Headlines" from the English recipes. Note that "New York Times" requires a (free) login, which you can get from www.nytimes.com.
|
11-29-2010, 03:37 PM | #12 |
Member
Posts: 11
Karma: 10
Join Date: Nov 2010
Device: Kindle
|
Thanks. It seemed to have difficulties with my existing NYT account and login, so I created a new one just for Calibre, and it worked perfectly.
|
12-02-2010, 09:21 PM | #13 | |
Dances with penguins
Posts: 54
Karma: 10
Join Date: Oct 2010
Device: Sony PRS-350
|
|
|
12-03-2010, 10:56 AM | #14 |
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
|
12-03-2010, 10:57 AM | #15 |
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
Phoul, perhaps you are having the login problem that surfaced today due to NYT format changes -- see https://www.mobileread.com/forums/sho...d.php?t=109611 for the solution.
|
|