#1 |
|
Connoisseur
Posts: 98
Karma: 10
Join Date: Apr 2008
Device: sony prs 505
|
NY Times problem
I am using calibre v0.7.24 on Mac OS X 10.6.4.
I tried to fetch the NY Times today and received the following error message: Failed: Fetch news from The New York Times. Here is the report:
ERROR: Conversion Error: <b>Failed</b>: Fetch news from The New York Times
Fetch news from The New York Times
Resolved conversion options
calibre version: 0.7.24
{'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0, 'book_producer': None, 'change_justification': 'original', 'chapter': None, 'chapter_mark': 'pagebreak', 'comments': None, 'cover': None, 'debug_pipeline': None, 'disable_font_rescaling': False, 'dont_download_recipe': False, 'dont_split_on_page_breaks': True, 'extra_css': None, 'extract_to': None, 'flow_size': 260, 'font_size_mapping': None, 'footer_regex': '(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s * <a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>) ' , 'header_regex': '(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s * <a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>) ' , 'html_unwrap_factor': 0.40000000000000002, 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x690de90>, 'insert_blank_line': False, 'insert_metadata': False, 'isbn': None, 'keep_ligatures': False, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0, 'linearize_tables': False, 'lrf': False, 'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'max_toc_links': 50, 'no_chapters_in_toc': False, 'no_default_epub_cover': False, 'no_inline_navbars': False, 'no_svg_cover': False, 'output_profile': <calibre.customize.profiles.SonyReaderOutput object at 0x6916270>, 'page_breaks_before': None, 'password': 'scottsan', 'prefer_metadata_cover': False, 'preprocess_html': False, 'preserve_cover_aspect_ratio': False, 'pretty_print': True, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': None, 'remove_first_image': False, 'remove_footer': False, 'remove_header': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'series': None, 'series_index': None, 'smarten_punctuation': False, 'tags': None, 'test': False, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'use_auto_toc': False, 'username': 'ankaraaikikai@mac.com', 'verbose': 2}
Python function terminated unexpectedly: list index out of range
InputFormatPlugin: Recipe Input running
Queued 0 articles
Traceback (most recent call last):
  File "/Applications/Ebook Software/calibre.app/Contents/Resources/Python/lib/python2.6/site.py", line 147, in main
    return run_entry_point()
  File "/Applications/Ebook Software/calibre.app/Contents/Resources/Python/lib/python2.6/site.py", line 116, in run_entry_point
    return getattr(pmod, func)()
  File "site-packages/calibre/utils/ipc/worker.py", line 107, in main
  File "site-packages/calibre/gui2/convert/gui_conversion.py", line 24, in gui_convert
  File "site-packages/calibre/ebooks/conversion/plumber.py", line 832, in run
  File "site-packages/calibre/customize/conversion.py", line 216, in __call__
  File "site-packages/calibre/web/feeds/input.py", line 105, in convert
  File "site-packages/calibre/web/feeds/news.py", line 713, in download
  File "site-packages/calibre/web/feeds/news.py", line 876, in build_index
IndexError: list index out of range
What's wrong? Thanks
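A note on reading this report: the log prints "Queued 0 articles" just before the IndexError, which is consistent with the recipe finding no sections or stories on the index page. The sketch below reproduces that kind of failure in isolation. It is not calibre's code: `build_index_sketch` and `safe_build_index_sketch` are hypothetical stand-ins, and the actual statement at line 876 of news.py is not shown in the report.

```python
def build_index_sketch(feeds):
    """Hypothetical stand-in for an index builder that assumes at least one feed."""
    first_title, first_articles = feeds[0]  # IndexError when feeds is empty
    return '%s (%d articles)' % (first_title, len(first_articles))


def safe_build_index_sketch(feeds):
    """Same idea, but fail with a clearer message when nothing was parsed."""
    if not feeds:
        raise ValueError('No articles found; the site layout may have changed')
    return build_index_sketch(feeds)


# A parse_index() result has the shape [(section_title, [article_dict, ...]), ...].
empty_result = []  # what a recipe returns when none of its selectors match
try:
    build_index_sketch(empty_result)
except IndexError as err:
    print('IndexError: %s' % err)
```

An explicit emptiness check, as in the second function, turns the opaque IndexError into a message that points at the likely cause.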
#2 |
|
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
#3 |
|
Connoisseur
Posts: 98
Karma: 10
Join Date: Apr 2008
Device: sony prs 505
|
NY Times problem
I'm curious to know why this recipe worked with the previous version of calibre but no longer does. When a new version of calibre comes out, are the recipes changed or modified in some way? And if a recipe is working just fine, why would it be modified?
#4 |
|
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
#5 |
|
onlinenewsreader.net
Posts: 331
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
NYT recipe update
The failure comes from format changes on the NYT web site. Here is an updated recipe:
Code:
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
'''
nytimes.com
'''
import string, re, time
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

def decode(self, src):
    enc = 'utf-8'
    if 'iso-8859-1' in src:
        enc = 'cp1252'
    return src.decode(enc, 'ignore')

class NYTimes(BasicNewsRecipe):

    title = u'New York Times'
    __author__ = 'Kovid Goyal/Nick Redding'
    language = 'en'
    requires_version = (0, 6, 36)
    description = 'Daily news from the New York Times (subscription version)'
    timefmt = ' [%b %d]'
    needs_subscription = True
    remove_tags_before = dict(id='article')
    remove_tags_after = dict(id='article')
    remove_tags = [dict(attrs={'class':['articleTools', 'post-tools', 'side_tool','nextArticleLink',
                                'nextArticleLink clearfix','columnGroup doubleRule','doubleRule','entry-meta',
                                'icon enlargeThis','columnGroup last','relatedSearchesModule']}),
                   dict({'class':re.compile('^subNavigation')}),
                   dict({'class':re.compile('^leaderboard')}),
                   dict({'class':re.compile('^module')}),
                   dict({'class':'metaFootnote'}),
                   dict(id=['inlineBox','footer', 'toolsRight', 'articleInline','login','masthead',
                            'navigation', 'archive', 'side_search', 'blog_sidebar','cCol','portfolioInline',
                            'side_tool', 'side_index','header','readerReviewsCount','readerReviews',
                            'relatedArticles', 'relatedTopics', 'adxSponLink']),
                   dict(name=['script', 'noscript', 'style','form','hr'])]
    encoding = decode
    no_stylesheets = True
    extra_css = '''
                .articleHeadline { margin-top:0.5em; margin-bottom:0.25em; }
                .credit { font-size: small; font-style:italic; line-height:1em; margin-top:5px; margin-left:0; margin-right:0; margin-bottom: 0; }
                .byline { font-size: small; font-style:italic; line-height:1em; margin-top:10px; margin-left:0; margin-right:0; margin-bottom: 0; }
                .dateline { font-size: small; line-height:1em;margin-top:5px; margin-left:0; margin-right:0; margin-bottom: 0; }
                .kicker { font-size: small; line-height:1em;margin-top:5px; margin-left:0; margin-right:0; margin-bottom: 0; }
                .timestamp { font-size: small; }
                .caption { font-size: small; line-height:1em; margin-top:5px; margin-left:0; margin-right:0; margin-bottom: 0; }
                a:link {text-decoration: none; }'''

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open('http://www.nytimes.com/auth/login')
            br.select_form(name='login')
            br['USERID'] = self.username
            br['PASSWORD'] = self.password
            raw = br.submit().read()
            if 'Sorry, we could not find the combination you entered. Please try again.' in raw:
                raise Exception('Your username and password are incorrect')
            #open('/t/log.html', 'wb').write(raw)
        return br

    def get_masthead_url(self):
        masthead = 'http://graphics8.nytimes.com/images/misc/nytlogo379x64.gif'
        #masthead = 'http://members.cox.net/nickredding/nytlogo.gif'
        br = BasicNewsRecipe.get_browser()
        try:
            br.open(masthead)
        except:
            self.log("\nMasthead unavailable")
            masthead = None
        return masthead

    def get_cover_url(self):
        cover = None
        st = time.localtime()
        year = str(st.tm_year)
        month = "%.2d" % st.tm_mon
        day = "%.2d" % st.tm_mday
        cover = 'http://graphics8.nytimes.com/images/' + year + '/' + month +'/' + day +'/nytfrontpage/scan.jpg'
        br = BasicNewsRecipe.get_browser()
        try:
            br.open(cover)
        except:
            self.log("\nCover unavailable")
            cover = None
        return cover

    def short_title(self):
        return 'New York Times'

    def parse_index(self):
        self.encoding = 'cp1252'
        soup = self.index_to_soup('http://www.nytimes.com/pages/todayspaper/index.html')
        self.encoding = decode

        def feed_title(div):
            return ''.join(div.findAll(text=True, recursive=True)).strip()

        articles = {}
        key = None
        ans = []
        url_list = []

        def handle_article(div):
            a = div.find('a', href=True)
            if not a:
                return
            url = re.sub(r'\?.*', '', a['href'])
            if not url.startswith("http"):
                return
            if not url.endswith(".html"):
                return
            if 'podcast' in url:
                return
            url += '?pagewanted=all'
            if url in url_list:
                return
            url_list.append(url)
            title = self.tag_to_string(a, use_alt=True).strip()
            #self.log("Title: %s" % title)
            description = ''
            pubdate = strftime('%a, %d %b')
            summary = div.find(True, attrs={'class':'summary'})
            if summary:
                description = self.tag_to_string(summary, use_alt=False)
            author = ''
            authorAttribution = div.find(True, attrs={'class':'byline'})
            if authorAttribution:
                author = self.tag_to_string(authorAttribution, use_alt=False)
            else:
                authorAttribution = div.find(True, attrs={'class':'byline'})
                if authorAttribution:
                    author = self.tag_to_string(authorAttribution, use_alt=False)
            feed = key if key is not None else 'Uncategorized'
            if not articles.has_key(feed):
                articles[feed] = []
            articles[feed].append(
                            dict(title=title, url=url, date=pubdate,
                                description=description, author=author,
                                content=''))

        # Find each instance of class="section-headline", class="story", class="story headline"
        for div in soup.findAll(True,
            attrs={'class':['section-headline', 'story', 'story headline','sectionHeader','headlinesOnly multiline flush']}):
            if div['class'] in ['section-headline','sectionHeader']:
                key = string.capwords(feed_title(div))
                articles[key] = []
                ans.append(key)
                #self.log('Section: %s' % key)
            elif div['class'] in ['story', 'story headline'] :
                handle_article(div)
            elif div['class'] == 'headlinesOnly multiline flush':
                for lidiv in div.findAll('li'):
                    handle_article(lidiv)

        # ans = self.sort_index_by(ans, {'The Front Page':-1,
        #                                'Dining In, Dining Out':1,
        #                                'Obituaries':2})
        ans = [(key, articles[key]) for key in ans if articles.has_key(key)]
        return ans

    def preprocess_html(self, soup):
        kicker_tag = soup.find(attrs={'class':'kicker'})
        if kicker_tag:
            tagline = self.tag_to_string(kicker_tag)
            #self.log("FOUND KICKER %s" % tagline)
            if tagline=='Op-Ed Columnist':
                img_div = soup.find('div','inlineImage module')
                #self.log("Searching for photo")
                if img_div:
                    img_div.extract()
                    #self.log("Photo deleted")
        refresh = soup.find('meta', {'http-equiv':'refresh'})
        if refresh is None:
            return soup
        content = refresh.get('content').partition('=')[2]
        raw = self.browser.open_novisit('http://www.nytimes.com'+content).read()
        return BeautifulSoup(raw.decode('cp1252', 'replace'))
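The URL handling inside `handle_article` can be tried on its own, without calibre. The following is a minimal sketch of that filtering step only. The function name `normalize_article_url` and the sample links are made up for illustration, and the code is written for a current Python rather than the Python 2.6 bundled with calibre 0.7.

```python
import re


def normalize_article_url(href, seen):
    """Return the single-page article URL, or None if the link is skipped.

    Mirrors the checks in handle_article: strip the query string, keep only
    absolute .html links, skip podcasts, request the full-page view, and
    drop duplicates. `seen` is a list that is updated in place.
    """
    url = re.sub(r'\?.*', '', href)
    if not url.startswith('http'):
        return None
    if not url.endswith('.html'):
        return None
    if 'podcast' in url:
        return None
    url += '?pagewanted=all'
    if url in seen:
        return None
    seen.append(url)
    return url


seen = []
samples = [  # hypothetical links, not taken from the real index page
    'http://www.nytimes.com/2010/10/26/world/example.html?ref=todayspaper',
    'http://www.nytimes.com/2010/10/26/world/example.html?hp',  # duplicate
    '/pages/todayspaper/index.html',                            # relative
    'http://www.nytimes.com/podcasts/example.html',             # podcast
]
for href in samples:
    print(normalize_article_url(href, seen))
```

Only the first sample survives; the other three are rejected as a duplicate, a relative link, and a podcast link respectively.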
#6 |
|
Connoisseur
Posts: 98
Karma: 10
Join Date: Apr 2008
Device: sony prs 505
|
NY Times problem
Thanks, it worked like a charm.
#7 |
|
Addict
Posts: 288
Karma: 1094000
Join Date: Mar 2010
Location: Essonne, France
Device: Kobo Forma; Sony PRS600B; Sony 350; Sony T-2
|
Say, many thanks for that. I had noticed the change in format in the e-mail "Top Stories" I get, but since I only download the Times a couple of times a week, I hadn't gotten around to "engaging with" the issue. You saved me lots of time and effort!
#8 |
|
Connoisseur
Posts: 82
Karma: 10
Join Date: Oct 2010
Device: Kindle
|
Any hope to have the "Top Stories" recipe updated as well?
#9 |
|
onlinenewsreader.net
Posts: 331
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
I'm working on it--check back on Tuesday!
#10 |
|
Member
Posts: 11
Karma: 10
Join Date: Nov 2010
Device: Kindle
|
Hi, I'm new to Calibre and still find it a little confusing. I've been having difficulty getting it to download from NYT too so this seems to be the solution - do I just copy and paste the entire recipe to replace the existing one, click on Add Recipe and save? Thanks
#11 |
|
onlinenewsreader.net
Posts: 331
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
Kovid has updated the standard recipes--all you have to do is select "New York Times" or "New York Times Headlines" from the English recipes. Note that "New York Times" requires a (free) login which you can get from www.nytimes.com.
#12 |
|
Member
Posts: 11
Karma: 10
Join Date: Nov 2010
Device: Kindle
|
Thanks, it seemed to be having difficulties with my existing nyt account and log-in, so I created a new one just for Calibre and it worked perfectly.
#13 |
|
Dances with penguins
Posts: 54
Karma: 10
Join Date: Oct 2010
Device: Sony PRS-350
|
Quote:
#14 |
|
onlinenewsreader.net
Posts: 331
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
#15 |
|
onlinenewsreader.net
Posts: 331
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
Phoul - perhaps you are having the login problem that surfaced today due to NYT format changes -- see https://www.mobileread.com/forums/sho...d.php?t=109611 for the solution.