|
|
#1 |
|
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsury, MA
Device: Lenovo Android Tablet
|
ESPN recipe fails
(the one by Kovid and Raman)
Trying to get latest version of recipe: espn
Python function terminated unexpectedly
HTTP Error 401: Unauthorized (Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 101, in main
  File "site.py", line 78, in run_entry_point
  File "site-packages\calibre\utils\ipc\worker.py", line 199, in main
  File "site-packages\calibre\gui2\convert\gui_conversion.py", line 35, in gui_convert_recipe
  File "site-packages\calibre\gui2\convert\gui_conversion.py", line 27, in gui_convert
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 1106, in run
  File "site-packages\calibre\customize\conversion.py", line 244, in __call__
  File "site-packages\calibre\ebooks\conversion\plugins\recipe_input.py", line 135, in convert
  File "site-packages\calibre\web\feeds\news.py", line 901, in __init__
  File "<string>", line 82, in get_browser
  File "site-packages\mechanize\_mechanize.py", line 254, in open
  File "site-packages\mechanize\_mechanize.py", line 310, in _mech_open
mechanize._response.httperror_seek_wrapper: HTTP Error 401: Unauthorized |
|
|
|
|
|
#2 |
|
creator of calibre
Posts: 45,618
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I need ESPN account credentials to look at that.
|
|
|
|
|
|
#3 |
|
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsury, MA
Device: Lenovo Android Tablet
|
You can create a login, or log in with Facebook. And if you don't have a login, how did you originally create the recipe?
Last edited by NSILMike; 12-13-2018 at 11:05 AM. |
|
|
|
|
|
#4 |
|
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsury, MA
Device: Lenovo Android Tablet
|
|
|
|
|
|
|
#5 |
|
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsury, MA
Device: Lenovo Android Tablet
|
I just downloaded calibre 3.36, whose release notes say the ESPN recipe is improved. Now it doesn't fail, but it downloads only links...
|
|
|
|
|
|
#6 |
|
creator of calibre
Posts: 45,618
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yeah, I looked at it briefly. ESPN uses a complicated JavaScript-based login mechanism, which I don't have the time or interest to reverse engineer.
|
|
|
|
|
|
#7 |
|
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsury, MA
Device: Lenovo Android Tablet
|
|
|
|
|
|
|
#8 |
|
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2020
Device: kobo libre h20
|
Quote:
I don't know anything about calibre, but I found some information that I think might help.

In the file ./resources/builtin_recipes.zip I found a file called espn.recipe. After looking at it and at the web site, I tried this page:
http://sports.espn.go.com/espn/rss/nfl/news
which returned a list of items, including a URL that looked like this:
https://www.espn.com/nfl/story/_/id/...king-full-list

Line 109 of the recipe looked interesting, so I tried this page to get the story:
http://sports.espn.go.com/espn/print?id=29533526
which seems to work pretty well.

Line 115 suggested that this form of URL was also worth trying:
https://www.espn.com/espn/print?id=29533526&type=story
and that one works as well. Maybe this is all that needs to be changed?

Thanks, Rob |
|
|
|
|
|
|
#9 |
|
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2020
Device: kobo libre h20
|
Well, I don't know how to tell whether it is working. It is not downloading anything because of age issues that I don't understand.
I am getting this message a lot:
Skipping article Bubbles are working for other sports. Why did the NFL decide against one? (Tue, 28 Jul, 2020 11:04) from feed www.espn.com - NFL as it is too old.
I'll keep poking around for a cache somewhere.
Thanks, Rob |
|
|
|
|
|
#10 |
|
creator of calibre
Posts: 45,618
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Set oldest_article in the recipe to control that. Also, you don't need to look in builtin_recipes.zip to edit recipes; calibre has a UI for that: https://manual.calibre-ebook.com/news.html
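
To illustrate why the article in the message above is skipped, here is a minimal sketch of an age cutoff measured in days, which is how oldest_article is specified in a recipe. The helper name is_too_old and the "current" time are assumptions made for this example, and calibre's own check may differ in detail.

```python
from datetime import datetime, timedelta


def is_too_old(published, now, oldest_article_days):
    """Hypothetical helper: True if the article is older than the cutoff."""
    return now - published > timedelta(days=oldest_article_days)


# Publication date taken from the "Skipping article" message.
published = datetime(2020, 7, 28, 11, 4)
# Assumed download time, chosen only for illustration.
now = datetime(2020, 8, 7, 3, 49)

print(is_too_old(published, now, 7))    # True: older than 7 days, skipped
print(is_too_old(published, now, 14))   # False: within 14 days, kept
```

So raising oldest_article in the recipe (for example from 7 to 14) would let an article of that age through.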
|
|
|
|
|
|
#11 |
|
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2020
Device: kobo libre h20
|
Thank you!
That was exactly where I needed to start. I copied some things over from the other ESPN recipe; there are other parts I don't understand well enough yet to copy. Here's my script for now, in case anyone else wants to use it.
Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1596778396(BasicNewsRecipe):
    title = 'espn_modified'
    description = 'Sports news'
    __author__ = 'Rob Walker'
    language = 'en'
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True
    encoding = 'ISO-8859-1'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = True
    remove_tags_before = dict(name='font', attrs={'class': 'date'})
    remove_tags = [
        dict(name='font', attrs={'class': 'footer'}),
        dict(name='hr', noshade='noshade'),
        dict(name='img', src='/winnercomm/horseracing/DRF.jpg'),
    ]
    extra_css = '''
    body{font-family:Verdana,Arial,Helvetica,sans-serif; font-size:x-small; font-weight:normal;}
    .subhead{color:#666666;font-family:Verdana,sans-serif; font-size:x-small; font-weight:bold;}
    .clearfix{font-family:Verdana,sans-serif; font-size:xx-small; }
    .date{ font-family:Verdana,Arial,Helvetica,sans-serif ; font-size:xx-small;color:#7A7A7A;}
    .byline{ font-family:Verdana,Arial,Helvetica,sans-serif ; font-size:xx-small;color:#666666;}
    .headline{font-family:Verdana,Arial,Helvetica,sans-serif ; font-size:large; font-weight:bold;}
    '''
    feeds = [
        ('Top Headlines', 'https://www.espn.com/espn/rss/news'),
        ('NFL', 'https://www.espn.com/espn/rss/nfl/news'),
        ('NBA', 'https://www.espn.com/espn/rss/nba/news'),
        ('MLB', 'https://www.espn.com/espn/rss/mlb/news'),
        ('NHL', 'https://www.espn.com/espn/rss/nhl/news'),
        ('Golf', 'https://www.espn.com/espn/rss/golf/news'),
        ('RPM', 'https://www.espn.com/espn/rss/rpm/news'),
        ('Boxing', 'https://www.espn.com/espn/rss/boxing/news'),
        ('Soccer', 'https://www.espn.com/espn/rss/soccer/news'),
        ('NCB', 'https://www.espn.com/espn/rss/ncb/news'),
        ('NCF', 'https://www.espn.com/espn/rss/ncf/news'),
        ('NCAA', 'https://www.espn.com/espn/rss/ncaa/news'),
        ('Olympics', 'https://www.espn.com/espn/rss/oly/news'),
        ('Equestrian', 'https://www.espn.com/espn/rss/horse/news'),
    ]

    def preprocess_html(self, soup):
        for div in soup.findAll('div', style=True):
            if 'px' in div['style']:
                div['style'] = ''
        return soup

    def postprocess_html(self, soup, first_fetch):
        for div in soup.findAll('div', style=True):
            div['style'] = div['style'].replace('center', 'left')
        return soup
Last edited by kovidgoyal; 08-07-2020 at 03:49 AM. |
|
|
|
|
|
#12 |
|
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2020
Device: kobo libre h20
|
OK, I'm starting to understand how this stuff works. I think I'm making progress, but I'm not sure.
The base URL has changed.
feeds = [
    ('Top Headlines', 'http://sports.espn.go.com/espn/rss/news'),
    'http://sports.espn.go.com/espn/rss/nfl/news',
    'http://sports.espn.go.com/espn/rss/nba/news',
    'http://sports.espn.go.com/espn/rss/mlb/news',
    'http://sports.espn.go.com/espn/rss/nhl/news',
    'http://sports.espn.go.com/espn/rss/golf/news',
    'http://sports.espn.go.com/espn/rss/rpm/news',
    'http://sports.espn.go.com/espn/rss/tennis/news',
    'http://sports.espn.go.com/espn/rss/boxing/news',
    'http://soccernet.espn.go.com/rss/news',
    'http://sports.espn.go.com/espn/rss/ncb/news',
    'http://sports.espn.go.com/espn/rss/ncf/news',
    'http://sports.espn.go.com/espn/rss/ncaa/news',
    'http://sports.espn.go.com/espn/rss/outdoors/news',
    # 'http://sports.espn.go.com/espn/rss/bassmaster/news',
    'http://sports.espn.go.com/espn/rss/oly/news',
    'http://sports.espn.go.com/espn/rss/horse/news'
]
Therefore, in print_version() we need
    return 'http://sports.espn.go.com/espn/print?' + match.group(1) + '&type=story'
However, where I'm getting confused is where we get "match" setup. When we land inside of print_version, the variable "url" is holding the number. For instance, this is a good URL.
https://www.espn.com/espn/print?id=29581539&type=story
But the 'url' variable is coming in with '29581539', and the 'match' variable is completely empty.
My current attempt has this in print_version(), which isn't working.
def print_version(self, url):
    if 'eticket' in url:
        return url.partition('&')[0].replace('story?', 'print?')
    match = re.search(r'story\?(id=\d+)', url)
    self.log.debug('url: %s' % (url))
    self.log.debug('match: %s' % (match.group(1)))
    match = 1
    articleId = url
    if match and 'soccernet' not in url and 'bassmaster' not in url:
        # return 'http://sports.espn.go.com/espn/print?' + match.group(1) + '&type=story'
        self.log.debug('i: %s' % (match.group(1)))
        # https://www.espn.com/espn/print?id=29581539&type=story
        # return 'http://www.espn.com/espn/print?' + match.group(1) + '&type=story'
I'll keep applying head to wall, but if this helps someone else get closer, that's good. |
|
|
|
|
|
#13 |
|
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2020
Device: kobo libre h20
|
Sigh, that was completely wrong. Here is the correct information.
-------------
OK, I'm starting to understand how this stuff works. I think I'm making progress, but I'm not sure.
The base URL has changed.
feeds = [
    ('Top Headlines', 'https://www.espn.com/espn/rss/news'),
    'https://www.espn.com/espn/rss/nfl/news',
    'https://www.espn.com/espn/rss/nba/news',
    'https://www.espn.com/espn/rss/mlb/news',
    'https://www.espn.com/espn/rss/nhl/news',
    'https://www.espn.com/espn/rss/golf/news',
    'https://www.espn.com/espn/rss/rpm/news',
    'https://www.espn.com/espn/rss/tennis/news',
    'https://www.espn.com/espn/rss/boxing/news',
    'https://www.espn.com/espn/rss/soccer/news',
    # 'http://soccernet.espn.go.com/rss/news',
    'https://www.espn.com/espn/rss/ncb/news',
    'https://www.espn.com/espn/rss/ncf/news',
    'https://www.espn.com/espn/rss/ncaa/news',
    # 'https://www.espn.com/espn/rss/outdoors/news',
    # 'http://sports.espn.go.com/espn/rss/bassmaster/news',
    'https://www.espn.com/espn/rss/oly/news',
    'https://www.espn.com/espn/rss/horse/news'
]
Therefore, in print_version() we need
    return 'http://www.espn.com/espn/print?id=' + articleId + '&type=story'
However, where I'm getting confused is where we get "match" setup. When we land inside of print_version, the variable "url" is holding the number. For instance, this is a good URL.
https://www.espn.com/espn/print?id=29581539&type=story
But the 'url' variable is coming in with '29581539', and the 'match' variable is completely empty.
My current attempt has this in print_version(), which isn't working.
def print_version(self, url):
    if 'eticket' in url:
        return url.partition('&')[0].replace('story?', 'print?')
    match = re.search(r'story\?(id=\d+)', url)
    self.log.debug('url: %s' % (url))
    self.log.debug('match: %s' % (match.group(1)))
    match = 1
    articleId = url
    if match and 'soccernet' not in url and 'bassmaster' not in url:
        # return 'http://sports.espn.go.com/espn/print?' + match.group(1) + '&type=story'
        self.log.debug('i: %s' % (match.group(1)))
        # https://www.espn.com/espn/print?id=29581539&type=story
        # return 'http://www.espn.com/espn/print?' + match.group(1) + '&type=story'
        return 'http://www.espn.com/espn/print?id=' + articleId + '&type=story'
I'll keep applying head to wall, but if this helps someone else get closer, that's good. |
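
One likely reason the attempt above fails is that re.search() returns None when the pattern is not found, so match.group(1) raises an error as soon as 'url' holds only a bare number. The following is a minimal sketch of a more defensive mapping, assuming the value passed in can be either a bare article id or a story link. The helper name espn_print_url, the PRINT_URL constant, and the '/id/<number>/' link form are assumptions made for illustration; this is not necessarily what the fix linked in the next post does.

```python
import re

# Print-page pattern taken from the working URL quoted in the post.
PRINT_URL = 'https://www.espn.com/espn/print?id=%s&type=story'


def espn_print_url(url):
    """Hypothetical helper: map a feed value to a print URL, or None."""
    url = url.strip()
    # Case 1: the value is already a bare numeric article id.
    if url.isdigit():
        return PRINT_URL % url
    # Case 2: old-style link containing "?id=<digits>".
    # Case 3 (assumed form): link containing "/id/<digits>/".
    match = re.search(r'(?:[?&]id=|/id/)(\d+)', url)
    if match is None:
        # Never call match.group() on a failed search.
        return None
    return PRINT_URL % match.group(1)


print(espn_print_url('29581539'))
print(espn_print_url('http://sports.espn.go.com/nfl/story?id=29581539'))
```

Inside a recipe, print_version() could return this value and fall back to the original url whenever the helper returns None.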
|
|
|
|
|
#14 |
|
creator of calibre
Posts: 45,618
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There you go: https://github.com/kovidgoyal/calibr...1b1879f8d2d13f
|
|
|
|
|
|
#15 |
|
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2020
Device: kobo libre h20
|
I was out of town for a week, and I'm just getting back to this.
This works perfectly, thank you! I have imported it to ESPN_master, and it works great. Thank you again, Rob |
|
|
|