12-10-2018, 11:12 AM | #1 |
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsury, MA
Device: Lenovo Android Tablet
|
ESPN recipe fails
(the one by Kovid and Raman)
Trying to get latest version of recipe: espn
Python function terminated unexpectedly
HTTP Error 401: Unauthorized (Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 101, in main
  File "site.py", line 78, in run_entry_point
  File "site-packages\calibre\utils\ipc\worker.py", line 199, in main
  File "site-packages\calibre\gui2\convert\gui_conversion.py", line 35, in gui_convert_recipe
  File "site-packages\calibre\gui2\convert\gui_conversion.py", line 27, in gui_convert
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 1106, in run
  File "site-packages\calibre\customize\conversion.py", line 244, in __call__
  File "site-packages\calibre\ebooks\conversion\plugins\recipe_input.py", line 135, in convert
  File "site-packages\calibre\web\feeds\news.py", line 901, in __init__
  File "<string>", line 82, in get_browser
  File "site-packages\mechanize\_mechanize.py", line 254, in open
  File "site-packages\mechanize\_mechanize.py", line 310, in _mech_open
mechanize._response.httperror_seek_wrapper: HTTP Error 401: Unauthorized |
12-11-2018, 12:18 AM | #2 |
creator of calibre
Posts: 43,859
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I need ESPN account credentials to look at that.
|
12-11-2018, 07:32 AM | #3 |
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsury, MA
Device: Lenovo Android Tablet
|
You can create a login, or log in with Facebook. And if you don't have a login, how did you originally create the recipe?
Last edited by NSILMike; 12-13-2018 at 10:05 AM. |
12-20-2018, 11:06 AM | #4 |
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsury, MA
Device: Lenovo Android Tablet
|
|
12-21-2018, 09:49 AM | #5 |
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsury, MA
Device: Lenovo Android Tablet
|
Just downloaded Calibre 3.36 which says ESPN recipe is improved. Now it doesn't fail, but it downloads only links...
|
12-21-2018, 12:14 PM | #6 |
creator of calibre
Posts: 43,859
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yeah, I looked at it briefly. ESPN uses a complicated JavaScript-based login mechanism, which I don't have the time or interest to reverse engineer.
|
12-21-2018, 12:18 PM | #7 |
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsury, MA
Device: Lenovo Android Tablet
|
|
08-06-2020, 10:45 PM | #8 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2020
Device: kobo libre h20
|
Quote:
I don't know anything about calibre, but I did find some information that I think might be of help.

In the file ./resources/builtin_recipes.zip I found a file called espn.recipe. Looking at it, and looking at the web site, I tried this web page:
http://sports.espn.go.com/espn/rss/nfl/news
which gave me a bunch of stuff, including a URL that looked like this:
https://www.espn.com/nfl/story/_/id/...king-full-list

I saw on line 109 something that looked interesting, so I tried to go to this page to get the story:
http://sports.espn.go.com/espn/print?id=29533526
which seems to work pretty well.

Looking at line 115, I saw that this sort of URL was an interesting idea:
https://www.espn.com/espn/print?id=29533526&type=story
And that one works as well. Maybe this is all that needs to be changed?

Thanks, Rob |
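The rewrite described in this post can be sketched in plain Python. This is only an illustration: the helper name print_url_for is hypothetical, and the assumption that the numeric id sits in an "/id/<digits>" path segment is inferred from the example URLs in this post, not from any ESPN documentation.

```python
import re

# Hypothetical helper, not part of calibre or of the shipped ESPN recipe.
PRINT_TEMPLATE = 'https://www.espn.com/espn/print?id={article_id}&type=story'


def print_url_for(story_url):
    """Return the print-page URL for a story URL, or None if no id is found.

    Assumes the numeric article id sits in an '/id/<digits>' path segment.
    """
    match = re.search(r'/id/(\d+)', story_url)
    if match is None:
        return None
    return PRINT_TEMPLATE.format(article_id=match.group(1))


if __name__ == '__main__':
    # 'some-slug' is a placeholder; the slug is not needed for the rewrite.
    print(print_url_for('https://www.espn.com/nfl/story/_/id/29533526/some-slug'))
```

Returning None when no id is found leaves it to the caller to decide whether to fall back to the original URL.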
|
08-06-2020, 11:00 PM | #9 |
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2020
Device: kobo libre h20
|
Well, I don't know how to tell whether it is working or not. It is not downloading articles because of age checks that I don't understand.
I am getting this message a lot:
Skipping article Bubbles are working for other sports. Why did the NFL decide against one? (Tue, 28 Jul, 2020 11:04) from feed www.espn.com - NFL as it is too old.
I'll keep poking around for a cache somewhere.
Thanks, Rob |
08-06-2020, 11:12 PM | #10 |
creator of calibre
Posts: 43,859
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Set oldest_article in the recipe to control that. And you don't need to look in builtin_recipes.zip to edit recipes; calibre has a UI for that: https://manual.calibre-ebook.com/news.html
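For anyone puzzled by the "too old" message, here is a minimal sketch of the kind of age check that oldest_article controls. It is an illustration in plain Python, not calibre's actual implementation; the function name is_too_old and its parameters are hypothetical, and the run date used in the example is assumed.

```python
from datetime import datetime, timedelta


def is_too_old(published, oldest_article_days, now=None):
    """Return True when an article is older than the allowed age in days.

    Hypothetical helper for illustration; calibre's own check may differ.
    """
    if now is None:
        now = datetime.now()
    return now - published > timedelta(days=oldest_article_days)


if __name__ == '__main__':
    # The article from the log message above, checked on an assumed run date.
    published = datetime(2020, 7, 28, 11, 4)
    run_date = datetime(2020, 8, 6, 23, 0)
    print(is_too_old(published, 7, now=run_date))   # skipped with a 7-day limit
    print(is_too_old(published, 14, now=run_date))  # kept with a 14-day limit
```

Under these assumptions the article is a little over nine days old, so a 7-day limit skips it and a larger value of oldest_article lets it through.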
|
08-07-2020, 02:10 AM | #11 |
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2020
Device: kobo libre h20
|
Thank you!
That was exactly where I needed to start. I copied some things over from the other ESPN script, and left out others because I don't yet understand what they do well enough. Here's my script for now, in case anyone else wants to use it. Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1596778396(BasicNewsRecipe):
    title = 'espn_modified'
    description = 'Sports news'
    __author__ = 'Rob Walker'
    language = 'en'
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True
    encoding = 'ISO-8859-1'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = True

    remove_tags_before = dict(name='font', attrs={'class': 'date'})
    remove_tags = [
        dict(name='font', attrs={'class': 'footer'}),
        dict(name='hr', noshade='noshade'),
        dict(name='img', src='/winnercomm/horseracing/DRF.jpg')
    ]

    extra_css = '''
        body{font-family:Verdana,Arial,Helvetica,sans-serif; font-size:x-small; font-weight:normal;}
        .subhead{color:#666666;font-family:Verdana,sans-serif; font-size:x-small; font-weight:bold;}
        .clearfix{font-family:Verdana,sans-serif; font-size:xx-small; }
        .date{ font-family:Verdana,Arial,Helvetica,sans-serif ; font-size:xx-small;color:#7A7A7A;}
        .byline{ font-family:Verdana,Arial,Helvetica,sans-serif ; font-size:xx-small;color:#666666;}
        .headline{font-family:Verdana,Arial,Helvetica,sans-serif ; font-size:large; font-weight:bold;}
    '''

    feeds = [
        ('Top Headlines', 'https://www.espn.com/espn/rss/news'),
        ('NFL', 'https://www.espn.com/espn/rss/nfl/news'),
        ('NBA', 'https://www.espn.com/espn/rss/nba/news'),
        ('MLB', 'https://www.espn.com/espn/rss/mlb/news'),
        ('NHL', 'https://www.espn.com/espn/rss/nhl/news'),
        ('Golf', 'https://www.espn.com/espn/rss/golf/news'),
        ('RPM', 'https://www.espn.com/espn/rss/rpm/news'),
        ('Boxing', 'https://www.espn.com/espn/rss/boxing/news'),
        ('Soccer', 'https://www.espn.com/espn/rss/soccer/news'),
        ('NCB', 'https://www.espn.com/espn/rss/ncb/news'),
        ('NCF', 'https://www.espn.com/espn/rss/ncf/news'),
        ('NCAA', 'https://www.espn.com/espn/rss/ncaa/news'),
        ('Olympics', 'https://www.espn.com/espn/rss/oly/news'),
        ('Equestrian', 'https://www.espn.com/espn/rss/horse/news'),
    ]

    def preprocess_html(self, soup):
        for div in soup.findAll('div', style=True):
            if 'px' in div['style']:
                div['style'] = ''
        return soup

    def postprocess_html(self, soup, first_fetch):
        for div in soup.findAll('div', style=True):
            div['style'] = div['style'].replace('center', 'left')
        return soup

Last edited by kovidgoyal; 08-07-2020 at 02:49 AM. |
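Since every feed in the script above shares one base URL, the list could also be generated rather than typed out. The following is only a sketch of that idea: BASE, SECTIONS and build_feeds are hypothetical names, and the titles and path segments are copied from the feeds list in this post.

```python
# Hypothetical names for illustration; not part of calibre.
BASE = 'https://www.espn.com/espn/rss/'

# (title, path segment) pairs taken from the feeds list in the script above.
SECTIONS = [
    ('Top Headlines', ''),
    ('NFL', 'nfl'),
    ('NBA', 'nba'),
    ('MLB', 'mlb'),
    ('NHL', 'nhl'),
    ('Golf', 'golf'),
    ('RPM', 'rpm'),
    ('Boxing', 'boxing'),
    ('Soccer', 'soccer'),
    ('NCB', 'ncb'),
    ('NCF', 'ncf'),
    ('NCAA', 'ncaa'),
    ('Olympics', 'oly'),
    ('Equestrian', 'horse'),
]


def build_feeds(base, sections):
    """Return (title, url) tuples in the shape a recipe's feeds list uses."""
    feeds = []
    for title, segment in sections:
        # The top-level feed has no section segment, only 'news'.
        path = segment + '/news' if segment else 'news'
        feeds.append((title, base + path))
    return feeds


if __name__ == '__main__':
    for title, url in build_feeds(BASE, SECTIONS):
        print(title, url)
```

In a recipe this would be used as feeds = build_feeds(BASE, SECTIONS), so that another change of base URL means editing one line.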
08-07-2020, 02:02 PM | #12 |
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2020
Device: kobo libre h20
|
OK, I'm starting to understand how this stuff works. I think I'm making progress, but I'm not sure.
The base URL has changed.

feeds = [
    ('Top Headlines', 'http://sports.espn.go.com/espn/rss/news'),
    'http://sports.espn.go.com/espn/rss/nfl/news',
    'http://sports.espn.go.com/espn/rss/nba/news',
    'http://sports.espn.go.com/espn/rss/mlb/news',
    'http://sports.espn.go.com/espn/rss/nhl/news',
    'http://sports.espn.go.com/espn/rss/golf/news',
    'http://sports.espn.go.com/espn/rss/rpm/news',
    'http://sports.espn.go.com/espn/rss/tennis/news',
    'http://sports.espn.go.com/espn/rss/boxing/news',
    'http://soccernet.espn.go.com/rss/news',
    'http://sports.espn.go.com/espn/rss/ncb/news',
    'http://sports.espn.go.com/espn/rss/ncf/news',
    'http://sports.espn.go.com/espn/rss/ncaa/news',
    'http://sports.espn.go.com/espn/rss/outdoors/news',
    # 'http://sports.espn.go.com/espn/rss/bassmaster/news',
    'http://sports.espn.go.com/espn/rss/oly/news',
    'http://sports.espn.go.com/espn/rss/horse/news'
]

Therefore, in print_version() we need

    return 'http://sports.espn.go.com/espn/print?' + match.group(1) + '&type=story'

However, where I'm getting confused is where we get "match" setup. When we land inside of print_version, the variable "url" is holding the number. For instance, this is a good URL.
https://www.espn.com/espn/print?id=29581539&type=story
But the 'url' variable is coming in with '29581539', and the 'match' variable is completely empty.

My current attempt has this in print_version(), which isn't working.

def print_version(self, url):
    if 'eticket' in url:
        return url.partition('&')[0].replace('story?', 'print?')
    match = re.search(r'story\?(id=\d+)', url)
    self.log.debug('url: %s' % (url))
    self.log.debug('match: %s' % (match.group(1)))
    match = 1
    articleId = url
    if match and 'soccernet' not in url and 'bassmaster' not in url:
        # return 'http://sports.espn.go.com/espn/print?' + match.group(1) + '&type=story'
        self.log.debug('i: %s' % (match.group(1)))
        # https://www.espn.com/espn/print?id=29581539&type=story
        # return 'http://www.espn.com/espn/print?' + match.group(1) + '&type=story'

I'll keep applying head to wall, but if this helps someone else get closer, that's good. |
08-07-2020, 02:15 PM | #13 |
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2020
Device: kobo libre h20
|
Sigh, that was completely wrong. Here is the correct information.
-------------
OK, I'm starting to understand how this stuff works. I think I'm making progress, but I'm not sure.

The base URL has changed.

feeds = [
    ('Top Headlines', 'https://www.espn.com/espn/rss/news'),
    'https://www.espn.com/espn/rss/nfl/news',
    'https://www.espn.com/espn/rss/nba/news',
    'https://www.espn.com/espn/rss/mlb/news',
    'https://www.espn.com/espn/rss/nhl/news',
    'https://www.espn.com/espn/rss/golf/news',
    'https://www.espn.com/espn/rss/rpm/news',
    'https://www.espn.com/espn/rss/tennis/news',
    'https://www.espn.com/espn/rss/boxing/news',
    'https://www.espn.com/espn/rss/soccer/news',
    # 'http://soccernet.espn.go.com/rss/news',
    'https://www.espn.com/espn/rss/ncb/news',
    'https://www.espn.com/espn/rss/ncf/news',
    'https://www.espn.com/espn/rss/ncaa/news',
    # 'https://www.espn.com/espn/rss/outdoors/news',
    # 'http://sports.espn.go.com/espn/rss/bassmaster/news',
    'https://www.espn.com/espn/rss/oly/news',
    'https://www.espn.com/espn/rss/horse/news'
]

Therefore, in print_version() we need

    return 'http://www.espn.com/espn/print?id=' + articleId + '&type=story'

However, where I'm getting confused is where we get "match" setup. When we land inside of print_version, the variable "url" is holding the number. For instance, this is a good URL.
https://www.espn.com/espn/print?id=29581539&type=story
But the 'url' variable is coming in with '29581539', and the 'match' variable is completely empty.

My current attempt has this in print_version(), which isn't working.

def print_version(self, url):
    if 'eticket' in url:
        return url.partition('&')[0].replace('story?', 'print?')
    match = re.search(r'story\?(id=\d+)', url)
    self.log.debug('url: %s' % (url))
    self.log.debug('match: %s' % (match.group(1)))
    match = 1
    articleId = url
    if match and 'soccernet' not in url and 'bassmaster' not in url:
        # return 'http://sports.espn.go.com/espn/print?' + match.group(1) + '&type=story'
        self.log.debug('i: %s' % (match.group(1)))
        # https://www.espn.com/espn/print?id=29581539&type=story
        # return 'http://www.espn.com/espn/print?' + match.group(1) + '&type=story'
        return 'http://www.espn.com/espn/print?id=' + articleId + '&type=story'

I'll keep applying head to wall, but if this helps someone else get closer, that's good. |
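One likely reason the attempt above fails: when 'url' is just the number, re.search() returns None, so the debug line that calls match.group(1) raises an AttributeError before anything is returned. Below is a minimal sketch of a version that guards against that. It is written as a standalone function so it can be run outside calibre (in a recipe it would be a method taking self), the helper name article_id_from is hypothetical, and falling back to the unchanged URL is an assumption rather than what the shipped recipe does.

```python
import re

PRINT_URL = 'https://www.espn.com/espn/print?id=%s&type=story'


def article_id_from(url):
    """Extract the numeric article id from whatever print_version() receives.

    Handles the two cases described in this post: a bare number such as
    '29581539', or an older-style URL containing 'story?id=<digits>'.
    """
    if url.isdigit():
        return url
    match = re.search(r'story\?id=(\d+)', url)
    # Only call group() when the search actually matched.
    return match.group(1) if match else None


def print_version(url):
    article_id = article_id_from(url)
    if article_id is None:
        # Assumption: leave URLs we cannot parse untouched.
        return url
    return PRINT_URL % article_id


if __name__ == '__main__':
    print(print_version('29581539'))
```

Checking the match object before calling group() is the key change; the rest is just the URL from this post with the id filled in.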
08-08-2020, 06:33 AM | #14 |
creator of calibre
Posts: 43,859
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There you go: https://github.com/kovidgoyal/calibr...1b1879f8d2d13f
|
08-20-2020, 08:48 PM | #15 |
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2020
Device: kobo libre h20
|
I was out of town for a week, and I'm just getting back to this.
This works perfectly, thank you! I have imported it to ESPN_master, and it works great. Thank you again, Rob |
|