04-17-2015, 01:18 PM | #1 |
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2015
Device: Kindle Keyboard
|
Fixed the Wired Magazine Recipe (not daily)...kind of
Hi, first time poster. Thanks to Kovid, the community, and everyone for all this amazing work!
I've used calibre since I got my Kindle and it's been amazing. One of my absolute favorites from around 2011, when I started, was the Wired Magazine feed; at that time it was primarily long, detailed articles from the print edition. I recently started using calibre again and was disappointed to see that the Wired recipe is currently broken, and appears not to have worked for quite some time. The Wired Daily Edition recipe is working, but it seems to pull a daily digest of the latest posts, which are mostly short news stories with the occasional longer article. I am not really a Python programmer at all, but I read a little of the API documentation and made a hacky modification to the Wired Daily script so that it only pulls articles with the "Magazine" tag, from pages 1 and 2 of the listing that starts here: http://wired.com/category/magazine/page/1. I'm sure someone more experienced than me can make a better version, but I don't think it's that bad for a first go-round. Hope this is okay to post here. Code:
__license__ = 'GPL v3'
__copyright__ = '2014, Darko Miletic <darko.miletic at gmail.com>'
'''
www.wired.com
'''

from calibre.web.feeds.news import BasicNewsRecipe


class WiredDailyNews(BasicNewsRecipe):
    title = 'Wired Magazine, Monthly Edition'
    __author__ = 'Darko Miletic, update by Zach Lapidus'
    description = ('Wired is a full-color monthly American magazine, published in both print '
                   'and online editions, that reports on how emerging technologies affect culture, '
                   'the economy and politics.')
    publisher = 'Conde Nast'
    category = 'news, IT, computers, technology'
    oldest_article = 2
    max_articles_per_feed = 200
    no_stylesheets = True
    encoding = 'utf-8'
    use_embedded_content = False
    language = 'en'
    ignore_duplicate_articles = {'url'}
    remove_empty_feeds = True
    publication_type = 'newsportal'
    extra_css = """
        .entry-header{
            text-transform: uppercase;
            vertical-align: baseline;
            display: inline;
        }
        ul li{display: inline}
    """

    remove_tags = [
        dict(name=['meta', 'link']),
        dict(name='div', attrs={'class': 'podcast_storyboard'}),
        dict(id=['sharing', 'social', 'article-tags', 'sidebar']),
    ]
    keep_only_tags = [
        dict(attrs={'data-js': ['post', 'postHeader']}),
    ]

    def parse_index(self):
        articles = []
        seen = set()  # URLs already collected, so repeated links are skipped
        # The magazine listing is paginated; only pages 1 and 2 are read.
        for page in (1, 2):
            soup = self.index_to_soup('http://www.wired.com/category/magazine/page/%d' % page)
            majorf = soup.find('main')
            if not majorf:
                continue
            for a in majorf.findAll('a', href=True):
                url = a['href']
                if url.startswith('http://www.wired.com/') and url.endswith('/'):
                    # The headline sits in an <h2> and the date in a <time> inside the link.
                    title = self.tag_to_string(a.find('h2'))
                    date = self.tag_to_string(a.find('time'))
                    if title and title.lower() != 'read more' and url not in seen:
                        seen.add(url)
                        articles.append({
                            'title': title,
                            'date': date,
                            'url': url,
                            'description': ''
                        })
        # A single feed holding the articles from both pages.
        return [('Articles', articles)]

    def get_article_url(self, article):
        return article.get('guid', None)
|
04-17-2015, 10:43 PM | #2 |
creator of calibre
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Cool, I cleaned it up and updated the builtin recipe: https://github.com/kovidgoyal/calibr...58d2d2edf788c9
|
05-01-2015, 05:40 PM | #3 |
Zealot
Posts: 126
Karma: 50000
Join Date: Mar 2015
Device: none
|
Thanks. I found there is a month-specific URL structure for each month's magazine: http://www.wired.com/tag/magazine-23-05/page/1
I edited the recipe so that it dynamically builds the current month's URL from the current date and crawls through the pages until there are no more, so you get exactly the articles from the current month's issue, with no overlap or missed items. https://github.com/kovidgoyal/calibre/pull/394/files |
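For anyone following along, the URL logic described in this post can be sketched in plain Python, without calibre. This is only a rough illustration and not the code from the pull request. The names `issue_base_url`, `crawl_issue`, `fetch_page` and `fake_site` are made up for the example, and the 1992 offset is an assumption taken from the single sample URL above (magazine-23-05 for May 2015). `fetch_page` stands in for whatever actually downloads and parses a listing page; here it is assumed to return a list of article URLs, or None once a page does not exist.

```python
from datetime import date

BASE = 'http://www.wired.com/tag/magazine-'
YEAR_OFFSET = 1992  # assumption: 2015 - 1992 = 23, matching magazine-23-05


def issue_base_url(today):
    """Return the paged URL prefix for the issue of the given month."""
    number = today.year - YEAR_OFFSET
    return '{}{}-{:02d}/page/'.format(BASE, number, today.month)


def crawl_issue(today, fetch_page, max_pages=50):
    """Walk page 1, 2, 3, ... until fetch_page reports a missing page.

    fetch_page(url) is a hypothetical callable that returns a list of
    article URLs, or None when the page does not exist.
    max_pages is a safety cap so a misbehaving site cannot loop forever.
    """
    base = issue_base_url(today)
    seen = set()
    articles = []
    for pagenum in range(1, max_pages + 1):
        found = fetch_page(base + str(pagenum))
        if found is None:
            break  # no more pages for this issue
        for url in found:
            if url not in seen:  # skip links repeated across pages
                seen.add(url)
                articles.append(url)
    return articles


# Usage with a fake two-page site instead of real network access.
may_2015 = date(2015, 5, 1)
fake_site = {
    issue_base_url(may_2015) + '1': ['http://www.wired.com/a/', 'http://www.wired.com/b/'],
    issue_base_url(may_2015) + '2': ['http://www.wired.com/b/', 'http://www.wired.com/c/'],
}
print(issue_base_url(may_2015) + '1')
print(crawl_issue(may_2015, fake_site.get))
```

Passing the fetcher in as an argument keeps the paging and de-duplication logic testable offline; in a real recipe the fetching would presumably go through `index_to_soup` instead.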
05-15-2015, 11:35 PM | #4 |
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2015
Device: Kindle Keyboard
|
Thanks truth1ness! What a great update. I just ran the script and had to make a slight tweak to get it to fetch the "long read" article about the Silk Road: I had to add an entry to keep_only_tags to get the article to actually download, as it is (sigh) in a different format:
Code:
__license__ = 'GPL v3'
__copyright__ = '2014, Darko Miletic <darko.miletic at gmail.com>'
'''
www.wired.com
'''

from calibre.web.feeds.news import BasicNewsRecipe
from datetime import date
import urllib2  # Python 2 standard library


class WiredDailyNews(BasicNewsRecipe):
    title = 'Wired Magazine, Monthly Edition'
    __author__ = 'Darko Miletic, update by Zach Lapidus, Michael Marotta'
    description = ('Wired is a full-color monthly American magazine, published in both print '
                   'and online editions, that reports on how emerging technologies affect culture, '
                   'the economy and politics. Monthly edition, best run at the start of every month.')
    publisher = 'Conde Nast'
    category = 'news, IT, computers, technology'
    oldest_article = 2
    max_articles_per_feed = 200
    no_stylesheets = True
    encoding = 'utf-8'
    use_embedded_content = False
    language = 'en'
    ignore_duplicate_articles = {'url'}
    remove_empty_feeds = True
    publication_type = 'newsportal'
    extra_css = """
        .entry-header{
            text-transform: uppercase;
            vertical-align: baseline;
            display: inline;
        }
    """

    remove_tags = [
        dict(name=['meta', 'link']),
        dict(name='div', attrs={'class': 'podcast_storyboard'}),
        dict(id=['sharing', 'social', 'article-tags', 'sidebar']),
    ]
    keep_only_tags = [
        dict(attrs={'data-js': ['post', 'postHeader']}),
        # Added so the differently formatted "long read" article downloads too.
        dict(attrs={'class': 'exchange fsb-content relative'})
    ]

    def get_date_url(self):
        '''
        Get the current month and year, subtract 1992 from the year
        (2015 gives 23), and append both to the Wired magazine tag URL.
        :return: url prefix ending in '/page/'
        '''
        baseurl = 'http://www.wired.com/tag/magazine-'
        monthurl = '{:02d}'.format(date.today().month)
        yearurl = str(date.today().year - 1992)
        dateurl = baseurl + yearurl + '-' + monthurl + '/page/'
        return dateurl

    def parse_wired_index_page(self, currenturl, seen):
        soup = self.index_to_soup(currenturl)
        for a in soup.find('main').findAll('a', href=True):
            url = a['href']
            if url.startswith('http://www.wired.com/') and url.endswith('/'):
                title = self.tag_to_string(a.find('h2'))
                # Named pubdate so it does not shadow the imported date class.
                pubdate = self.tag_to_string(a.find('time'))
                if title and title.lower() != 'read more' and url not in seen:
                    seen.add(url)
                    self.log('Found article:', title)
                    yield {'title': title, 'date': pubdate, 'url': url, 'description': ''}

    def parse_index(self):
        '''
        Get the current month's URL prefix, then request page 1, 2, 3, ...
        and parse each one until the server answers with an HTTP error,
        instead of scraping the index for the number of pages.
        :return: list with a single ('Articles', articles) feed
        '''
        baseurl = self.get_date_url()
        pagenum = 1
        articles = []
        seen = set()
        morepages = True
        while morepages:
            try:
                # Probe the page first; a missing page raises HTTPError.
                urllib2.urlopen(baseurl + str(pagenum))
                currenturl = baseurl + str(pagenum)
                articles.extend(self.parse_wired_index_page(currenturl, seen))
                pagenum += 1
            except urllib2.HTTPError:
                morepages = False
        return [('Articles', articles)]
|
07-26-2015, 05:29 PM | #5 |
Connoisseur
Posts: 99
Karma: 36
Join Date: Jun 2010
Device: none
|
None of these seem to work:
calibre, version 2.33.0 (win32, isfrozen: True)
Conversion Error: Failed: Fetch news from Wired Magazine, Monthly Edition

Fetch news from Wired Magazine, Monthly Edition
Resolved conversion options
calibre version: 2.33.0
{'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0, 'book_producer': None, 'change_justification': 'original', 'chapter': None, 'chapter_mark': 'pagebreak', 'comments': None, 'cover': None, 'debug_pipeline': None, 'dehyphenate': True, 'delete_blank_paragraphs': True, 'disable_font_rescaling': False, 'dont_download_recipe': False, 'dont_split_on_page_breaks': True, 'duplicate_links_in_toc': False, 'embed_all_fonts': False, 'embed_font_family': None, 'enable_heuristics': False, 'epub_flatten': False, 'epub_inline_toc': False, 'epub_toc_at_end': False, 'expand_css': False, 'extra_css': None, 'extract_to': None, 'filter_css': None, 'fix_indents': True, 'flow_size': 260, 'font_size_mapping': None, 'format_scene_breaks': True, 'html_unwrap_factor': 0.4, 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x0000000004E55400>, 'insert_blank_line': False, 'insert_blank_line_size': 0.5, 'insert_metadata': False, 'isbn': None, 'italicize_common_cases': True, 'keep_ligatures': False, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0, 'linearize_tables': False, 'lrf': False, 'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'markup_chapter_headings': True, 'max_toc_links': 50, 'minimum_line_height': 120.0, 'no_chapters_in_toc': False, 'no_default_epub_cover': False, 'no_inline_navbars': False, 'no_svg_cover': False, 'output_profile': <calibre.customize.profiles.GenericEink object at 0x0000000004E55780>, 'page_breaks_before': None, 'prefer_metadata_cover': False, 'preserve_cover_aspect_ratio': False, 'pretty_print': True, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': None, 'remove_fake_margins': True, 'remove_first_image': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'renumber_headings': True, 'replace_scene_breaks': '', 'search_replace': None, 'series': None, 'series_index': None, 'smarten_punctuation': False, 'sr1_replace': '', 'sr1_search': '', 'sr2_replace': '', 'sr2_search': '', 'sr3_replace': '', 'sr3_search': '', 'start_reading_at': None, 'subset_embedded_fonts': False, 'tags': None, 'test': False, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'toc_title': None, 'unsmarten_punctuation': False, 'unwrap_lines': True, 'use_auto_toc': False, 'verbose': 2}
InputFormatPlugin: Recipe Input running
Using custom recipe
Synthesizing mastheadImage
Python function terminated unexpectedly
  All feeds are empty, aborting. (Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 132, in main
  File "site.py", line 109, in run_entry_point
  File "site-packages\calibre\utils\ipc\worker.py", line 193, in main
  File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 1042, in run
  File "site-packages\calibre\customize\conversion.py", line 241, in __call__
  File "site-packages\calibre\ebooks\conversion\plugins\recipe_input.py", line 117, in convert
  File "site-packages\calibre\web\feeds\news.py", line 1029, in download
  File "site-packages\calibre\web\feeds\news.py", line 1281, in build_index
  File "site-packages\calibre\web\feeds\news.py", line 1559, in create_opf
Exception: All feeds are empty, aborting.
|
07-26-2015, 07:27 PM | #6 |
Grand Sorcerer
Posts: 12,117
Karma: 73448614
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
|
Wired Daily Edition works; but both Wired Magazine, Monthly Edition and Wired Magazine - UK Edition fail.
|
07-26-2015, 08:40 PM | #7 |
Connoisseur
Posts: 99
Karma: 36
Join Date: Jun 2010
Device: none
|
|
07-11-2018, 02:45 PM | #8 |
Junior Member
Posts: 3
Karma: 10
Join Date: Jun 2011
Device: ipad
|
Digging this one back up..... There is no box to enter paywall information. Thank You!
|
07-11-2018, 09:14 PM | #9 |
creator of calibre
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Somebody would need to edit the recipe to support logging into the paywall for that.
|