MobileRead Forums > E-Book Software > Calibre > Recipes

Old 04-17-2015, 01:18 PM   #1
zachlapidus
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2015
Device: Kindle Keyboard
Fixed the Wired Magazine Recipe (not daily)...kind of

Hi, first-time poster. Thanks to Kovid, the community, and everyone for all this amazing work!

I've used calibre since I got my Kindle and it's been amazing. One of my absolute favorites from around 2011, when I started, was the Wired Magazine feed -- at that time it was primarily long, detailed articles from the print edition.

I recently started using calibre again and was disappointed to see that the Wired recipe is currently broken, and appears not to have worked for quite some time. The Wired Daily Edition recipe is working, but it seems to pull a daily digest of the latest posts, which are mostly short news stories with the occasional longer article.

I am not really a Python programmer at all, but I read a little of the API documentation and made a hacky modification to the Wired Daily script so that it only pulls articles with the "Magazine" tag from pages 1 and 2 of this listing: http://wired.com/category/magazine/page/1. I'm sure someone more experienced than me can make a better version, but I don't think it's that bad for a first go-round.

Hope this is okay to post here.

Code:
__license__   = 'GPL v3'
__copyright__ = '2014, Darko Miletic <darko.miletic at gmail.com>'
'''
www.wired.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class WiredDailyNews(BasicNewsRecipe):
    title                 = 'Wired Magazine, Monthly Edition'
    __author__            = 'Darko Miletic, update by Zach Lapidus'
    description           = ('Wired is a full-color monthly American magazine, published in both print '
                             'and online editions, that reports on how emerging technologies affect culture, '
                             'the economy and politics.')
    publisher             = 'Conde Nast'
    category              = 'news, IT, computers, technology'
    oldest_article        = 2
    max_articles_per_feed = 200
    no_stylesheets        = True
    encoding              = 'utf-8'
    use_embedded_content  = False
    language              = 'en'
    ignore_duplicate_articles = {'url'}
    remove_empty_feeds    = True
    publication_type      = 'newsportal'
    extra_css             = """
                            .entry-header{
                                          text-transform: uppercase;
                                          vertical-align: baseline;
                                          display: inline;
                                         }
                            ul li{display: inline}
                            """

    remove_tags = [
        dict(name=['meta','link']),
        dict(name='div', attrs={'class':'podcast_storyboard'}),
        dict(id=['sharing', 'social', 'article-tags', 'sidebar']),
                  ]
    keep_only_tags=[
        dict(attrs={'data-js':['post', 'postHeader']}),
    ]
    
    def parse_index(self):
        articles = []
        seen = set()
        # Scrape the first two pages of the magazine category listing.
        for pagenum in (1, 2):
            soup = self.index_to_soup('http://www.wired.com/category/magazine/page/%d' % pagenum)
            majorf = soup.find('main')
            if not majorf:
                continue
            for a in majorf.findAll('a', href=True):
                url = a['href']
                if url.startswith('http://www.wired.com/') and url.endswith('/'):
                    title = self.tag_to_string(a.find('h2'))
                    date = self.tag_to_string(a.find('time'))
                    # Skip untitled links, "read more" links and URLs already collected.
                    if title and title.lower() != 'read more' and url not in seen:
                        seen.add(url)
                        articles.append({
                            'title': title,
                            'date': date,
                            'url': url,
                            'description': '',
                        })
        # Return a single feed; both pages feed the same article list.
        return [('Articles', articles)]



    def get_article_url(self, article):
        return article.get('guid',  None)
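
The link filter used in parse_index above can be tried outside calibre. The following is only a minimal sketch: `select_articles` and the sample links are hypothetical names invented for illustration, and it assumes the (url, title) pairs have already been extracted from the page.

```python
def select_articles(links, seen):
    """Yield article dicts for links that look like Wired article URLs.

    links: iterable of (url, title) pairs already pulled out of the page.
    seen:  set of URLs collected so far; it is updated in place.
    """
    for url, title in links:
        # Same URL test as the recipe: on wired.com and ending in a slash.
        if not (url.startswith('http://www.wired.com/') and url.endswith('/')):
            continue
        # Drop untitled links, "read more" links and duplicates.
        if not title or title.lower() == 'read more' or url in seen:
            continue
        seen.add(url)
        yield {'title': title, 'url': url, 'description': ''}


# Made-up sample links, for demonstration only.
sample = [
    ('http://www.wired.com/2015/04/story-one/', 'Story One'),
    ('http://www.wired.com/2015/04/story-one/', 'Read More'),
    ('http://www.wired.com/2015/04/story-one/', 'Story One'),
    ('http://example.com/elsewhere/', 'Off-site link'),
    ('http://www.wired.com/2015/04/story-two/', 'Story Two'),
]
picked = list(select_articles(sample, set()))
print([a['title'] for a in picked])
```

Passing the same `seen` set to every call is what keeps an article that appears on both listing pages from being added twice.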
Old 04-17-2015, 10:43 PM   #2
kovidgoyal
creator of calibre
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Cool, I cleaned it up and updated the builtin recipe: https://github.com/kovidgoyal/calibr...58d2d2edf788c9
Old 05-01-2015, 05:40 PM   #3
truth1ness
Zealot
Posts: 126
Karma: 50000
Join Date: Mar 2015
Device: none
Thanks. I found that each month's issue has its own month-specific URL structure: http://www.wired.com/tag/magazine-23-05/page/1

I edited the recipe so that it builds the current month's URL from today's date and crawls through the pages until there are no more. That way you get exactly the articles from the current month's issue, with no overlap or missed items.

https://github.com/kovidgoyal/calibre/pull/394/files
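
The URL arithmetic can be sketched on its own. This is only an illustration under one assumption: that the first number in the tag is the calendar year minus 1992, which matches the `magazine-23-05` example for May 2015. The names `wired_issue_url` and `page_url` are hypothetical, not taken from the pull request.

```python
from datetime import date

# Assumption: issue tags look like "magazine-<year - 1992>-<two-digit month>".
BASE_URL = 'http://www.wired.com/tag/magazine-'


def wired_issue_url(day):
    """Return the listing URL prefix for the issue of the month containing `day`."""
    volume = day.year - 1992
    return '{0}{1}-{2:02d}/page/'.format(BASE_URL, volume, day.month)


def page_url(day, pagenum):
    """Return the URL of one listing page of that issue."""
    return wired_issue_url(day) + str(pagenum)


print(page_url(date(2015, 5, 1), 1))
```

Taking the date as a parameter, rather than calling `date.today()` inside the function, makes it easy to check the result for any month.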
Old 05-15-2015, 11:35 PM   #4
zachlapidus
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2015
Device: Kindle Keyboard
Thanks truth1ness! What a great update. I just ran the script and had to make a slight tweak to get it to fetch the "long read" article about the Silk Road -- I had to add an entry to keep_only_tags to get the article to actually download, as it is (sigh) in a different format:

Code:
__license__   = 'GPL v3'
__copyright__ = '2014, Darko Miletic <darko.miletic at gmail.com>'
'''
www.wired.com
'''

from calibre.web.feeds.news import BasicNewsRecipe
from datetime import date
import urllib2

class WiredDailyNews(BasicNewsRecipe):
    title                 = 'Wired Magazine, Monthly Edition'
    __author__            = 'Darko Miletic, update by Zach Lapidus, Michael Marotta'
    description           = ('Wired is a full-color monthly American magazine, published in both print '
                             'and online editions, that reports on how emerging technologies affect culture, '
                             'the economy and politics. Monthly edition, best run at the start of every month.')
    publisher             = 'Conde Nast'
    category              = 'news, IT, computers, technology'
    oldest_article        = 2
    max_articles_per_feed = 200
    no_stylesheets        = True
    encoding              = 'utf-8'
    use_embedded_content  = False
    language              = 'en'
    ignore_duplicate_articles = {'url'}
    remove_empty_feeds    = True
    publication_type      = 'newsportal'
    extra_css             = """
                            .entry-header{
                                          text-transform: uppercase;
                                          vertical-align: baseline;
                                          display: inline;
                                         }
                            """

    remove_tags = [
        dict(name=['meta','link']),
        dict(name='div', attrs={'class':'podcast_storyboard'}),
        dict(id=['sharing', 'social', 'article-tags', 'sidebar']),
                  ]
    keep_only_tags=[
        dict(attrs={'data-js':['post', 'postHeader']}),
        dict(attrs={'class':'exchange fsb-content relative'})
    ]

    def get_date_url(self):
        '''
        get month and year, add year modifier, append to wired magazine url,
        :return: url
        '''
        baseurl = 'http://www.wired.com/tag/magazine-'
        monthurl = str('{:02d}'.format(date.today().month))
        yearurl = str(date.today().year - 1992)
        dateurl = baseurl + yearurl + '-' + monthurl + '/page/'
        return dateurl

    def parse_wired_index_page(self, currenturl, seen):
        soup = self.index_to_soup(currenturl)
        main = soup.find('main')
        if main is None:
            # No <main> element on this page, so there is nothing to list.
            return
        for a in main.findAll('a', href=True):
            url = a['href']
            if url.startswith('http://www.wired.com/') and url.endswith('/'):
                title = self.tag_to_string(a.find('h2'))
                # Named pubdate so it does not shadow the imported datetime.date.
                pubdate = self.tag_to_string(a.find('time'))
                if title and title.lower() != 'read more' and url not in seen:
                    seen.add(url)
                    self.log('Found article:', title)
                    yield {'title':title, 'date':pubdate, 'url':url, 'description':''}

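    def parse_index(self):
        '''
        Get the current month's url, then request page 1, 2, 3, ... in turn
        until the site returns an HTTP error or a page adds no new articles,
        instead of scraping the first page for the page count.
        :return: a single-feed index, [('Articles', articles)]
        '''
        baseurl = self.get_date_url()
        pagenum = 1
        articles = []
        seen = set()
        morepages = True
        while morepages:
            currenturl = baseurl + str(pagenum)
            try:
                # Probe the page first; a missing page raises HTTPError.
                urllib2.urlopen(currenturl)
                found = list(self.parse_wired_index_page(currenturl, seen))
                articles.extend(found)
                pagenum += 1
                # Also stop when a page yields nothing new, so a site that
                # never returns an error cannot keep the loop running forever.
                morepages = bool(found)
            except urllib2.HTTPError:
                morepages = False
        return [('Articles', articles)]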
Only one problem -- the SVG images used in the article appear as black rectangles both in calibre's viewer and on the Kindle. I tried fetching the news source as an ePub instead, thinking that format would not rasterize SVG, but got the black squares again. Any ideas?
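
The "keep requesting pages until one is missing" loop in parse_index above can be sketched without calibre or the network. This is a hedged illustration, not the recipe's code: `crawl_pages`, `fetch_page`, `max_pages` and the fake site data are all hypothetical, and a missing page is modelled as an IOError rather than a real HTTP error.

```python
def crawl_pages(fetch_page, max_pages=50):
    """Collect unique (url, title) items from page 1, 2, 3, ... until pages run out.

    fetch_page(pagenum) is a hypothetical callable that returns a list of
    (url, title) pairs, or raises IOError when the page does not exist.
    max_pages is a safety cap so the loop always terminates.
    """
    seen = set()
    articles = []
    for pagenum in range(1, max_pages + 1):
        try:
            items = fetch_page(pagenum)
        except IOError:
            break  # missing page: the listing is finished
        added = 0
        for url, title in items:
            if url not in seen:
                seen.add(url)
                articles.append((url, title))
                added += 1
        if added == 0:
            break  # nothing new: stop rather than loop on a repeated page
    return articles


# Stand-in data source for demonstration; a real recipe would fetch HTML.
FAKE_SITE = {
    1: [('http://example.com/a/', 'A'), ('http://example.com/b/', 'B')],
    2: [('http://example.com/b/', 'B'), ('http://example.com/c/', 'C')],
}


def fake_fetch(pagenum):
    if pagenum not in FAKE_SITE:
        raise IOError('no such page: %d' % pagenum)
    return FAKE_SITE[pagenum]


print(crawl_pages(fake_fetch))
```

The two extra stop conditions (no new items, and the page cap) are a design choice for robustness: they cover a site that answers every page number with the same content instead of an error.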
Old 07-26-2015, 05:29 PM   #5
SunLight
Connoisseur
Posts: 99
Karma: 36
Join Date: Jun 2010
Device: none
None of these seem to work:

calibre, version 2.33.0 (win32, isfrozen: True)
Conversion Error: Failed: Fetch news from Wired Magazine, Monthly Edition

Fetch news from Wired Magazine, Monthly Edition
Resolved conversion options
calibre version: 2.33.0
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0,
'book_producer': None,
'change_justification': 'original',
'chapter': None,
'chapter_mark': 'pagebreak',
'comments': None,
'cover': None,
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_download_recipe': False,
'dont_split_on_page_breaks': True,
'duplicate_links_in_toc': False,
'embed_all_fonts': False,
'embed_font_family': None,
'enable_heuristics': False,
'epub_flatten': False,
'epub_inline_toc': False,
'epub_toc_at_end': False,
'expand_css': False,
'extra_css': None,
'extract_to': None,
'filter_css': None,
'fix_indents': True,
'flow_size': 260,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x0000000004E55400>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0,
'linearize_tables': False,
'lrf': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_toc_links': 50,
'minimum_line_height': 120.0,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.GenericEink object at 0x0000000004E55780>,
'page_breaks_before': None,
'prefer_metadata_cover': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': None,
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': '',
'search_replace': None,
'series': None,
'series_index': None,
'smarten_punctuation': False,
'sr1_replace': '',
'sr1_search': '',
'sr2_replace': '',
'sr2_search': '',
'sr3_replace': '',
'sr3_search': '',
'start_reading_at': None,
'subset_embedded_fonts': False,
'tags': None,
'test': False,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'toc_title': None,
'unsmarten_punctuation': False,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: Recipe Input running
Using custom recipe
Synthesizing mastheadImage
Python function terminated unexpectedly
All feeds are empty, aborting. (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 132, in main
File "site.py", line 109, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 193, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 1042, in run
File "site-packages\calibre\customize\conversion.py", line 241, in __call__
File "site-packages\calibre\ebooks\conversion\plugins\recipe_input.py", line 117, in convert
File "site-packages\calibre\web\feeds\news.py", line 1029, in download
File "site-packages\calibre\web\feeds\news.py", line 1281, in build_index
File "site-packages\calibre\web\feeds\news.py", line 1559, in create_opf
Exception: All feeds are empty, aborting.
Old 07-26-2015, 07:27 PM   #6
PeterT
Grand Sorcerer
Posts: 12,117
Karma: 73448614
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
Wired Daily Edition works; but both Wired Magazine, Monthly Edition and Wired Magazine - UK Edition fail.
Old 07-26-2015, 08:40 PM   #7
SunLight
Connoisseur
Posts: 99
Karma: 36
Join Date: Jun 2010
Device: none
Quote:
Originally Posted by PeterT
Wired Daily Edition works; but both Wired Magazine, Monthly Edition and Wired Magazine - UK Edition fail.

That's what I am seeing too
Old 07-11-2018, 02:45 PM   #8
olimyob
Junior Member
Posts: 3
Karma: 10
Join Date: Jun 2011
Device: ipad
Digging this one back up... There is no box to enter paywall information. Thank you!
Old 07-11-2018, 09:14 PM   #9
kovidgoyal
creator of calibre
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Somebody would need to edit the recipe to support logging into the paywall for that.