Old 05-14-2011, 12:58 PM   #1
schuster
Zealot
Posts: 119
Karma: 100
Join Date: Jan 2011
Location: Germany / NRW /Köln
Device: prs-650 / prs-350 /kindle 3
Recipe for Technology Review (German)

Code:
from calibre.web.feeds.recipes import BasicNewsRecipe


class AdvancedUserRecipe1303841067(BasicNewsRecipe):

    title          = u'Technology Review'
    __author__  = 'schuster'
    remove_tags_before = dict(id='keywords')
    remove_tags_after  = dict(id='kommentar')
    remove_tags = [dict(attrs={'class':['navi_oben_pvg', 'navi_oben_tarifr', 'navi_oben_itm', 'navi_oben_eve', 'navi_oben_whi', 'navi_oben_abo', 'navi_oben_shop', 'navi_top_logo', 'navi_top_abschnitt', 'first']}),
               dict(id=['footer', 'toolsRight', 'articleInline', 'navigation', 'archive', 'side_search', 'blog_sidebar', 'side_tool', 'side_index']),
               dict(name=['script', 'noscript', 'style'])]
    oldest_article = 4
    max_articles_per_feed = 100
    no_stylesheets         = True
    use_embedded_content   = False
    language               = 'de'
    remove_javascript      = True
 
    def print_version(self, url):
        return url  + '?view=print'


    feeds          = [
    (u'Technik News', u'http://www.heise.de/tr/news-atom.xml') ]
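
A note on print_version: appending '?view=print' works as long as the feed links carry no query string of their own. The following is only a standalone sketch of a more defensive variant, assuming Python 3 (current calibre) and assuming that feed URLs might already contain parameters; the example URLs are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def print_version(url):
    """Return url with view=print set, keeping any other query parameters."""
    parts = urlsplit(url)
    # Drop an existing 'view' parameter so it is not duplicated
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != 'view']
    query.append(('view', 'print'))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical article URLs, not taken from the real feed
print(print_version('http://www.heise.de/tr/artikel/Beispiel-123.html'))
print(print_version('http://www.heise.de/tr/artikel/Beispiel-123.html?x=1'))
```

Inside a recipe the function would be a method taking self as its first argument.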
Old 06-05-2016, 07:17 AM   #2
Aimylios
Member
Posts: 17
Karma: 10
Join Date: Apr 2016
Device: Tolino Vision 3HD
Hi,

Calibre currently includes two recipes for the German edition of Technology Review: technology_review_de.recipe (i.e. the one posted above by schuster) and tr.recipe. Neither works very well, especially since the latest changes to the site layout.
I merged them, updated the code to handle the new formatting correctly, and added a function that grabs the magazine cover. As I don't see any sense in having two recipes for exactly the same news source, I would recommend updating one of them based on the code below and deleting the other.

Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function

__license__   = 'GPL v3'
__copyright__ = '2010, Anton Gillert <atx at binaryninja.de>'

'''
Technology Review (deutsch) - heise.de/tr
'''

import re
from calibre.web.feeds.news import BasicNewsRecipe

class TechnologyReviewDe(BasicNewsRecipe):
    title       = 'Technology Review'
    __author__  = 'Anton Gillert, schuster'
    description = 'Technology news from Germany'
    language    = 'de'

    oldest_article        = 14
    max_articles_per_feed = 50
    use_embedded_content  = False
    no_stylesheets        = True
    remove_javascript     = True

    masthead_url = 'http://1.f.ix.de/imgs/02/3/0/8/5/2/8/tr_logo-544bd18881c81263.png'

    feeds = [
        ('News', 'http://www.heise.de/tr/rss/news-atom.xml'),
        ('Blog', 'http://www.heise.de/tr/rss/blog-atom.xml')
    ]

    keep_only_tags = [
        dict(name='article')
    ]

    remove_tags = [
        dict(name='nav'),
        dict(name='figure', attrs={'class':'logo'}),
        dict(name='hr')
    ]

    extra_css = '''
        .bild_zentriert {font-size: 0.6em}
        .source {font-size: 0.6em}
    '''

    def get_cover_url(self):
        self.cover_url = ''
        soup = self.index_to_soup('http://www.heise.de/tr/magazin/')
        img = soup.find('img', alt=re.compile('Titelbild Technology Review'), src=True)
        if img:
            self.cover_url = 'http://www.heise.de' + img['src']
        return self.cover_url

    def print_version(self, url):
        return url + '?view=print'

    def preprocess_html(self, soup):
        # remove style attributes
        for item in soup.findAll(attrs={'style':True}):
            del item['style']
        # remove reference to article source
        for p in soup.findAll('p'):
            if 'URL dieses Artikels:' in self.tag_to_string(p):
                p.extract()
        return soup
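
For anyone who wants to try the cover lookup outside of calibre, here is a minimal standalone sketch of the same idea using only the Python 3 standard library. It is not the recipe's code: CoverFinder, find_cover_url and SAMPLE_HTML are names made up for this illustration, and the sample markup is an assumption about what the magazine page might look like. One difference from get_cover_url above is that it uses urljoin instead of string concatenation, so an src that is already an absolute URL would still give a valid result.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class CoverFinder(HTMLParser):
    """Remember the src of the first <img> whose alt text matches a pattern."""

    def __init__(self, alt_pattern):
        super().__init__()
        self.alt_pattern = re.compile(alt_pattern)
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag != 'img' or self.src is not None:
            return
        attrs = dict(attrs)
        alt = attrs.get('alt') or ''
        # Require a non-empty src, like src=True in the recipe
        if attrs.get('src') and self.alt_pattern.search(alt):
            self.src = attrs['src']


def find_cover_url(html, base_url='http://www.heise.de/tr/magazin/'):
    """Return the absolute cover URL, or None if no matching image exists."""
    finder = CoverFinder('Titelbild Technology Review')
    finder.feed(html)
    if finder.src is None:
        return None
    return urljoin(base_url, finder.src)


# Made-up markup for illustration only
SAMPLE_HTML = '''
<div><img alt="Logo" src="/logo.png">
<img alt="Titelbild Technology Review 06/2016" src="/imgs/cover.jpg"></div>
'''

print(find_cover_url(SAMPLE_HTML))
```

In the recipe itself, index_to_soup and soup.find do the fetching and parsing, so only the urljoin part would be worth carrying over.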