Old 10-24-2014, 09:32 AM   #1
Raskospoon
Deutsche Welle Recipe (en) not working

Hello everybody,
I'm trying to get the Deutsche Welle (English version) recipe working, because the recipe provided with calibre no longer works.
I tried adding all the RSS feeds, organized into sections (the original recipe had only one), but the recipe still doesn't work.
I think the problem might be the piece of code that requests the print version of the page. Any help?
Thank you very much.
Michele.
Old 05-15-2016, 05:12 AM   #2
Aimylios
Hi Michele,

It may be too late for you, but I ran into the same problem and fixed the recipe. You were right: the articles no longer have a print version, and the page layout has changed as well.

I also updated the Spanish-language version (see below), which uses essentially the same code. I guess it should be portable to any of the 28 other language versions of Deutsche Welle.

Deutsche Welle (English)
Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function

__license__   = 'GPL v3'
__copyright__ = '2010, Darko Miletic <darko.miletic at gmail.com>'

'''
Deutsche Welle (english) - dw.com/en
'''

import re
from calibre.web.feeds.news import BasicNewsRecipe

class DeutscheWelle_en(BasicNewsRecipe):
    title       = 'Deutsche Welle'
    __author__  = 'Darko Miletic'
    description = 'News from Germany and the world'
    publisher   = 'Deutsche Welle'
    language    = 'en'

    oldest_article            = 1
    max_articles_per_feed     = 50
    no_stylesheets            = True
    remove_javascript         = True
    remove_empty_feeds        = True
    ignore_duplicate_articles = {'title', 'url'}

    feeds = [
        ('Top Stories', 'http://rss.dw-world.de/rdf/rss-en-top'),
        ('World', 'http://rss.dw.de/rdf/rss-en-world'),
        ('Germany', 'http://rss.dw.de/rdf/rss-en-ger'),
        ('Europe', 'http://rss.dw.de/rdf/rss-en-eu'),
        ('Business', 'http://rss.dw.de/rdf/rss-en-bus'),
        ('Culture & Lifestyle', 'http://rss.dw.de/rdf/rss-en-cul'),
        ('Sports', 'http://rss.dw.de/rdf/rss-en-sports'),
        ('Visit Germany', 'http://rss.dw.de/rdf/rss-en-visitgermany'),
        ('Asia', 'http://rss.dw.de/rdf/rss-en-asia')
    ]

    keep_only_tags = [
        dict(name='div', attrs={'class':'col3'})
    ]

    remove_tags_after = [
        dict(name='div', attrs={'class':'group'})
    ]

    remove_tags = [
        dict(name='div', attrs={'class':'col1'}),
        dict(name='div', attrs={'class':re.compile('gallery')}),
        dict(name='div', attrs={'class':re.compile('audio')}),
        dict(name='div', attrs={'class':re.compile('video')})
    ]

    remove_attributes = ['height', 'width', 'onclick', 'border', 'lang', 'link']

    extra_css = '''
        h1 {font-size: 1.6em; margin-top: 0em}
        .artikel {font-size: 1em; text-transform: uppercase; margin: 0em}
    '''

    def preprocess_html(self, soup):
        # convert local hyperlinks
        for a in soup.findAll('a', href=True):
            if a['href'].startswith('/'):
                a['href'] = 'http://www.dw.com' + a['href']
            elif a['href'].startswith('#'):
                del a['href']
        # remove all style attributes with an effect on font size
        for item in soup.findAll(attrs={'style':re.compile('font-size')}):
            del item['style']
        return soup
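A note on the `remove_tags` entries above: the three `re.compile(...)` patterns are unanchored, so (as far as I know, BeautifulSoup tests them with `search`) they match any class value that merely contains the word. Here is a minimal sketch of that matching in plain Python, outside calibre. The helper name `would_be_removed` and the sample class values are made up for illustration and are not taken from the DW pages.

```python
import re

# The same unanchored patterns the recipe passes in remove_tags.
patterns = [re.compile('gallery'), re.compile('audio'), re.compile('video')]

def would_be_removed(class_value):
    """Hypothetical helper: True if any pattern is found anywhere in the class value."""
    return any(p.search(class_value) for p in patterns)

# Sample class values are invented for this demonstration.
for value in ('videoPlayer', 'picture gallery', 'longText'):
    print(value, '->', would_be_removed(value))
```

Because the patterns are substring matches, a class that only happens to contain "audio" or "video" would be stripped too. That seems to be intended here, but it is worth knowing when porting the recipe.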
Deutsche Welle (Spanish)
Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function

__license__   = 'GPL v3'
__copyright__ = '2010, Darko Miletic <darko.miletic at gmail.com>'

'''
Deutsche Welle (español) - dw.com/es
'''

import re
from calibre.web.feeds.news import BasicNewsRecipe

class DeutscheWelle_es(BasicNewsRecipe):
    title       = 'Deutsche Welle'
    __author__  = 'Darko Miletic'
    description = 'Noticias desde Alemania y mundo'
    publisher   = 'Deutsche Welle'
    language    = 'es'

    oldest_article            = 2
    max_articles_per_feed     = 50
    no_stylesheets            = True
    remove_javascript         = True
    remove_empty_feeds        = True
    ignore_duplicate_articles = {'title', 'url'}

    feeds = [
        ('Titulares', 'http://rss.dw-world.de/rdf/rss-sp-top'),
        ('Noticias de Alemania', 'http://rss.dw-world.de/rdf/rss-sp-ale'),
        ('Internacionales', 'http://rss.dw-world.de/rdf/rss-sp-inter'),
        ('Cultura', 'http://rss.dw-world.de/rdf/rss-sp-cul'),
        ('Ciencia y Tecnología', 'http://rss.dw-world.de/rdf/rss-sp-cyt'),
        ('Economía', 'http://rss.dw-world.de/rdf/rss-sp-eco'),
        ('La prensa opina', 'http://rss.dw-world.de/rdf/rss-sp-press'),
        ('Ecología', 'http://rss.dw-world.de/rdf/rss-sp-ecol'),
        ('Futbol alemán', 'http://rss.dw-world.de/rdf/rss-sp-fut'),
        ('Conozca Alemania', 'http://rss.dw-world.de/rdf/rss-sp-con')
    ]

    keep_only_tags = [
        dict(name='div', attrs={'class':'col3'})
    ]

    remove_tags_after = [
        dict(name='div', attrs={'class':'group'})
    ]

    remove_tags = [
        dict(name='div', attrs={'class':'col1'}),
        dict(name='div', attrs={'class':re.compile('gallery')}),
        dict(name='div', attrs={'class':re.compile('audio')}),
        dict(name='div', attrs={'class':re.compile('video')})
    ]

    remove_attributes = ['height', 'width', 'onclick', 'border', 'lang', 'link']

    extra_css = '''
        h1 {font-size: 1.6em; margin-top: 0em}
        .artikel {font-size: 1em; text-transform: uppercase; margin: 0em}
    '''

    def preprocess_html(self, soup):
        # convert local hyperlinks
        for a in soup.findAll('a', href=True):
            if a['href'].startswith('/'):
                a['href'] = 'http://www.dw.com' + a['href']
            elif a['href'].startswith('#'):
                del a['href']
        # remove all style attributes with an effect on font size
        for item in soup.findAll(attrs={'style':re.compile('font-size')}):
            del item['style']
        return soup
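For anyone porting this to another language version, the link handling in `preprocess_html` can be tried out on its own. The following is only a sketch of the same rule in plain Python, without calibre or BeautifulSoup. The function name `fix_href` and the sample hrefs are hypothetical; only the base URL `http://www.dw.com` comes from the recipes above.

```python
def fix_href(href, base='http://www.dw.com'):
    """Mirror the recipe's rule for one href value.

    Returns the rewritten href, or None where the recipe would
    delete the attribute altogether.
    """
    if href.startswith('/'):
        # Site-relative link: make it absolute.
        return base + href
    if href.startswith('#'):
        # In-page anchor: the recipe drops the href.
        return None
    # Anything else is left untouched.
    return href

# Sample hrefs are invented for this demonstration.
for sample in ('/en/some-article', '#comments', 'http://example.com/page'):
    print(repr(sample), '->', repr(fix_href(sample)))
```

One thing to keep in mind: a protocol-relative href such as `//example.com/x` also starts with `/`, so this rule would prefix it with the base URL. The recipes do not special-case that.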