Updated Telepolis (News+Artikel) Recipe

syntaxis · 11-21-2010, 03:44 PM

Hi There,

I've updated the Telepolis recipe:
Changes:
*Now has correct Pagebreak on Kindle / Mobi Format
*Fetches Articles and News
*Not showing comments below articles anymore

Code:

# -*- coding: utf-8 -*-

__license__   = 'GPL v3'
__copyright__ = '2009, Gerhard Aigner <gerhard.aigner at gmail.com>'


import re
from calibre.web.feeds.news import BasicNewsRecipe

class TelepolisNews(BasicNewsRecipe):
    title          = u'Telepolis (News+Artikel)'
    __author__ = 'Gerhard Aigner'
    publisher = 'Heise Zeitschriften Verlag GmbH & Co KG'
    description = 'News from telepolis'
    category = 'news'
    oldest_article = 7
    max_articles_per_feed = 100
    recursion = 0
    no_stylesheets = True
    encoding = "utf-8"
    language = 'de_AT'

    use_embedded_content = False
    remove_empty_feeds = True

    preprocess_regexps = [(re.compile(r'<a[^>]*>', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'</a>', re.DOTALL|re.IGNORECASE), lambda match: ''),]

    keep_only_tags = [dict(name = 'td',attrs={'class':'bloghead'}),dict(name = 'td',attrs={'class':'blogfliess'})]
    remove_tags = [dict(name='img'), dict(name='td',attrs={'class':'blogbottom'}), dict(name='td',attrs={'class':'forum'})]

    feeds          = [(u'News', u'http://www.heise.de/tp/news-atom.xml')]

    html2lrf_options = [
        '--comment'  , description
        , '--category' , category
        , '--publisher', publisher
    ]

    html2epub_options  = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'

    def get_article_url(self, article):
        '''if the linked article is of kind artikel don't take it'''
        if (article.link.count('artikel') > 1) :
            return None
        return article.link

    def preprocess_html(self, soup):
        mtag = '<meta http-equiv="Content-Type" content="text/html; charset=' + self.encoding + '">'
        soup.head.insert(0,mtag)
        return soup
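
For anyone wondering what the two preprocess_regexps entries in the recipe above do, here is a minimal standalone sketch using plain `re`, no calibre needed. `strip_links` and `sample` are made-up names for illustration only; they are not part of the recipe or of calibre, which runs the (pattern, replacement) pairs on the downloaded HTML itself.

```python
import re

# The same two patterns as in the recipe: remove opening and closing
# anchor tags, leaving the link text in place.
preprocess_regexps = [
    (re.compile(r'<a[^>]*>', re.DOTALL | re.IGNORECASE), lambda match: ''),
    (re.compile(r'</a>', re.DOTALL | re.IGNORECASE), lambda match: ''),
]


def strip_links(html):
    """Hypothetical helper: apply each (pattern, replacement) pair in order."""
    for pattern, replacement in preprocess_regexps:
        html = pattern.sub(replacement, html)
    return html


sample = '<p>See <A HREF="http://www.heise.de/tp/">Telepolis</A> for more.</p>'
print(strip_links(sample))  # <p>See Telepolis for more.</p>
```

One caveat: `<a[^>]*>` matches any opening tag whose name starts with "a" (for example `<abbr>`), not only links, while the closing pattern only matches `</a>`. If that ever becomes a problem, something like `<a\b[^>]*>` would be a tighter pattern.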

patdej · 01-12-2011, 03:41 AM

Hi, I've got a Sony 650 Touch Reader. Unfortunately, the device reboots when I access the content of my Telepolis epub download. Do you have any idea why? Thanks, Pat

DoctorOhh · 01-12-2011, 04:34 AM

Quote:

Originally Posted by patdej

Hi, I've got a Sony 650 Touch Reader. Unfortunately, the device reboots when I access the content of my Telepolis epub download. Do you have any idea why? Thanks, Pat

This means something in the html code is conflicting with your reader. You can report a problem with the recipe as a bug report.

Converting the epub to Mobi and then back to epub might make the book viewable. Opening the epub in Sigil and then saving it as an epub from within Sigil might also remove incompatible code.

kovidgoyal · 01-12-2011, 10:46 AM

No point opening a bug report as I do not provide support for recipes that I haven't written.

juco · 05-04-2011, 11:47 AM

Hi, the Telepolis website was relaunched recently (with major layout changes), causing the current recipe to fail. Is anybody willing, able and determined to update the recipe? BTW, Telepolis is a German and not an Austrian publication: It should be listed under »Deutsch« and not under »German (AT)« since it might be overlooked there.

Thanks,
Ralph

theducks · 05-04-2011, 12:12 PM

Moderator Notice
Moved to Recipes

syntaxis · 05-09-2011, 07:03 AM

Hi,

I've rewritten the recipe (pics are also included now)

Code:

# -*- coding: utf-8 -*-

import re
from calibre.web.feeds.news import BasicNewsRecipe

class TelepolisNews(BasicNewsRecipe):
    title          = u'Telepolis (News+Artikel)'
    __author__ = 'syntaxis'
    publisher = 'Heise Zeitschriften Verlag GmbH & Co KG'
    description = 'News from Telepolis'
    category = 'news'
    oldest_article = 1
    max_articles_per_feed = 100
    recursion = 0
    no_stylesheets = True
    encoding = "utf-8"
    language = 'de'

    
    remove_empty_feeds = True

    

    keep_only_tags = [dict(name = 'div',attrs={'class':'head'}),dict(name = 'div',attrs={'class':'leftbox'}),dict(name='td',attrs={'class':'strict'})]
    remove_tags = [
        dict(name='td', attrs={'class':'blogbottom'}),
        dict(name='div', attrs={'class':'forum'}),
        dict(name='div', attrs={'class':'social'}),
        dict(name='div', attrs={'class':'blog-letter p-news'}),
        dict(name='div', attrs={'class':'blog-sub'}),
        dict(name='div', attrs={'class':'version-div'}),
        dict(name='div', attrs={'id':'breadcrumb'}),
        dict(attrs={'class':'tp-url'}),
        dict(attrs={'class':'blog-name entry_'}),
    ]

    remove_tags_after  = [dict(name='span', attrs={'class':['breadcrumb']})]


    feeds          = [(u'News', u'http://www.heise.de/tp/news-atom.xml')]

    html2lrf_options = [
        '--comment'  , description
        , '--category' , category
        , '--publisher', publisher
    ]

    html2epub_options  = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'


    def preprocess_html(self, soup):
        mtag = '<meta http-equiv="Content-Type" content="text/html; charset=' + self.encoding + '">'
        soup.head.insert(0,mtag)
        return soup

juco · 05-09-2011, 10:34 AM

Hi syntaxis,

Thank you for your quick reaction! Your updated recipe works fine; however, I have three issues to report:

1) Despite the parameter max_articles_per_feed = 100 (which I reduced to 30), your script only fetches 11 articles, and I have no clue why...

2) Quite a few articles, but not all (try "Endlich Schluss mit Hartz IV"), start with a single lowercase character (here: "w") in the first line. This is only a minor aesthetic issue, of course.

3) It would be nice if chapter headlines (if present) were formatted as headlines (i.e. larger and bolder than the regular text). calibre manages to do that on its own if you have no recipe at hand and use the default mode instead...

Apart from that, everything works fine: all unnecessary stuff is trimmed off as it should be. Great! The only really annoying behaviour is that the script fetches fewer articles than it is supposed to do. Perhaps you can find a way to fix that? Thanks!

Best wishes,
Ralph

syntaxis · 05-15-2011, 06:40 AM

Hi Juco,

If you want to see more articles, you have to change this line:

Code:

 oldest_article = 1

The value is in days: 1 means it only fetches articles that are not older than 1 day, so raise it (for example to 7) to get more.
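
To make the interaction with max_articles_per_feed concrete, here is a simplified model of the two limits. It is a sketch of the idea only, not calibre's actual implementation; `select_articles` and the sample feed (one entry every two hours) are invented for illustration.

```python
from datetime import datetime, timedelta


def select_articles(published_times, now, oldest_article=1,
                    max_articles_per_feed=100):
    """Simplified model: drop entries older than `oldest_article` days,
    then keep at most `max_articles_per_feed` of what is left."""
    cutoff = now - timedelta(days=oldest_article)
    recent = [t for t in published_times if t >= cutoff]
    return recent[:max_articles_per_feed]


now = datetime(2011, 5, 15, 12, 0)
# Hypothetical feed: 40 entries, newest first, one every 2 hours.
feed = [now - timedelta(hours=2 * i) for i in range(40)]

# With a 1-day window the age limit bites first, whatever the cap is.
print(len(select_articles(feed, now, oldest_article=1,
                          max_articles_per_feed=30)))  # 13

# With a 7-day window every entry is recent enough, so the cap applies.
print(len(select_articles(feed, now, oldest_article=7,
                          max_articles_per_feed=30)))  # 30
```

So lowering max_articles_per_feed from 100 to 30 changes nothing as long as fewer than 30 articles fall inside the oldest_article window.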

Regarding 2) and 3): I made some changes, so both should work now.

Code:

# -*- coding: utf-8 -*-

import re
from calibre.web.feeds.news import BasicNewsRecipe

class TelepolisNews(BasicNewsRecipe):
    title          = u'Telepolis (News+Artikel)'
    __author__ = 'syntaxis'
    publisher = 'Heise Zeitschriften Verlag GmbH & Co KG'
    description = 'News from Telepolis'
    category = 'news'
    oldest_article = 1
    max_articles_per_feed = 100
    recursion = 0
    no_stylesheets = True
    encoding = "utf-8"
    language = 'de'

    
    remove_empty_feeds = True

    

    keep_only_tags = [dict(name = 'div',attrs={'class':'head'}),dict(name = 'div',attrs={'class':'leftbox'}),dict(name='td',attrs={'class':'strict'})]
    remove_tags = [
        dict(name='td', attrs={'class':'blogbottom'}),
        dict(name='div', attrs={'class':'forum'}),
        dict(name='div', attrs={'class':'social'}),
        dict(name='div', attrs={'class':'blog-letter p-news'}),
        dict(name='div', attrs={'class':'blog-sub'}),
        dict(name='div', attrs={'class':'version-div'}),
        dict(name='div', attrs={'id':'breadcrumb'}),
        dict(attrs={'class':'tp-url'}),
        dict(name='div', attrs={'class':[
            'blog-letter e-news', 'blog-letter m-news', 'blog-letter w-news',
            'blog-letter t-news', 'blog-letter k-news', 'blog-letter s-news']}),
    ]

    remove_tags_after  = [dict(name='span', attrs={'class':['breadcrumb']})]


    feeds          = [(u'News', u'http://www.heise.de/tp/news-atom.xml')]

    html2lrf_options = [
        '--comment'  , description
        , '--category' , category
        , '--publisher', publisher
    ]

    html2epub_options  = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'


    def preprocess_html(self, soup):
        mtag = '<meta http-equiv="Content-Type" content="text/html; charset=' + self.encoding + '">'
        soup.head.insert(0,mtag)
        return soup

    # extra_css must be a class attribute (indented inside TelepolisNews);
    # at module level the recipe never sees it and headlines stay unstyled.
    extra_css = '''
                h1 {color:#008852;font-family:Arial,Helvetica,sans-serif; font-size:25px; font-size-adjust:none; font-stretch:normal; font-style:normal; font-variant:normal; font-weight:bold; line-height:22px; }
                h2 {color:#4D4D4D;font-family:Arial,Helvetica,sans-serif; font-size:18px; font-size-adjust:none; font-stretch:normal; font-style:normal; font-variant:normal; font-weight:bold; line-height:16px; }
                h3 {color:#4D4D4D;font-family:Arial,Helvetica,sans-serif; font-size:15px; font-size-adjust:none; font-stretch:normal; font-style:normal; font-variant:normal; font-weight:bold; line-height:14px;}
                h4 {color:#333333; font-family:Arial,Helvetica,sans-serif;font-size:12px; font-size-adjust:none; font-stretch:normal; font-style:normal; font-variant:normal; font-weight:bold; line-height:14px; }
                h5 {color:#333333; font-family:Arial,Helvetica,sans-serif; font-size:11px; font-size-adjust:none; font-stretch:normal; font-style:normal; font-variant:normal; font-weight:bold; line-height:14px; text-transform:uppercase;}
                '''
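
The stray lowercase letter from issue 2) is removed above by listing each 'blog-letter X-news' class by hand. A possible generalization, sketched here with plain `re`, is a single pattern that covers any letter. `drop_cap` and `is_drop_cap` are hypothetical names, and whether the recipe's tag matching accepts a compiled pattern in place of the list of strings is an assumption that would need to be checked before use.

```python
import re

# Class names listed one by one in the recipe above.
listed = [
    'blog-letter e-news', 'blog-letter m-news', 'blog-letter w-news',
    'blog-letter t-news', 'blog-letter k-news', 'blog-letter s-news',
]

# Hypothetical generalization: any single lowercase letter before "-news".
drop_cap = re.compile(r'^blog-letter [a-z]-news$')


def is_drop_cap(class_value):
    """Return True if the class attribute looks like a drop-cap letter box."""
    return drop_cap.match(class_value) is not None


# Every hand-listed class is covered by the pattern ...
print(all(is_drop_cap(name) for name in listed))
# ... and so is 'blog-letter p-news', which the previous version of the
# recipe listed but the enumerated list above leaves out.
print(is_drop_cap('blog-letter p-news'))
# Unrelated classes are left alone.
print(is_drop_cap('blog-sub'))
```

The benefit would be that articles starting with a letter nobody has listed yet are handled too.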
