Old 11-21-2010, 03:44 PM   #1
syntaxis
Junior Member
Posts: 5
Karma: 10
Join Date: Nov 2010
Device: Kindle Paperwhite (2014)
Updated Telepolis (News+Artikel) Recipe

Hi There,

I've updated the Telepolis recipe.
Changes:
* Correct page breaks on Kindle / Mobi format
* Fetches both articles and news
* No longer shows the comments below articles

Code:
# -*- coding: utf-8 -*-

__license__   = 'GPL v3'
__copyright__ = '2009, Gerhard Aigner <gerhard.aigner at gmail.com>'


import re
from calibre.web.feeds.news import BasicNewsRecipe

class TelepolisNews(BasicNewsRecipe):
    title          = u'Telepolis (News+Artikel)'
    __author__ = 'Gerhard Aigner'
    publisher = 'Heise Zeitschriften Verlag GmbH & Co KG'
    description = 'News from telepolis'
    category = 'news'
    oldest_article = 7
    max_articles_per_feed = 100
    recursion = 0
    no_stylesheets = True
    encoding = "utf-8"
    language = 'de_AT'

    use_embedded_content =False
    remove_empty_feeds = True

    preprocess_regexps = [(re.compile(r'<a[^>]*>', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'</a>', re.DOTALL|re.IGNORECASE), lambda match: ''),]

    keep_only_tags = [dict(name = 'td',attrs={'class':'bloghead'}),dict(name = 'td',attrs={'class':'blogfliess'})]
    remove_tags = [dict(name='img'), dict(name='td',attrs={'class':'blogbottom'}), dict(name='td',attrs={'class':'forum'})]

    feeds          = [(u'News', u'http://www.heise.de/tp/news-atom.xml')]

    html2lrf_options = [
        '--comment'  , description
        , '--category' , category
        , '--publisher', publisher
    ]

    html2epub_options  = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'

    def get_article_url(self, article):
        '''if the linked article is of kind artikel don't take it'''
        if (article.link.count('artikel') > 1) :
            return None
        return article.link

    def preprocess_html(self, soup):
        mtag = '<meta http-equiv="Content-Type" content="text/html; charset=' + self.encoding + '">'
        soup.head.insert(0,mtag)
        return soup
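
A note on get_article_url above: the docstring says links of kind "artikel" are not taken, but the comparison is `> 1`, so a link containing the word only once is still fetched. That would be consistent with the "Fetches Articles and News" change, though this is only a reading of the code. A minimal standalone sketch of the check (the three links are made up for illustration and are not real Telepolis URLs):

```python
def get_article_url(link):
    """Same test as in the recipe: drop a link only if 'artikel' occurs more than once."""
    if link.count('artikel') > 1:
        return None
    return link


# Hypothetical links, invented for this example only.
news_link = 'http://www.heise.de/tp/blogs/8/148000'
artikel_link = 'http://www.heise.de/tp/artikel/33/33000/1.html'
doubled_link = 'http://www.heise.de/tp/artikel/artikel-33000.html'

for link in (news_link, artikel_link, doubled_link):
    # Prints the link itself when it is kept, None when it is skipped.
    print(link, '->', get_article_url(link))
```

Only the third link is skipped, because it is the only one in which "artikel" appears twice.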
Old 01-12-2011, 03:41 AM   #2
patdej
Junior Member
Posts: 1
Karma: 10
Join Date: Jan 2011
Device: SONY TOUCH 650
Hi, I've got a SONY 650 Touch Reader. Unfortunately, the device reboots when I access the content of my Telepolis epub download. Do you have any idea why? Thanks, Pat
Old 01-12-2011, 04:34 AM   #3
DoctorOhh
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Originally Posted by patdej View Post
Hi, I've got a SONY 650 Touch Reader. Unfortunately, the device reboots when I access the content of my Telepolis epub download. Do you have any idea why? Thanks, Pat
This means something in the HTML code is conflicting with your reader. You can report a problem with the recipe as a bug report.

Converting the epub to Mobi and then back to epub might make the book viewable. Opening the epub in Sigil and then saving it as an epub from within Sigil might also remove incompatible code.
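
The epub to Mobi to epub round trip can also be scripted with calibre's ebook-convert command-line tool. The following is only a rough sketch: the file names are placeholders, the helper names are made up for this example, and nothing is converted unless run_roundtrip is called explicitly.

```python
import shutil
import subprocess


def roundtrip_commands(epub_path, mobi_path, fixed_epub_path):
    """Return the two ebook-convert invocations for EPUB -> MOBI -> EPUB."""
    return [
        ['ebook-convert', epub_path, mobi_path],
        ['ebook-convert', mobi_path, fixed_epub_path],
    ]


def run_roundtrip(epub_path, mobi_path, fixed_epub_path):
    """Run both conversions; return False if ebook-convert is not on the PATH."""
    if shutil.which('ebook-convert') is None:
        return False
    for cmd in roundtrip_commands(epub_path, mobi_path, fixed_epub_path):
        subprocess.run(cmd, check=True)
    return True


# Placeholder file names; this only builds the command lists.
commands = roundtrip_commands('telepolis.epub', 'telepolis.mobi', 'telepolis-fixed.epub')
for cmd in commands:
    print(' '.join(cmd))
```

Whether the round trip actually removes whatever upsets the reader is not guaranteed; it is just a cheap thing to try.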
Old 01-12-2011, 10:46 AM   #4
kovidgoyal
creator of calibre
Posts: 45,597
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
No point opening a bug report as I do not provide support for recipes that I haven't written.
Old 05-04-2011, 11:47 AM   #5
juco
aka zonebattler
Posts: 32
Karma: 50
Join Date: Oct 2003
Location: Fürth, Germany
Device: Kindle KB, Kindle PW Signature Edition (11. Gen)
Hi, the Telepolis website was relaunched recently (with major layout changes), causing the current recipe to fail. Is anybody willing, able and determined to update the recipe? BTW, Telepolis is a German and not an Austrian publication: it should be listed under »Deutsch« and not under »German (AT)«, where it might be overlooked.

Thanks,
Ralph

Last edited by juco; 05-04-2011 at 11:49 AM.
Old 05-04-2011, 12:12 PM   #6
theducks
Well trained by Cats
Posts: 31,240
Karma: 61360164
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Moderator Notice
Moved to Recipes
Old 05-09-2011, 07:03 AM   #7
syntaxis
Hi,

I've rewritten the recipe (pics are also included now)

Code:
# -*- coding: utf-8 -*-

import re
from calibre.web.feeds.news import BasicNewsRecipe

class TelepolisNews(BasicNewsRecipe):
    title          = u'Telepolis (News+Artikel)'
    __author__ = 'syntaxis'
    publisher = 'Heise Zeitschriften Verlag GmbH & Co KG'
    description = 'News from Telepolis'
    category = 'news'
    oldest_article = 1
    max_articles_per_feed = 100
    recursion = 0
    no_stylesheets = True
    encoding = "utf-8"
    language = 'de'

    
    remove_empty_feeds = True

    

    keep_only_tags = [
        dict(name='div', attrs={'class': 'head'}),
        dict(name='div', attrs={'class': 'leftbox'}),
        dict(name='td', attrs={'class': 'strict'}),
    ]
    remove_tags = [
        dict(name='td', attrs={'class': 'blogbottom'}),
        dict(name='div', attrs={'class': 'forum'}),
        dict(name='div', attrs={'class': 'social'}),
        dict(name='div', attrs={'class': 'blog-letter p-news'}),
        dict(name='div', attrs={'class': 'blog-sub'}),
        dict(name='div', attrs={'class': 'version-div'}),
        dict(name='div', attrs={'id': 'breadcrumb'}),
        dict(attrs={'class': 'tp-url'}),
        dict(attrs={'class': 'blog-name entry_'}),
    ]

    remove_tags_after  = [dict(name='span', attrs={'class':['breadcrumb']})]


    feeds          = [(u'News', u'http://www.heise.de/tp/news-atom.xml')]

    html2lrf_options = [
        '--comment'  , description
        , '--category' , category
        , '--publisher', publisher
    ]

    html2epub_options  = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'


    def preprocess_html(self, soup):
        mtag = '<meta http-equiv="Content-Type" content="text/html; charset=' + self.encoding + '">'
        soup.head.insert(0,mtag)
        return soup
Old 05-09-2011, 10:34 AM   #8
juco
Hi syntaxis,

Thank you for your quick reaction! Your updated recipe works fine; however, I have three issues to report:

1) Despite the parameter max_articles_per_feed = 100 (which I reduced to 30), your script only fetches 11 articles, and I have no clue why...

2) Quite a few articles, but not all (try "Endlich Schluss mit Hartz IV"), start with a single lowercase character (here: "w") in the first line. This is only a minor aesthetic issue, of course.

3) It would be nice if chapter headlines (if present) were formatted as headlines (i.e. larger and bolder than the regular text). calibre manages to do that on its own if you have no recipe at hand and use the default mode instead...

Apart from that, everything works fine: all unnecessary stuff is trimmed off as it should be. Great! The only really annoying behaviour is that the script fetches fewer articles than it is supposed to. Perhaps you can find a way to fix that? Thanks!

Best wishes,
Ralph

Last edited by juco; 05-09-2011 at 10:37 AM.
Old 05-15-2011, 06:40 AM   #9
syntaxis
Hi Juco,

If you want to see more articles, you have to change this line:
Code:
 oldest_article = 1
1 means it only fetches articles that are not older than 1 day.
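
To illustrate why max_articles_per_feed alone does not help: the age limit is applied as well, so the cap only matters if enough recent articles exist. The sketch below is a simplified stand-in written for this post, not calibre's actual implementation; select_articles and the generated feed entries are made up.

```python
from datetime import datetime, timedelta


def select_articles(articles, now, oldest_article=1, max_articles_per_feed=100):
    """Keep entries no older than `oldest_article` days, then cap the count.

    `articles` is a list of (title, published) tuples.
    """
    cutoff = now - timedelta(days=oldest_article)
    recent = [a for a in articles if a[1] >= cutoff]
    return recent[:max_articles_per_feed]


now = datetime(2011, 5, 15, 12, 0)
# 40 invented feed entries, one every 3 hours going back in time.
feed = [('Article %d' % i, now - timedelta(hours=3 * i)) for i in range(40)]

one_day = select_articles(feed, now, oldest_article=1, max_articles_per_feed=30)
one_week = select_articles(feed, now, oldest_article=7, max_articles_per_feed=30)
print(len(one_day), len(one_week))
```

With a one-day limit only the handful of entries from the last 24 hours survive, no matter how high the cap is; with a seven-day limit the cap of 30 becomes the deciding factor.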

Regarding 2) and 3): I made some changes, so both should work now.

Code:
# -*- coding: utf-8 -*-

import re
from calibre.web.feeds.news import BasicNewsRecipe

class TelepolisNews(BasicNewsRecipe):
    title          = u'Telepolis (News+Artikel)'
    __author__ = 'syntaxis'
    publisher = 'Heise Zeitschriften Verlag GmbH & Co KG'
    description = 'News from Telepolis'
    category = 'news'
    oldest_article = 1
    max_articles_per_feed = 100
    recursion = 0
    no_stylesheets =True
    encoding = "utf-8"
    language = 'de'

    
    remove_empty_feeds = True

    

    keep_only_tags = [
        dict(name='div', attrs={'class': 'head'}),
        dict(name='div', attrs={'class': 'leftbox'}),
        dict(name='td', attrs={'class': 'strict'}),
    ]
    remove_tags = [
        dict(name='td', attrs={'class': 'blogbottom'}),
        dict(name='div', attrs={'class': 'forum'}),
        dict(name='div', attrs={'class': 'social'}),
        dict(name='div', attrs={'class': 'blog-letter p-news'}),
        dict(name='div', attrs={'class': 'blog-sub'}),
        dict(name='div', attrs={'class': 'version-div'}),
        dict(name='div', attrs={'id': 'breadcrumb'}),
        dict(attrs={'class': 'tp-url'}),
        dict(name='div', attrs={'class': [
            'blog-letter e-news', 'blog-letter m-news', 'blog-letter w-news',
            'blog-letter t-news', 'blog-letter k-news', 'blog-letter s-news',
        ]}),
    ]

    remove_tags_after  = [dict(name='span', attrs={'class':['breadcrumb']})]


    feeds          = [(u'News', u'http://www.heise.de/tp/news-atom.xml')]

    html2lrf_options = [
        '--comment'  , description
        , '--category' , category
        , '--publisher', publisher
    ]

    html2epub_options  = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'


    def preprocess_html(self, soup):
        mtag = '<meta http-equiv="Content-Type" content="text/html; charset=' + self.encoding + '">'
        soup.head.insert(0,mtag)
        return soup

    # extra_css has to be a class attribute (indented like the settings above),
    # otherwise the recipe does not pick it up.
    extra_css = '''
                h1 {color:#008852;font-family:Arial,Helvetica,sans-serif; font-size:25px; font-size-adjust:none; font-stretch:normal; font-style:normal; font-variant:normal; font-weight:bold; line-height:22px; }
                h2 {color:#4D4D4D;font-family:Arial,Helvetica,sans-serif; font-size:18px; font-size-adjust:none; font-stretch:normal; font-style:normal; font-variant:normal; font-weight:bold; line-height:16px; }
                h3 {color:#4D4D4D;font-family:Arial,Helvetica,sans-serif; font-size:15px; font-size-adjust:none; font-stretch:normal; font-style:normal; font-variant:normal; font-weight:bold; line-height:14px;}
                h4 {color:#333333; font-family:Arial,Helvetica,sans-serif;font-size:12px; font-size-adjust:none; font-stretch:normal; font-style:normal; font-variant:normal; font-weight:bold; line-height:14px; }
                h5 {color:#333333; font-family:Arial,Helvetica,sans-serif; font-size:11px; font-size-adjust:none; font-stretch:normal; font-style:normal; font-variant:normal; font-weight:bold; line-height:14px; text-transform:uppercase;}
                '''
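
The stray single letter at the start of some articles is removed above by listing the "blog-letter x-news" classes one by one (p, e, m, w, t, k and s). If other letters turn up, the list could be generated instead of typed out. This is only a sketch under the assumption that the class always has the form "blog-letter <letter>-news" with one lowercase ASCII letter, which the thread does not confirm:

```python
import string

# Assumption: one class per lowercase ASCII letter, e.g. 'blog-letter w-news'.
letter_classes = ['blog-letter %s-news' % c for c in string.ascii_lowercase]

# Same shape as the entries in the recipe's remove_tags list.
remove_letter_divs = dict(name='div', attrs={'class': letter_classes})

print(len(letter_classes), letter_classes[:3])
```

The resulting dict would take the place of the two "blog-letter" entries in remove_tags.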

Last edited by syntaxis; 05-16-2011 at 08:16 AM.