MobileRead Forums > E-Book Software > Calibre > Recipes
07-13-2012, 02:33 AM   #1
Divingduck
Wizard
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
FAZ-Net Update

I updated this recipe. It was missing the "Rhein-Main" section.
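
All section feeds in this recipe share one URL pattern, http://www.faz.net/aktuell/<section>/?rssview=1, so adding a section such as Rhein-Main only takes one more (title, url) tuple in `feeds`. A minimal sketch of a helper that builds these tuples; `faz_feed` and `BASE_URL` are hypothetical names and not part of the recipe or of calibre:

```python
# Hypothetical helper: every FAZ.NET section feed used in the recipe follows
# the pattern http://www.faz.net/aktuell/<section>/?rssview=1
BASE_URL = 'http://www.faz.net/aktuell/'


def faz_feed(title, slug=''):
    """Return a (title, url) tuple in the form BasicNewsRecipe.feeds expects.

    An empty slug gives the front-page feed ('FAZ.NET Aktuell')."""
    path = slug + '/' if slug else ''
    return (title, BASE_URL + path + '?rssview=1')


feeds = [
    faz_feed('FAZ.NET Aktuell'),
    faz_feed('Politik', 'politik'),
    faz_feed('Rhein-Main', 'rhein-main'),
]
```

Writing the URLs out by hand, as the recipe does, is just as valid; the helper only avoids typos when the list grows.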
Attached Files
File Type: zip faznet_AGe.zip (796 Bytes, 280 views)
12-19-2013, 08:09 AM   #2
Divingduck
Wizard
I updated this recipe to remove some useless footer add-ons, and I added one more feed for Lebensstil (lifestyle).
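
The recipe below handles the footer with two calibre settings on the same ArtikelFooter div: `remove_tags_after` drops everything that follows the first matching tag, and `remove_tags` drops the tag itself. A rough stand-alone sketch of that combined effect, using Python's `xml.etree.ElementTree` on a well-formed snippet instead of calibre's BeautifulSoup tree; `cut_from` is a hypothetical name, and this is an illustration, not calibre's implementation:

```python
import xml.etree.ElementTree as ET


def cut_from(root, cls):
    """Remove the first element whose class list contains `cls`, plus every
    sibling that follows it, then every following sibling of each ancestor
    up to `root` (roughly remove_tags_after + remove_tags)."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    marker = next((el for el in root.iter()
                   if el is not root and cls in (el.get('class') or '').split()),
                  None)
    if marker is None:
        return root  # no footer on this page: leave the tree untouched
    node = marker
    while node is not root:
        parent = parents[node]
        siblings = list(parent)
        for sib in siblings[siblings.index(node) + 1:]:
            parent.remove(sib)
        node = parent
    parents[marker].remove(marker)  # finally drop the footer itself
    return root
```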

Code:
__license__   = 'GPL v3'
__copyright__ = '2008-2011, Kovid Goyal <kovid at kovidgoyal.net>, Darko Miletic <darko at gmail.com>'
'''
Profile to download FAZ.NET
'''

from calibre.web.feeds.news import BasicNewsRecipe

class FazNet(BasicNewsRecipe):
    title                 = 'FAZ.NET'
    __author__            = 'Kovid Goyal, Darko Miletic'
    description           = 'Frankfurter Allgemeine Zeitung'
    publisher             = 'Frankfurter Allgemeine Zeitung GmbH'
    category              = 'news, politics, Germany'
    use_embedded_content  = False
    language = 'de'

    max_articles_per_feed = 30
    no_stylesheets        = True
    encoding              = 'utf-8'
    remove_javascript     = True

    keep_only_tags = [{'class':'FAZArtikelEinleitung'},
            {'id':'ArtikelTabContent_0'}]

    remove_tags_after = dict(name='div', attrs={'class':['ArtikelFooter']}) # AGe add 2013-12-19
    remove_tags = [dict(name='div', attrs={'class':['ArtikelFooter']})] # AGe add 2013-12-19

                  
    feeds = [
              ('FAZ.NET Aktuell', 'http://www.faz.net/aktuell/?rssview=1'),
              ('Politik', 'http://www.faz.net/aktuell/politik/?rssview=1'),
              ('Wirtschaft', 'http://www.faz.net/aktuell/wirtschaft/?rssview=1'),
              ('Feuilleton', 'http://www.faz.net/aktuell/feuilleton/?rssview=1'),
              ('Sport', 'http://www.faz.net/aktuell/sport/?rssview=1'),
              ('Lebensstil', 'http://www.faz.net/aktuell/lebensstil/?rssview=1'), # AGe add 2013-12-19
              ('Gesellschaft', 'http://www.faz.net/aktuell/gesellschaft/?rssview=1'),
              ('Finanzen', 'http://www.faz.net/aktuell/finanzen/?rssview=1'),
              ('Technik & Motor', 'http://www.faz.net/aktuell/technik-motor/?rssview=1'),
              ('Wissen', 'http://www.faz.net/aktuell/wissen/?rssview=1'),
              ('Reise', 'http://www.faz.net/aktuell/reise/?rssview=1'),
              ('Beruf & Chance', 'http://www.faz.net/aktuell/beruf-chance/?rssview=1'),
              ('Rhein-Main', 'http://www.faz.net/aktuell/rhein-main/?rssview=1')
            ]
Attached Files
File Type: zip faznet_AGe V3.zip (873 Bytes, 258 views)
01-09-2014, 03:04 AM   #3
me1969
Junior Member
Posts: 5
Karma: 10
Join Date: Jan 2014
Device: kindle
Hello Divingduck,

faz.net has changed its format again. Longer articles are no longer shown on one page; now you have to click from page to page: http://www.faz.net/aktuell/feuilleto...-12742526.html

Do you know how to fix this so that the calibre download contains the whole article again?

Thanks,
Markus
01-09-2014, 02:56 PM   #4
Divingduck
Wizard
I will look into it.
01-10-2014, 01:35 PM   #5
Divingduck
Wizard
Attached is a new version of the recipe that handles multipage articles.
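
The multipage handling in `append_page` boils down to one recursive pattern: fetch the page behind the "Nächste Seite" link, then repeat on that newly fetched page until no further link exists, appending each page's text as you go. A hypothetical stand-alone sketch of just that pattern: the `PAGES` dictionary stands in for the website, and `fetch` stands in for `index_to_soup` plus the lookup of the next-page link.

```python
# Hypothetical in-memory "site": url -> (page text, url of the next page or None)
PAGES = {
    'art.html': ('Page 1', 'art-p2.html'),
    'art-p2.html': ('Page 2', 'art-p3.html'),
    'art-p3.html': ('Page 3', None),
}


def collect_pages(url, fetch):
    """Return the text of `url` followed by the text of every page reached
    by following the next-page links."""
    text, next_url = fetch(url)
    if next_url is None:
        return [text]
    # The recursion has to look for the next link on the page it just fetched;
    # if that link were removed first, the chain would stop early.
    return [text] + collect_pages(next_url, fetch)


print(collect_pages('art.html', PAGES.__getitem__))
```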

Code:
__license__   = 'GPL v3'
__copyright__ = '2008-2011, Kovid Goyal <kovid at kovidgoyal.net>, Darko Miletic <darko at gmail.com>'
'''
Profile to download FAZ.NET
'''

from calibre.web.feeds.news import BasicNewsRecipe

class FazNet(BasicNewsRecipe):
    title                 = 'FAZ.NET'
    __author__            = 'Kovid Goyal, Darko Miletic, Armin Geller' # AGe upd. V4 2014-01-10
    description           = 'Frankfurter Allgemeine Zeitung'
    publisher             = 'Frankfurter Allgemeine Zeitung GmbH'
    category              = 'news, politics, Germany'
    use_embedded_content  = False
    language = 'de'
    
    max_articles_per_feed = 30
    no_stylesheets        = True
    encoding              = 'utf-8'
    remove_javascript     = True

    keep_only_tags = [{'class':'FAZArtikelEinleitung'},
            {'id':'ArtikelTabContent_0'}]

    remove_tags_after = [dict(name='div', attrs={'class':['ArtikelFooter']})]
    remove_tags = [dict(name='div', attrs={'class':['ArtikelFooter']})]

#    recursions = 1                        # AGe 2014-01-10
#    match_regexps = [r'-p[2-9].html$']    # AGe 2014-01-10
                  
    feeds = [
              ('FAZ.NET Aktuell', 'http://www.faz.net/aktuell/?rssview=1'),
              ('Politik', 'http://www.faz.net/aktuell/politik/?rssview=1'),
              ('Wirtschaft', 'http://www.faz.net/aktuell/wirtschaft/?rssview=1'),
              ('Feuilleton', 'http://www.faz.net/aktuell/feuilleton/?rssview=1'),
              ('Sport', 'http://www.faz.net/aktuell/sport/?rssview=1'),
              ('Lebensstil', 'http://www.faz.net/aktuell/lebensstil/?rssview=1'),
              ('Gesellschaft', 'http://www.faz.net/aktuell/gesellschaft/?rssview=1'),
              ('Finanzen', 'http://www.faz.net/aktuell/finanzen/?rssview=1'),
              ('Technik & Motor', 'http://www.faz.net/aktuell/technik-motor/?rssview=1'),
              ('Wissen', 'http://www.faz.net/aktuell/wissen/?rssview=1'),
              ('Reise', 'http://www.faz.net/aktuell/reise/?rssview=1'),
              ('Beruf & Chance', 'http://www.faz.net/aktuell/beruf-chance/?rssview=1'),
              ('Rhein-Main', 'http://www.faz.net/aktuell/rhein-main/?rssview=1')
            ]

# AGe 2014-01-10 New  for multipages
    INDEX                 = ''
    def append_page(self, soup, appendtag, position):
        pager = soup.find('a',attrs={'title':'Nächste Seite'})
        if pager:
           nexturl = self.INDEX + pager['href']
           soup2 = self.index_to_soup(nexturl)
           texttag = soup2.find('div', attrs={'class':'FAZArtikelContent'})
           texttag.find('div', attrs={'class':'ArtikelFooter'}).extract()
           texttag.find('div', attrs={'class':'ArtikelAbbinder'}).extract()
           texttag.find('div', attrs={'class':'ArtikelKommentieren Artikelfuss GETS;tk;boxen.top-lesermeinungen;tp;content'}).extract()
           texttag.find('div', attrs={'class':'Anzeige GoogleAdsBuehne'}).extract()
           texttag.find('div', attrs={'id':'ArticlePagerBottom'}).extract()           
           newpos = len(texttag.contents)
           self.append_page(soup2,texttag,newpos)
           texttag.extract()
           pager.extract()
           appendtag.insert(position,texttag)

    def preprocess_html(self, soup):
        self.append_page(soup, soup.body, 3)
        pager = soup.find('div',attrs={'id':'ArticlePagerBottom'})
        if pager:
           pager.extract()
        return self.adeify_images(soup)

Let me know if there are any issues with this version.
Attached Files
File Type: zip faznet_AGe V4.zip (1.3 KB, 235 views)
01-11-2014, 09:33 AM   #6
me1969
Junior Member
Thanks a lot, Divingduck!
But now it only shows pages 1 and 2, not page 3, as in this example:
http://www.faz.net/aktuell/wirtschaf...-12746010.html

Last edited by me1969; 01-11-2014 at 11:09 AM.
01-14-2014, 11:44 AM   #7
Divingduck
Wizard
Hi me1969,
Sorry for the late answer. Before the update I tested the recipe with several examples of up to five pages, so there may be some special cases I missed. I will look at your example and get back to you.
01-14-2014, 02:36 PM   #8
Divingduck
Wizard
A new update for this recipe.

I made a stupid mistake. The recipe works again now, and this is the first time I have used postprocess_html (Kovid, thanks for the hint).
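
Comparing the two versions, the main change is that `append_page` no longer removes the ArticlePagerBottom div from the following pages before recursing; that div is now removed in `postprocess_html` instead. A more defensive way to follow a page chain is to read the next-page link before any cleanup and to walk the chain iteratively, with a visited set and a page limit so that a link loop cannot run forever. This is a hypothetical stand-alone sketch (`fetch` stands in for downloading a page and finding its "Nächste Seite" link), not what the recipe does:

```python
def collect_pages_iter(start_url, fetch, max_pages=20):
    """Follow next-page links from start_url and return the page texts.

    `fetch(url)` must return (text, next_url_or_None). The loop stops at the
    last page, on a repeated URL, or after max_pages pages."""
    texts, seen, url = [], set(), start_url
    while url is not None and url not in seen and len(texts) < max_pages:
        seen.add(url)
        text, url = fetch(url)  # read the next link before any cleanup
        texts.append(text)
    return texts


site = {'a': ('A', 'b'), 'b': ('B', 'c'), 'c': ('C', None)}
print(collect_pages_iter('a', site.__getitem__))
```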

Code:
__license__   = 'GPL v3'
__copyright__ = '2008-2011, Kovid Goyal <kovid at kovidgoyal.net>, Darko Miletic <darko at gmail.com>'
'''
Profile to download FAZ.NET
'''

from calibre.web.feeds.news import BasicNewsRecipe

class FazNet(BasicNewsRecipe):
    title                 = 'FAZ.NET'
    __author__            = 'Kovid Goyal, Darko Miletic, Armin Geller' # AGe upd. V4 2014-01-14
    description           = 'Frankfurter Allgemeine Zeitung'
    publisher             = 'Frankfurter Allgemeine Zeitung GmbH'
    category              = 'news, politics, Germany'
    use_embedded_content  = False
    language = 'de'
    
    max_articles_per_feed = 30
    no_stylesheets        = True
    encoding              = 'utf-8'
    remove_javascript     = True

    keep_only_tags = [{'class':'FAZArtikelEinleitung'},
            {'id':'ArtikelTabContent_0'}]

    remove_tags_after = [dict(name='div', attrs={'class':['ArtikelFooter']})]
    remove_tags = [dict(name='div', attrs={'class':['ArtikelFooter']})]

#    recursions = 1                        # AGe 2014-01-10
#    match_regexps = [r'-p[2-9].html$']    # AGe 2014-01-10
                  
    feeds = [
              ('FAZ.NET Aktuell', 'http://www.faz.net/aktuell/?rssview=1'),
              ('Politik', 'http://www.faz.net/aktuell/politik/?rssview=1'),
              ('Wirtschaft', 'http://www.faz.net/aktuell/wirtschaft/?rssview=1'),
              ('Feuilleton', 'http://www.faz.net/aktuell/feuilleton/?rssview=1'),
              ('Sport', 'http://www.faz.net/aktuell/sport/?rssview=1'),
              ('Lebensstil', 'http://www.faz.net/aktuell/lebensstil/?rssview=1'),
              ('Gesellschaft', 'http://www.faz.net/aktuell/gesellschaft/?rssview=1'),
              ('Finanzen', 'http://www.faz.net/aktuell/finanzen/?rssview=1'),
              ('Technik & Motor', 'http://www.faz.net/aktuell/technik-motor/?rssview=1'),
              ('Wissen', 'http://www.faz.net/aktuell/wissen/?rssview=1'),
              ('Reise', 'http://www.faz.net/aktuell/reise/?rssview=1'),
              ('Beruf & Chance', 'http://www.faz.net/aktuell/beruf-chance/?rssview=1'),
              ('Rhein-Main', 'http://www.faz.net/aktuell/rhein-main/?rssview=1')
            ]

# AGe 2014-01-10 New  for multipages
    INDEX                 = ''
    def append_page(self, soup, appendtag, position):   # AGe upd 2014-01-14
        pager = soup.find('a',attrs={'title':'Nächste Seite'})
        if pager:
           nexturl = self.INDEX + pager['href']
           soup2 = self.index_to_soup(nexturl)
           texttag = soup2.find('div', attrs={'class':'FAZArtikelContent'})
           texttag.find('div', attrs={'class':'ArtikelFooter'}).extract()
           texttag.find('div', attrs={'class':'ArtikelAbbinder'}).extract()
           texttag.find('div', attrs={'class':'ArtikelKommentieren Artikelfuss GETS;tk;boxen.top-lesermeinungen;tp;content'}).extract()
           texttag.find('div', attrs={'class':'Anzeige GoogleAdsBuehne'}).extract()
           newpos = len(texttag.contents)
           self.append_page(soup2,texttag,newpos)
           texttag.extract()
           pager.extract()
           appendtag.insert(position,texttag)

    def preprocess_html(self, soup):                    # AGe upd 2014-01-14
        self.append_page(soup, soup.body, 3)
        return self.adeify_images(soup) 
        
    def postprocess_html(self, soup, first_fetch):      # AGe add 2014-01-14
        for div in soup.findAll(id='ArticlePagerBottom'):
          div.extract()
        return soup


Let me know if there are any issues with this version.
Attached Files
File Type: zip faznet_AGe V5.zip (1.3 KB, 261 views)
01-21-2014, 10:10 AM   #9
me1969
Junior Member
Just saw it. Thanks a lot for your help, Divingduck!
10-24-2014, 04:10 PM   #10
Divingduck
Wizard
A new update.
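
In this version `append_page` no longer chains `find(...).extract()`: every lookup is checked for `None` first, because calling `.extract()` on a missing element raises an AttributeError whenever a page lacks one of the listed blocks. The same guard pattern, sketched with `xml.etree.ElementTree` instead of BeautifulSoup (`remove_first_by_class` is a hypothetical name):

```python
import xml.etree.ElementTree as ET


def remove_first_by_class(root, classes):
    """For each class string, remove the first element below root whose class
    attribute equals it exactly. Missing elements are skipped, not fatal.

    Returns the class strings that were actually removed."""
    parents = {child: parent for parent in root.iter() for child in parent}
    removed = []
    for cls in classes:
        el = next((e for e in root.iter()
                   if e is not root and e.get('class') == cls), None)
        if el is None:
            continue  # this page has no such block: skip instead of crashing
        parents[el].remove(el)
        removed.append(cls)
    return removed
```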

Code:
__license__   = 'GPL v3'
__copyright__ = '2008-2011, Kovid Goyal <kovid at kovidgoyal.net>, Darko Miletic <darko at gmail.com>'
'''
Profile to download FAZ.NET
'''

from calibre.web.feeds.news import BasicNewsRecipe

class FazNet(BasicNewsRecipe):
    title                 = 'FAZ.NET'
    __author__            = 'Kovid Goyal, Darko Miletic, Armin Geller'  # AGe upd. V6 2014-10-24
    description           = 'Frankfurter Allgemeine Zeitung'
    publisher             = 'Frankfurter Allgemeine Zeitung GmbH'
    category              = 'news, politics, Germany'
    use_embedded_content  = False
    language = 'de'

    max_articles_per_feed = 30
    no_stylesheets        = True
    encoding              = 'utf-8'
    remove_javascript     = True

    keep_only_tags = [{'class':['FAZArtikelEinleitung']},
                      dict(name='div', attrs={'class':'FAZSlimHeader'}),
                      {'id':'ArtikelTabContent_0'}
                      ]

    remove_tags_after = [dict(name='div', attrs={'class':['ArtikelFooter']})]
    remove_tags = [dict(name='div', attrs={'class':['ArtikelFooter','clear']}),
                   dict(name='a', attrs={'title':['Vergrößern']}),       # AGe 2014-10-22
                   dict(name='img', attrs={'class':['VideoCtrlIcon']}),  # AGe 2014-10-22
                   dict(name='span', attrs={'class':['shareAutor']})     # AGe 2014-10-22
                   ]

    feeds = [
              ('FAZ.NET Aktuell', 'http://www.faz.net/aktuell/?rssview=1'),
              ('Politik', 'http://www.faz.net/aktuell/politik/?rssview=1'),
              ('Wirtschaft', 'http://www.faz.net/aktuell/wirtschaft/?rssview=1'),
              ('Feuilleton', 'http://www.faz.net/aktuell/feuilleton/?rssview=1'),
              ('Sport', 'http://www.faz.net/aktuell/sport/?rssview=1'),
              ('Lebensstil', 'http://www.faz.net/aktuell/lebensstil/?rssview=1'),
              ('Gesellschaft', 'http://www.faz.net/aktuell/gesellschaft/?rssview=1'),
              ('Finanzen', 'http://www.faz.net/aktuell/finanzen/?rssview=1'),
              ('Technik & Motor', 'http://www.faz.net/aktuell/technik-motor/?rssview=1'),
              ('Wissen', 'http://www.faz.net/aktuell/wissen/?rssview=1'),
              ('Reise', 'http://www.faz.net/aktuell/reise/?rssview=1'),
              ('Beruf & Chance', 'http://www.faz.net/aktuell/beruf-chance/?rssview=1'),
              ('Rhein-Main', 'http://www.faz.net/aktuell/rhein-main/?rssview=1')
            ]

# AGe 2014-01-10 For multipages
    INDEX                 = ''
    def append_page(self, soup, appendtag, position):
        pager = soup.find('a',attrs={'title':'Nächste Seite'})
        if pager:
            nexturl = self.INDEX + pager['href']
            soup2 = self.index_to_soup(nexturl)
            texttag = soup2.find('div', attrs={'class':'FAZArtikelContent'})
            for cls in ('ArtikelFooter', 'ArtikelAbbinder',
                        'ArtikelKommentieren Artikelfuss GETS;tk;boxen.top-lesermeinungen;tp;content',
                        'Anzeige GoogleAdsBuehne', 'ThemenLinks', 'rechtehinweis',
                        'stageModule Ressortmodul Rubrikenkopf clearfix', 'VideoCtrlIcon',
                        'ArtikelAbbinder clearfix',
                        'stageModule clearfix GETS;tk;artikel.empfehlungen.weitere-artikel;tp;content'):  # AGe 2014-10-22
                div = texttag.find(attrs={'class':cls})
                if div is not None:
                    div.extract()
                div = texttag.find(attrs={'title':'Vergrößern'}) #AGe 2014-10-22
                if div is not None:
                    div.extract()
           
            newpos = len(texttag.contents)
            self.append_page(soup2,texttag,newpos)
            texttag.extract()
            pager.extract()
            appendtag.insert(position,texttag)

    def preprocess_html(self, soup):
        self.append_page(soup, soup.body, 3)
        return self.adeify_images(soup)

    def postprocess_html(self, soup, first_fetch):
        for div in soup.findAll(id='ArticlePagerBottom'):
            div.extract()
        for div in soup.findAll('div', attrs={'class':'clear'}):  # AGe add 2014-10-24
            div.extract()
        return soup
Attached Files
File Type: zip faznet_AGe_V6.zip (1.5 KB, 224 views)
01-26-2016, 04:59 PM   #11
Divingduck
Wizard
A new update that skips some unwanted advertising.
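
Besides removing the two boxes with the ids berndsbox and dertagbox, this version's `preprocess_html` copies each image's `data-src` attribute into `src`, presumably because the site lazy-loads its images. A stand-alone sketch of those two steps with `xml.etree.ElementTree` on a well-formed snippet; `clean_article` is a hypothetical name, and calibre itself works on BeautifulSoup trees of real-world HTML:

```python
import xml.etree.ElementTree as ET

AD_BOX_IDS = ('berndsbox', 'dertagbox')  # ids taken from the V7 recipe


def clean_article(root):
    """Drop the ad boxes by id and promote lazy-loaded image URLs to src."""
    parents = {child: parent for parent in root.iter() for child in parent}
    for el in list(root.iter()):  # snapshot, because we mutate the tree
        if el.get('id') in AD_BOX_IDS and el in parents:
            parents[el].remove(el)
    for img in root.iter('img'):
        lazy = img.get('data-src')
        if lazy:
            img.set('src', lazy)  # replace the placeholder with the real image
    return root
```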

Code:
__license__ = 'GPL v3'
__copyright__ = '2008-2011, Kovid Goyal <kovid at kovidgoyal.net>, Darko Miletic <darko at gmail.com>'
'''
Profile to download FAZ.NET
'''

from calibre.web.feeds.news import BasicNewsRecipe

class FazNet(BasicNewsRecipe):
    title = 'FAZ.NET'
    __author__ = 'Kovid Goyal, Darko Miletic, Armin Geller' # AGe upd. V7 2016-01-26
    description = 'Frankfurter Allgemeine Zeitung'
    publisher = 'Frankfurter Allgemeine Zeitung GmbH'
    category = 'news, politics, Germany'
    use_embedded_content = False
    language = 'de'

    max_articles_per_feed = 30
    no_stylesheets = True
    encoding = 'utf-8'
    remove_javascript = True

    keep_only_tags = [
        {'class':['FAZArtikelEinleitung']},
        dict(name='div', attrs={'class':'FAZSlimHeader'}),
        {'id':'ArtikelTabContent_0'}
    ]

    remove_tags_after = [dict(name='div', attrs={'class':['ArtikelFooter']})]
    remove_tags = [
        dict(name='div', attrs={'class':['ArtikelFooter','clear']}),
        dict(name='div', attrs={'id':['berndsbox','dertagbox']}), # AGe 2016-01-26
        dict(name='a', attrs={'title':['Vergrößern']}), # AGe 2014-10-22
        dict(name='img', attrs={'class':['VideoCtrlIcon']}), # AGe 2014-10-22
        dict(name='span', attrs={'class':['shareAutor']}) # AGe 2014-10-22
    ]

    feeds = [
        ('FAZ.NET Aktuell', 'http://www.faz.net/aktuell/?rssview=1'),
        ('Politik', 'http://www.faz.net/aktuell/politik/?rssview=1'),
        ('Wirtschaft', 'http://www.faz.net/aktuell/wirtschaft/?rssview=1'),
        ('Feuilleton', 'http://www.faz.net/aktuell/feuilleton/?rssview=1'),
        ('Sport', 'http://www.faz.net/aktuell/sport/?rssview=1'),
        ('Lebensstil', 'http://www.faz.net/aktuell/lebensstil/?rssview=1'),
        ('Gesellschaft', 'http://www.faz.net/aktuell/gesellschaft/?rssview=1'),
        ('Finanzen', 'http://www.faz.net/aktuell/finanzen/?rssview=1'),
        ('Technik & Motor', 'http://www.faz.net/aktuell/technik-motor/?rssview=1'),
        ('Wissen', 'http://www.faz.net/aktuell/wissen/?rssview=1'),
        ('Reise', 'http://www.faz.net/aktuell/reise/?rssview=1'),
        ('Beruf & Chance', 'http://www.faz.net/aktuell/beruf-chance/?rssview=1'),
        ('Rhein-Main', 'http://www.faz.net/aktuell/rhein-main/?rssview=1')
    ]

    # AGe 2014-01-10 For multipages
    INDEX = ''
    def append_page(self, soup, appendtag, position):
        pager = soup.find('a',attrs={'title':'Nächste Seite'})
        if pager:
            nexturl = self.INDEX + pager['href']
            soup2 = self.index_to_soup(nexturl)
            texttag = soup2.find('div', attrs={'class':'FAZArtikelContent'})
            for cls in (
                'ArtikelFooter', 'ArtikelAbbinder',
                'ArtikelKommentieren Artikelfuss GETS;tk;boxen.top-lesermeinungen;tp;content',
                'Anzeige GoogleAdsBuehne', 'ThemenLinks', 'rechtehinweis',
                'stageModule Ressortmodul Rubrikenkopf clearfix', 'VideoCtrlIcon',
                'ArtikelAbbinder clearfix',
                'stageModule clearfix GETS;tk;artikel.empfehlungen.weitere-artikel;tp;content',
                'ThemenLinks',
            ): # AGe 2014-10-22
                div = texttag.find(attrs={'class':cls})
                if div is not None:
                    div.extract()
            for cls in (
                'berndsbox','dertagbox'): # AGe 2016-01-26
                div = texttag.find(attrs={'id':cls})
                if div is not None:
                    div.extract()
                div = texttag.find(attrs={'title':'Vergrößern'}) # AGe 2014-10-22
                if div is not None:
                    div.extract()
            newpos = len(texttag.contents)
            self.append_page(soup2,texttag,newpos)
            texttag.extract()
            pager.extract()
            appendtag.insert(position,texttag)

    def preprocess_html(self, soup):
        self.append_page(soup, soup.body, 3)
        for img in soup.findAll('img', attrs={'data-src':True}):
            img['src'] = img['data-src']
        return self.adeify_images(soup)

    def postprocess_html(self, soup, first_fetch):
        for div in soup.findAll(id='ArticlePagerBottom'):
            div.extract()
        for div in soup.findAll('div', attrs={'class':'clear'}): # AGe add 2014-10-24
            div.extract()
        return soup


As always, let me know if there are any issues with this version.

DivingDuck
Attached Files
File Type: zip faznet_AGe_V7.zip (1.6 KB, 206 views)
09-01-2017, 03:55 PM   #12
Divingduck
Wizard
FAZ.NET has changed its layout. Please find the new update attached.
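
This version looks elements up by their full class strings, for example 'atc-ContainerMore ' with a trailing blank, and finds the pager through the li class 'nvg-Paginator_Item nvg-Paginator_Item-to-next-page'. Full-string matches like these tend to break as soon as the site reorders or re-spaces its class names. A hypothetical, more tolerant check compares class tokens instead (`has_classes` is not part of the recipe or of calibre):

```python
def has_classes(class_attr, wanted):
    """True if every whitespace-separated token in `wanted` occurs in class_attr.

    Unlike an exact string comparison, this ignores token order and stray
    spaces such as the trailing blank in 'atc-ContainerMore '."""
    have = set((class_attr or '').split())  # None (no class attribute) -> empty
    return set(wanted.split()) <= have
```

BeautifulSoup accepts a callable as an attribute filter, so a check like this could in principle be plugged into `attrs={'class': ...}`; how multi-valued class attributes are passed to it depends on the BeautifulSoup version, so test it before relying on it in a recipe.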

Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe
__license__   = 'GPL v3'
__copyright__ = '2008-2011, Kovid Goyal <kovid at kovidgoyal.net>, Darko Miletic <darko at gmail.com>'

class FazNet(BasicNewsRecipe):
    # Version 8.0
    # Update 2017-09-01
    # Armin Geller
    # new web page layout

    title                 = 'FAZ.NET'
    __author__            = 'Kovid Goyal, Darko Miletic, Armin Geller'
    description           = 'Frankfurter Allgemeine Zeitung'
    publisher             = 'Frankfurter Allgemeine Zeitung GmbH'
    category              = 'news, politics, Germany'
    use_embedded_content  = False
    language = 'de'

    max_articles_per_feed = 30
    no_stylesheets        = True
    encoding              = 'utf-8'
    remove_javascript     = True

    keep_only_tags = [dict(name='article', attrs={'class':'atc'})]

    remove_tags_after = [dict(name='article', attrs={'class':['atc']})]
    
    remove_tags = [
                    dict(name='aside', attrs={'class':['atc-ContainerMore ',
                                                       'atc-ContainerMore atc-ContainerMoreOneTeaser sld-TeaserMoreOneTeaser  js-slider-teaser-more'
                                                       ]}),
                    dict(name='div', attrs={'class':['atc-ContainerSocialMedia',
                                                     'atc-ContainerFunctions_Interaction ',
                                                     'ctn-PlaceholderContent ctn-PlaceholderContent-is-in-article-medium '
                                                     ]})
                  ]
    
    feeds = [
                ('FAZ.NET Aktuell', 'http://www.faz.net/aktuell/?rssview=1'),
                ('Politik', 'http://www.faz.net/aktuell/politik/?rssview=1'),
                ('Wirtschaft', 'http://www.faz.net/aktuell/wirtschaft/?rssview=1'),
                ('Feuilleton', 'http://www.faz.net/aktuell/feuilleton/?rssview=1'),
                ('Sport', 'http://www.faz.net/aktuell/sport/?rssview=1'),
                ('Lebensstil', 'http://www.faz.net/aktuell/lebensstil/?rssview=1'),
                ('Gesellschaft', 'http://www.faz.net/aktuell/gesellschaft/?rssview=1'),
                ('Finanzen', 'http://www.faz.net/aktuell/finanzen/?rssview=1'),
                ('Technik & Motor', 'http://www.faz.net/aktuell/technik-motor/?rssview=1'),
                ('Wissen', 'http://www.faz.net/aktuell/wissen/?rssview=1'),
                ('Reise', 'http://www.faz.net/aktuell/reise/?rssview=1'),
                ('Beruf & Chance', 'http://www.faz.net/aktuell/beruf-chance/?rssview=1'),
                ('Rhein-Main', 'http://www.faz.net/aktuell/rhein-main/?rssview=1')
            ]

    # For multipages:

    INDEX = ''
        
    def append_page(self, soup, appendtag, position):
        pager = soup.find('li',attrs={'class':'nvg-Paginator_Item nvg-Paginator_Item-to-next-page'})
        if pager:
            nexturl = self.INDEX + pager.a['href']
            soup2 = self.index_to_soup(nexturl)
            texttag = soup2.find('article', attrs={'class':'atc'})
            for cls in (
                    'atc-Header',
                    'ctn-PlaceholderContent ctn-PlaceholderContent-is-in-article-medium ',
                    'ctn-PlaceholderContent ctn-PlaceholderContent-is-in-article-medium ctn-PlaceholderContent-has-centered-content ',
                    'atc-ContainerMore '
                    ):
                div = texttag.find(attrs={'class':cls})
                if div is not None:
                    div.extract()
            newpos = len(texttag.contents)
            self.append_page(soup2,texttag,newpos)
            texttag.extract()
            pager.extract()
            appendtag.insert(position,texttag)

    def preprocess_html(self, soup):
        self.append_page(soup, soup.body, 3)
        for img in soup.findAll('img', attrs={'data-src':True}):
            img['src'] = img['data-src']
        return self.adeify_images(soup)

    # Some last cleanup
    def postprocess_html(self, soup, first_fetch):
        for div in soup.findAll('div',attrs={'class':['atc-ContainerFunctions_Navigation','atc-ContainerFunctions_Interaction ']}):
            div.extract()
        return soup

As always, please let me know if there are any issues with this version.

Best regards,
DD
Attached Files
File Type: zip faznet_AGe_V8.0.zip (1.5 KB, 191 views)
09-08-2017, 05:15 AM   #13
Mainframe-Junki
Junior Member
Posts: 7
Karma: 10
Join Date: Aug 2007
Location: Germany
Device: iPad
Thanks for the update
Mainframe-Junki is offline   Reply With Quote
Old 09-08-2017, 12:12 PM   #14
Divingduck
You are welcome.
Old 05-29-2022, 11:26 AM   #15
Divingduck
FAZ.Net update

Please find attached a new update (version 9.1), adapted to FAZ.NET's new page layout.


Code:
# vim:fileencoding=utf-8
# from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe
__license__   = 'GPL v3'
__copyright__ = '2008-2011, Kovid Goyal <kovid at kovidgoyal.net>, Darko Miletic <darko at gmail.com>'

class FazNet(BasicNewsRecipe):
    # Version 9.1
    # Update 2022-05-29
    # Armin Geller
    # new page layout

    title                 = 'FAZ.NET'
    __author__            = 'Kovid Goyal, Darko Miletic, Armin Geller'
    description           = 'Frankfurter Allgemeine Zeitung'
    publisher             = 'Frankfurter Allgemeine Zeitung GmbH'
    category              = 'news, politics, Germany'

    encoding              = 'utf-8'
    language              = 'de'

    max_articles_per_feed = 30
    no_stylesheets        = True
    remove_javascript     = True


    extra_css      =  '''
                      .atc-headlineemphasis, h1, h2 {font-size:1.6em; text-align:left}
                      .atc-HeadlineEmphasisText {font-size:0.6em; text-align:left; display:block; text-transform:uppercase;}
                      .atc-IntroText {font-size:1em; font-style:italic; font-weight:bold;margin-bottom:1em}
                      h3 {font-size:1.3em;text-align:left}
                      h4, h5, h6 {font-size:1em;text-align:left}
                      .textbox-wide {font-size:1.3em; font-style:italic}
                      .atc-ImageDescriptionText, .atc-ImageDescriptionCopyright {font-size: 0.75em; font-style:italic; font-weight:normal}
                      .atc-MetaItem {font-size:0.6em; font-weight:normal; margin-bottom:0.75em; text-align:left; list-style-type:none; text-transform:uppercase; display:inline-block}
                      .aut-Teaser_Avatar {font-size:0.6em; font-weight:bold; margin-bottom:0.75em; text-align:left}
                      .aut-Teaser_Name {font-size:0.6em; font-weight:bold; margin-bottom:0.75em; float:left; text-align:left}
                      .aut-Teaser_Description {font-size:0.6em; font-weight: normal; margin-bottom:0.75em; text-align:left; display:block}
                      .atc-Footer{font-size:0.6em; font-weight: normal; margin-bottom:0.75em; display:block}
                      '''                      
    
    keep_only_tags = [dict(name='article', attrs={'class':'atc'}),
                      dict(name='div', attrs={'id':'FAZContent'})
                     ]

    remove_tags_after = [dict(name='article', attrs={'class':'atc'})]
    
    remove_tags = [
                   dict(name='div', attrs={'class':['atc-ContainerSocialMedia',
                                                    'atc-ContainerFunctions_Interaction ',                   
                                                    'ctn-PlaceholderContent ctn-PlaceholderContent-is-in-article-medium',
                                                    'ctn-PlaceholderContent ctn-PlaceholderContent-is-in-article-medium ctn-PlaceholderContent-has-centered-content',
                                                    'ctn-PlaceholderBox ctn-PlaceholderBox-is-in-article-text-right',
                                                    'ctn-PlaceholderContent ctn-PlaceholderContent-is-in-article-text-left ctn-PlaceholderContent-is-in-article-small',
                                                    'aut-Follow aut-Follow-is-small-teaser',
                                                    'aut-Follow aut-Follow-is-teaser',
                                                    'js-ctn-PaywallTeasers ctn-PaywallTeasers',                                                    
                                                    'ctn-PaywallInfo_TeaserImageContainer',
                                                    'ctn-PaywallInfo_OfferContainer'
                                                    ]}),
                   dict(name='aside', attrs={'class':['atc-ContainerMore',
                                                     'atc-ContainerMoreOneTeaser'
                                                    ]}),
                   dict(name='span', attrs={'class':['data-button',
                                                     'o-VisuallyHidden'
                                                    ]}),
                   dict(name='a', attrs={'class':'btn-Base_Link'})
                  ]
    
    feeds = [
             ('FAZ.NET Aktuell', 'http://www.faz.net/aktuell/?rssview=1'),
             ('Politik', 'http://www.faz.net/aktuell/politik/?rssview=1'),
             ('Wirtschaft', 'http://www.faz.net/aktuell/wirtschaft/?rssview=1'),
             ('Feuilleton', 'http://www.faz.net/aktuell/feuilleton/?rssview=1'),
             ('Sport', 'http://www.faz.net/aktuell/sport/?rssview=1'),
             ('Lebensstil', 'http://www.faz.net/aktuell/lebensstil/?rssview=1'),
             ('Gesellschaft', 'http://www.faz.net/aktuell/gesellschaft/?rssview=1'),
             ('Finanzen', 'http://www.faz.net/aktuell/finanzen/?rssview=1'),
             ('Technik & Motor', 'http://www.faz.net/aktuell/technik-motor/?rssview=1'),
             ('Wissen', 'http://www.faz.net/aktuell/wissen/?rssview=1'),
             ('Reise', 'http://www.faz.net/aktuell/reise/?rssview=1'),
             ('Beruf & Chance', 'http://www.faz.net/aktuell/beruf-chance/?rssview=1'),
             ('Rhein-Main', 'http://www.faz.net/aktuell/rhein-main/?rssview=1')
            ]

    # For multipages:

    INDEX = ''
        
    def append_page(self, soup, appendtag, position):
        pager = soup.find('li',attrs={'class':'nvg-Paginator_Item nvg-Paginator_Item-to-next-page'})
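        if pager and pager.a is not None:
            nexturl = self.INDEX + pager.a['href']
            soup2 = self.index_to_soup(nexturl)
            texttag = soup2.find('article', attrs={'class':'atc'})
            if texttag is None:
                # next page has no article body; stop following pages
                return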
            for cls in (
                    'atc-Header',
                    'atc-ContainerMore',
                    'atc-ContainerFunctions_Interaction',
                    'aut-Follow aut-Follow-is-small-teaser',
                    'aut-Follow aut-Follow-is-teaser'
                    ):
                div = texttag.find(attrs={'class':cls})
                if div is not None:
                    div.extract()
            newpos = len(texttag.contents)
            self.append_page(soup2,texttag,newpos)
            texttag.extract()
            pager.extract()
            appendtag.insert(position,texttag)
    
    # Find images

    def preprocess_html(self, soup):
        self.append_page(soup, soup.body, 3)
        for img in soup.findAll('img', attrs={'data-retina-src':True}):
            img['src'] = img['data-retina-src']
        for img in soup.findAll('img', attrs={'data-src':True}):
            img['src'] = img['data-src']
        return self.adeify_images(soup)

 
    # Some last cleanup
    
    def postprocess_html(self, soup, first_fetch):
        for div in soup.findAll('div',attrs={'class':['atc-ContainerFunctions js-som-Abbinder',
                                                      'ctn-PlaceholderContent ctn-PlaceholderContent-is-in-article-medium'
                                                     ]}):
            div.extract()
        return soup
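In preprocess_html the data-src loop runs after the data-retina-src loop, so an image that carries both attributes ends up with the data-src URL, not the retina one. The sketch below reproduces that ordering with plain dicts standing in for BeautifulSoup img tags (an assumption for illustration; calibre passes real soup objects). If the high-resolution image is the one you want, swapping the two loops would reverse the precedence.

```python
# Model of the image handling in preprocess_html(), with plain dicts
# standing in for BeautifulSoup <img> tags (illustrative assumption).

def promote_lazy_src(images):
    """Copy lazy-loading URLs into 'src' in the same order as the recipe."""
    for img in images:
        if "data-retina-src" in img:
            img["src"] = img["data-retina-src"]
    for img in images:
        if "data-src" in img:
            img["src"] = img["data-src"]  # runs second, so it wins
    return images


imgs = [
    {"src": "placeholder.gif", "data-retina-src": "big.jpg"},
    {"src": "placeholder.gif", "data-src": "normal.jpg"},
    {"src": "placeholder.gif", "data-retina-src": "big.jpg", "data-src": "normal.jpg"},
]
for img in promote_lazy_src(imgs):
    print(img["src"])
```

The third image prints normal.jpg, showing that data-src overrides data-retina-src when both are present.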
Attached Files
File Type: zip faznet_AGe_V9.1.zip (2.0 KB, 62 views)
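Each entry in feeds is a (section title, RSS URL) pair, and max_articles_per_feed = 30 caps how many articles calibre takes from each section. The sketch below shows that capping on a hypothetical RSS 2.0 document of the kind a URL such as http://www.faz.net/aktuell/politik/?rssview=1 might return; SAMPLE_RSS and feed_articles() are made up for illustration. calibre also drops articles older than oldest_article, which this sketch does not model.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the real FAZ.NET feed content is not shown here.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Politik</title>
<item><title>Artikel 1</title><link>https://www.faz.net/a1.html</link></item>
<item><title>Artikel 2</title><link>https://www.faz.net/a2.html</link></item>
<item><title>Artikel 3</title><link>https://www.faz.net/a3.html</link></item>
</channel></rss>"""


def feed_articles(rss_text, max_articles_per_feed):
    """Return at most max_articles_per_feed (title, link) pairs from RSS 2.0."""
    root = ET.fromstring(rss_text)
    items = root.findall("./channel/item")
    return [(item.findtext("title"), item.findtext("link"))
            for item in items[:max_articles_per_feed]]


print(feed_articles(SAMPLE_RSS, 2))
```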

Tags
faz-net recipe

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
FAZ.NET recipe fails due to website redesign | juco | Recipes | 7 | 10-07-2011 11:53 AM
FAZ.NET: Website redesign makes the calibre recipe useless | juco | Software | 1 | 10-05-2011 02:42 AM
recipe for FAZ.net - german | schuster | Recipes | 10 | 05-28-2011 12:13 AM
Request: Inquirer.net Recipe update | zoilom | Recipes | 0 | 12-21-2010 01:06 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.