05-14-2011, 12:46 PM | #1 |
Zealot
Posts: 119
Karma: 100
Join Date: Jan 2011
Location: Germany / NRW /Köln
Device: prs-650 / prs-350 /kindle 3
|
Recipe for FAZ.net (German)
Code:
from calibre.web.feeds.recipes import BasicNewsRecipe

class AdvancedUserRecipe1303841067(BasicNewsRecipe):
    title = u'Faz.net'
    __author__ = 'schuster'
    # Strip navigation boxes, breadcrumbs, footers and scripts from each page
    remove_tags = [dict(attrs={'class':['right', 'ArrowLinkRight', 'ModulVerlagsInfo', 'left', 'Head']}),
                   dict(id=['BreadCrumbs', 'tstag', 'FazFooterPrint']),
                   dict(name=['script', 'noscript', 'style'])]
    oldest_article = 2
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    language = 'de'
    remove_javascript = True
    cover_url = 'http://www.faz.net/f30/Images/Logos/logo.gif'

    # Fetch the printer-friendly variant of each article
    def print_version(self, url):
        return url.replace('.html', '~Afor~Eprint.html')

    # One feed per section of the print edition
    feeds = [(u'Politik', u'http://www.faz.net/s/RubA24ECD630CAE40E483841DB7D16F4211/Tpl~Epartner~SRss_.xml'),
             (u'Wirtschaft', u'http://www.faz.net/s/RubC9401175958F4DE28E143E68888825F6/Tpl~Epartner~SRss_.xml'),
             (u'Feuilleton', u'http://www.faz.net/s/RubCC21B04EE95145B3AC877C874FB1B611/Tpl~Epartner~SRss_.xml'),
             (u'Sport', u'http://www.faz.net/s/Rub9F27A221597D4C39A82856B0FE79F051/Tpl~Epartner~SRss_.xml'),
             (u'Gesellschaft', u'http://www.faz.net/s/Rub02DBAA63F9EB43CEB421272A670A685C/Tpl~Epartner~SRss_.xml'),
             (u'Finanzen', u'http://www.faz.net/s/Rub4B891837ECD14082816D9E088A2D7CB4/Tpl~Epartner~SRss_.xml'),
             (u'Wissen', u'http://www.faz.net/s/Rub7F4BEE0E0C39429A8565089709B70C44/Tpl~Epartner~SRss_.xml'),
             (u'Reise', u'http://www.faz.net/s/RubE2FB5CA667054BDEA70FB3BC45F8D91C/Tpl~Epartner~SRss_.xml'),
             (u'Technik & Motor', u'http://www.faz.net/s/Rub01E4D53776494844A85FDF23F5707AD8/Tpl~Epartner~SRss_.xml'),
             (u'Beruf & Chance', u'http://www.faz.net/s/RubB1E10A8367E8446897468EDAA6EA0504/Tpl~Epartner~SRss_.xml'),
             (u'Kunstmarkt', u'http://www.faz.net/s/RubBC09F7BF72A2405A96718ECBFB68FBFE/Tpl~Epartner~SRss_.xml'),
             (u'Immobilien ', u'http://www.faz.net/s/RubFED172A9E10F46B3A5F01B02098C0C8D/Tpl~Epartner~SRss_.xml'),
             (u'Rhein-Main Zeitung', u'http://www.faz.net/s/RubABE881A6669742C2A5EBCB5D50D7EBEE/Tpl~Epartner~SRss_.xml'),
             (u'Atomdebatte ', u'http://www.faz.net/s/Rub469C43057F8C437CACC2DE9ED41B7950/Tpl~Epartner~SRss_.xml')
             ]

This one mirrors the print edition of the newspaper in detail, including its sections.
Last edited by schuster; 05-14-2011 at 03:38 PM. Reason: Declaration |
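The `print_version` method above works by rewriting each article URL: `.html` becomes `~Afor~Eprint.html`, which FAZ.NET served as its printer-friendly page. Below is a minimal standalone sketch of that rewrite so it can be tried without calibre. `print_url` is a hypothetical helper, the example URL is made up, and the optional query-string stripping is borrowed from the `print_version` in the recipe quoted in post #8.

```python
# Hypothetical standalone helper mirroring the recipes' print_version logic.
def print_url(url, strip_query=True):
    """Map a FAZ.NET article URL to its printer-friendly variant.

    strip_query=True drops everything after '?' first (as the recipe in
    post #8 does); strip_query=False is the plain replace used above.
    """
    if strip_query:
        url, _sep, _query = url.partition('?')
    return url.replace('.html', '~Afor~Eprint.html')


# Made-up article URL, for illustration only
example = 'http://www.faz.net/s/RubA24ECD630CAE40E483841DB7D16F4211/Doc~Example.html?rss_politik'
print(print_url(example))
print(print_url(example, strip_query=False))
```

Without the partition step, any tracking parameter from the RSS link survives into the print URL; dropping it keeps the fetched URL clean.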
05-15-2011, 10:31 AM | #2 |
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There is already a recipe for faz.net. Is yours different?
|
05-15-2011, 11:26 AM | #3 |
Zealot
Posts: 119
Karma: 100
Join Date: Jan 2011
Location: Germany / NRW /Köln
Device: prs-650 / prs-350 /kindle 3
|
Hi Kovid,
yes, it is different, because it covers the full range of articles available from faz.net and follows the section structure of the actual print edition. Greetings |
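In the recipe from post #1, the print-edition categorization comes down to the `feeds` list: one `(title, url)` tuple per section. Every section URL in that list follows the same pattern, differing only in the `Rub...` section ID. The sketch below builds such a list from `(title, section ID)` pairs; `FEED_TEMPLATE` and `build_feeds` are hypothetical names, the pattern is only inferred from the URLs in this thread, and nothing here confirms those 2011 feeds are still online.

```python
# Hypothetical helper: the section feeds in this thread share one URL
# pattern that differs only in the "Rub..." section ID.
FEED_TEMPLATE = 'http://www.faz.net/s/{rub_id}/Tpl~Epartner~SRss_.xml'


def build_feeds(sections):
    """Turn (title, rub_id) pairs into (title, url) feed tuples.

    Titles are stripped so stray trailing spaces do not end up in the
    section names.
    """
    return [(title.strip(), FEED_TEMPLATE.format(rub_id=rub_id))
            for title, rub_id in sections]


# Section IDs copied from the recipe in post #1
sections = [
    ('Politik', 'RubA24ECD630CAE40E483841DB7D16F4211'),
    ('Wirtschaft', 'RubC9401175958F4DE28E143E68888825F6'),
    ('Immobilien ', 'RubFED172A9E10F46B3A5F01B02098C0C8D'),
]
feeds = build_feeds(sections)
for title, url in feeds:
    print(title, url)
```

Inside a recipe class the result could be assigned to the `feeds` class attribute; adding a section then means adding one pair instead of copying a full URL.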
05-16-2011, 06:54 AM | #4 |
Zealot
Posts: 119
Karma: 100
Join Date: Jan 2011
Location: Germany / NRW /Köln
Device: prs-650 / prs-350 /kindle 3
|
|
05-16-2011, 10:59 AM | #5 |
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
No worries, I already updated it, I just forgot to post here.
|
05-26-2011, 01:25 PM | #6 |
Zealot
Posts: 119
Karma: 100
Join Date: Jan 2011
Location: Germany / NRW /Köln
Device: prs-650 / prs-350 /kindle 3
|
*Update*
Code:
from calibre.web.feeds.recipes import BasicNewsRecipe

class AdvancedUserRecipe1303841067(BasicNewsRecipe):
    title = u'Faz.net'
    __author__ = 'schuster'
    oldest_article = 1
    description = 'Frankfurter Allgemeine Zeitung'
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    language = 'de'
    remove_javascript = True
    cover_url = 'http://www.faz.net/f30/Images/Logos/logo.gif'
    remove_tags = [dict(attrs={'class':['LinkBoxModulSmall', 'ModulLesermeinungenFooter', 'ModulArtikelServices', 'SocialMediaUnten', 'ArrowLinkRight', 'ModulVerlagsInfo', 'AdData', 'FazFooter', 'Date']}),
                   dict(id=['FAZNavHeader', 'FAZNavMain', 'RightColumn', 'FazFooter', 'BreadCrumbs', 'FAZNavSubMain', 'FAZImgEvent']),
                   dict(name=['jksrdt'])]

    feeds = [(u'Politik', u'http://www.faz.net/s/RubA24ECD630CAE40E483841DB7D16F4211/Tpl~Epartner~SRss_.xml'),
             (u'Wirtschaft', u'http://www.faz.net/s/RubC9401175958F4DE28E143E68888825F6/Tpl~Epartner~SRss_.xml'),
             (u'Feuilleton', u'http://www.faz.net/s/RubCC21B04EE95145B3AC877C874FB1B611/Tpl~Epartner~SRss_.xml'),
             (u'Sport', u'http://www.faz.net/s/Rub9F27A221597D4C39A82856B0FE79F051/Tpl~Epartner~SRss_.xml'),
             (u'Gesellschaft', u'http://www.faz.net/s/Rub02DBAA63F9EB43CEB421272A670A685C/Tpl~Epartner~SRss_.xml'),
             (u'Finanzen', u'http://www.faz.net/s/Rub4B891837ECD14082816D9E088A2D7CB4/Tpl~Epartner~SRss_.xml'),
             (u'Wissen', u'http://www.faz.net/s/Rub7F4BEE0E0C39429A8565089709B70C44/Tpl~Epartner~SRss_.xml'),
             (u'Reise', u'http://www.faz.net/s/RubE2FB5CA667054BDEA70FB3BC45F8D91C/Tpl~Epartner~SRss_.xml'),
             (u'Technik & Motor', u'http://www.faz.net/s/Rub01E4D53776494844A85FDF23F5707AD8/Tpl~Epartner~SRss_.xml'),
             (u'Beruf & Chance', u'http://www.faz.net/s/RubB1E10A8367E8446897468EDAA6EA0504/Tpl~Epartner~SRss_.xml'),
             (u'Kunstmarkt', u'http://www.faz.net/s/RubBC09F7BF72A2405A96718ECBFB68FBFE/Tpl~Epartner~SRss_.xml'),
             (u'Immobilien ', u'http://www.faz.net/s/RubFED172A9E10F46B3A5F01B02098C0C8D/Tpl~Epartner~SRss_.xml'),
             (u'Rhein-Main Zeitung', u'http://www.faz.net/s/RubABE881A6669742C2A5EBCB5D50D7EBEE/Tpl~Epartner~SRss_.xml'),
             (u'Atomdebatte ', u'http://www.faz.net/s/Rub469C43057F8C437CACC2DE9ED41B7950/Tpl~Epartner~SRss_.xml')
             ]
|
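Compared with the first version, this update lowers `oldest_article` from 2 to 1, i.e. only articles from roughly the last day are downloaded, while `max_articles_per_feed` still caps each feed at 100. The sketch below is a simplified, hypothetical model of what those two settings mean; calibre applies these limits itself and its internal order of operations may differ. `select_articles` and the sample dates are made up for illustration.

```python
from datetime import datetime, timedelta


# Simplified, hypothetical model of oldest_article / max_articles_per_feed;
# not calibre's own code.
def select_articles(articles, now, oldest_article=1, max_articles_per_feed=100):
    """Keep (title, published) pairs no older than oldest_article days,
    then cap the result at max_articles_per_feed entries."""
    cutoff = now - timedelta(days=oldest_article)
    recent = [a for a in articles if a[1] >= cutoff]
    return recent[:max_articles_per_feed]


now = datetime(2011, 5, 26, 12, 0)
articles = [
    ('today', datetime(2011, 5, 26, 8, 0)),
    ('yesterday evening', datetime(2011, 5, 25, 20, 0)),
    ('older', datetime(2011, 5, 24, 15, 0)),
]
print([t for t, _ in select_articles(articles, now)])
print([t for t, _ in select_articles(articles, now, oldest_article=2)])
```

With `oldest_article = 1` the 45-hour-old item is dropped; the first recipe's value of 2 would still include it.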
05-26-2011, 01:46 PM | #7 |
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
updated
|
05-27-2011, 05:57 AM | #8 |
Enthusiast
Posts: 28
Karma: 50
Join Date: Oct 2003
Location: Bavaria/Germany
Device: Palm m105, Kindle KB
|
Until the day before yesterday, I used the "old" recipe that came with calibre, which I had extended by adding the other category feeds (in much the same way user schuster did in his script):
Code:
__license__ = 'GPL v3'
__copyright__ = '2008-2011, Kovid Goyal <kovid at kovidgoyal.net>, Darko Miletic <darko at gmail.com>, Ralph Stenzel <ralph at klein-aber-fein.de>'
'''
Profile to download FAZ.NET
'''

from calibre.web.feeds.news import BasicNewsRecipe

class FazNet(BasicNewsRecipe):
    title = 'FAZ.NET'
    __author__ = 'Kovid Goyal, Darko Miletic, Ralph Stenzel'
    description = 'Frankfurter Allgemeine Zeitung'
    publisher = 'Frankfurter Allgemeine Zeitung GmbH'
    category = 'news, politics, Germany'
    use_embedded_content = False
    language = 'de'
    max_articles_per_feed = 30
    no_stylesheets = True
    encoding = 'utf-8'
    remove_javascript = True

    html2lrf_options = [
        '--comment', description
        , '--category', category
        , '--publisher', publisher
    ]

    html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'

    keep_only_tags = [dict(name='div', attrs={'class':'Article'})]

    remove_tags = [
        dict(name=['object','link','embed','base'])
        ,dict(name='div', attrs={'class':['LinkBoxModulSmall','ModulVerlagsInfo']})
    ]

    feeds = [
        ('FAZ.NET Aktuell', 'http://www.faz.net/s/RubF3CE08B362D244869BE7984590CB6AC1/Tpl~Epartner~SRss_.xml'),
        ('Politik', 'http://www.faz.net/s/RubA24ECD630CAE40E483841DB7D16F4211/Tpl~Epartner~SRss_.xml'),
        ('Wirtschaft', 'http://www.faz.net/s/RubC9401175958F4DE28E143E68888825F6/Tpl~Epartner~SRss_.xml'),
        ('Feuilleton', 'http://www.faz.net/s/RubCC21B04EE95145B3AC877C874FB1B611/Tpl~Epartner~SRss_.xml'),
        ('Sport', 'http://www.faz.net/s/Rub9F27A221597D4C39A82856B0FE79F051/Tpl~Epartner~SRss_.xml'),
        ('Gesellschaft', 'http://www.faz.net/s/Rub02DBAA63F9EB43CEB421272A670A685C/Tpl~Epartner~SRss_.xml'),
        ('Finanzen', 'http://www.faz.net/s/Rub4B891837ECD14082816D9E088A2D7CB4/Tpl~Epartner~SRss_.xml'),
        ('Wissen', 'http://www.faz.net/s/Rub7F4BEE0E0C39429A8565089709B70C44/Tpl~Epartner~SRss_.xml'),
        ('Reise', 'http://www.faz.net/s/RubE2FB5CA667054BDEA70FB3BC45F8D91C/Tpl~Epartner~SRss_.xml'),
        ('Technik & Motor', 'http://www.faz.net/s/Rub01E4D53776494844A85FDF23F5707AD8/Tpl~Epartner~SRss_.xml'),
        ('Beruf & Chance', 'http://www.faz.net/s/RubB1E10A8367E8446897468EDAA6EA0504/Tpl~Epartner~SRss_.xml')
    ]

    def print_version(self, url):
        article, sep, rest = url.partition('?')
        return article.replace('.html', '~Afor~Eprint.html')

    def preprocess_html(self, soup):
        mtag = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>'
        soup.head.insert(0,mtag)
        del soup.body['onload']
        for item in soup.findAll(style=True):
            del item['style']
        return soup

The new recipe from user schuster *does* deliver content; however, it does not produce the same polished, good-looking results (cropping, formatting, etc.) as the previous script. Perhaps someone here could fix the recipe quoted above so that it can be put back into service? Any help would be greatly appreciated!
Thanks in advance,
Ralph
Last edited by juco; 05-27-2011 at 08:18 AM. |
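The polished look of the older recipe plausibly comes from two steps that schuster's version does not take: `keep_only_tags` restricts the output to `<div class="Article">`, and `preprocess_html` deletes every inline `style` attribute. The sketch below reproduces those two steps with Python's standard `xml.etree.ElementTree` so they can be tried without calibre. It is not calibre's implementation: it assumes well-formed XHTML (the recipe itself works on a BeautifulSoup tree, which tolerates malformed HTML), treats `class` as a whitespace-separated list as a simplification, and `extract_article` plus the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET


# Stdlib sketch (not calibre's code) of two steps the older recipe performs:
# keep only the <div class="Article"> element, and drop inline style attributes.
def extract_article(xhtml):
    root = ET.fromstring(xhtml)
    for div in root.iter('div'):
        if 'Article' in div.get('class', '').split():
            for el in div.iter():
                el.attrib.pop('style', None)  # like `del item['style']`
            div.tail = None  # don't serialize text that follows the div
            return ET.tostring(div, encoding='unicode')
    return None  # no article container found


page = ('<html><body><div class="Nav">Menu</div>'
        '<div class="Article" style="width:600px">'
        '<p style="font-size:9px">Text</p></div></body></html>')
print(extract_article(page))
```

Keeping only the article container removes the page chrome in one step, whereas a `remove_tags` list has to enumerate every unwanted box and breaks whenever the site adds a new one.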
05-27-2011, 10:47 AM | #9 |
Zealot
Posts: 119
Karma: 100
Join Date: Jan 2011
Location: Germany / NRW /Köln
Device: prs-650 / prs-350 /kindle 3
|
Hi juco,
I tested it on my Kindle 3 and it looks like the previous one. Could you send me a PM explaining the differences in detail? Greetings |
05-27-2011, 02:25 PM | #10 |
Enthusiast
Posts: 28
Karma: 50
Join Date: Oct 2003
Location: Bavaria/Germany
Device: Palm m105, Kindle KB
|
Hi schuster,
thanks for getting in touch! I will send you a PM soon in our mother tongue... ;-) Yours, Ralph |
05-28-2011, 12:13 AM | #11 |
Enthusiast
Posts: 28
Karma: 50
Join Date: Oct 2003
Location: Bavaria/Germany
Device: Palm m105, Kindle KB
|
The updated FAZ.NET recipe bundled with calibre v0.8.3 works perfectly again. Thank you very much indeed, Kovid! :-)
Yours, Ralph |