01-09-2014, 07:01 AM | #1
Junior Member
Posts: 5
Karma: 10
Join Date: Jan 2014
Device: kindle
FAZ.net now with multi-page articles
Hello,
faz.net has changed its format and now splits articles across multiple pages. I found this post with a possible solution: https://www.mobileread.com/forums/sho...1&postcount=17
Unfortunately, I am not familiar with coding. Can anybody help? Here is the faz.net recipe:

__license__   = 'GPL v3'
__copyright__ = '2008-2011, Kovid Goyal <kovid at kovidgoyal.net>, Darko Miletic <darko at gmail.com>'
'''
Profile to download FAZ.NET
'''

from calibre.web.feeds.news import BasicNewsRecipe

class FazNet(BasicNewsRecipe):
    title                 = 'FAZ.NET'
    __author__            = 'Kovid Goyal, Darko Miletic'
    description           = 'Frankfurter Allgemeine Zeitung'
    publisher             = 'Frankfurter Allgemeine Zeitung GmbH'
    category              = 'news, politics, Germany'
    use_embedded_content  = False
    language              = 'de'
    max_articles_per_feed = 30
    no_stylesheets        = True
    encoding              = 'utf-8'
    remove_javascript     = True

    keep_only_tags = [{'class':'FAZArtikelEinleitung'},
                      {'id':'ArtikelTabContent_0'}]
    remove_tags_after = dict(name='div', attrs={'class':['ArtikelFooter']})  # AGe add 2013-12-19
    remove_tags = [dict(name='div', attrs={'class':['ArtikelFooter']})]  # AGe add 2013-12-19

    feeds = [
        ('FAZ.NET Aktuell', 'http://www.faz.net/aktuell/?rssview=1'),
        ('Politik', 'http://www.faz.net/aktuell/politik/?rssview=1'),
        ('Wirtschaft', 'http://www.faz.net/aktuell/wirtschaft/?rssview=1'),
        ('Feuilleton', 'http://www.faz.net/aktuell/feuilleton/?rssview=1'),
        ('Sport', 'http://www.faz.net/aktuell/sport/?rssview=1'),
        ('Lebensstil', 'http://www.faz.net/aktuell/lebensstil/?rssview=1'),  # AGe add 2013-12-19
        ('Gesellschaft', 'http://www.faz.net/aktuell/gesellschaft/?rssview=1'),
        ('Finanzen', 'http://www.faz.net/aktuell/finanzen/?rssview=1'),
        ('Technik & Motor', 'http://www.faz.net/aktuell/technik-motor/?rssview=1'),
        ('Wissen', 'http://www.faz.net/aktuell/wissen/?rssview=1'),
        ('Reise', 'http://www.faz.net/aktuell/reise/?rssview=1'),
        ('Beruf & Chance', 'http://www.faz.net/aktuell/beruf-chance/?rssview=1'),
        ('Rhein-Main', 'http://www.faz.net/aktuell/rhein-main/?rssview=1')
    ]

Thanks!
Markus
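For reference, calibre's BasicNewsRecipe can follow links from an article page to further pages through its `recursions` and `match_regexps` attributes, where the regular expressions decide which links are worth following. The Python sketch below shows only that URL-matching step, outside calibre. The `-p2.html`-style suffix for continuation pages is an assumption about faz.net URLs, not something stated in this thread, and `is_wanted` is a hypothetical helper name.

```python
import re

# Hypothetical pattern (assumption, not confirmed for faz.net):
# continuation pages end in "-p2.html", "-p3.html", ..., "-p12.html",
# while the first page carries no "-pN" suffix.
MATCH_REGEXPS = [r'-p(?:[2-9]|[1-9]\d+)\.html$']
_COMPILED = [re.compile(pattern) for pattern in MATCH_REGEXPS]


def is_wanted(url):
    """Return True if any pattern matches somewhere in url, i.e. the link
    would be followed under a match_regexps-style filter."""
    return any(rx.search(url) for rx in _COMPILED)
```

In a recipe, the pattern strings themselves would be listed in `match_regexps`, with `recursions` set so the downloader actually follows the matching links.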
01-09-2014, 08:01 AM | #2
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
01-09-2014, 08:41 AM | #3
Junior Member
Posts: 5
Karma: 10
Join Date: Jan 2014
Device: kindle
Thanks a lot, Kovid!
A donation is on its way, too ;-)
01-09-2014, 03:53 PM | #4
Wizard
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
Wow, Kovid, this is an impressive two-liner. Is this new?
I just ran a test with the recipe. For a multi-page article, the jump from page one to page two works perfectly via the internal page link. But on page two and the following pages, the links point to the original web pages instead of the internal pages. Picture 1 shows the links on page one, picture 2 shows the links at the top of page two, and picture 3 shows the links at the bottom of page two. The same issue occurs on the following pages. Is there a way to fix this?
Edit: I used this article for the test: "Die NSA und der Quantencomputer" ("The NSA and the Quantum Computer").
Last edited by Divingduck; 01-09-2014 at 03:58 PM.
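When checking a test download for this symptom, it can help to list which links in a downloaded page are still absolute web addresses rather than references to files inside the e-book. A minimal sketch using only Python's standard library; `LinkCollector` and `external_links` are hypothetical helper names, and the sample HTML below is made up.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:  # skip <a> tags without a usable href
                self.hrefs.append(href)


def external_links(html):
    """Return links that still point at the live website (absolute
    http/https URLs) instead of local files in the download."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [h for h in parser.hrefs if h.startswith(('http://', 'https://'))]


sample = ('<p><a href="index_1.html">2</a> '
          '<a href="http://www.faz.net/aktuell/beispiel-p3.html">3</a></p>')
print(external_links(sample))
```

Running it on the HTML of page two of a downloaded article would show exactly which navigation links were left pointing at faz.net.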
01-09-2014, 09:56 PM | #5
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
No, it has always been there. I didn't bother to remove the next-page sections; you should do that via postprocess_html, something like this:
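def postprocess_html(self, soup, first_fetch):
    # 'next_page_whatever' is a placeholder: use the id of the element
    # that holds the page navigation on faz.net
    for div in soup.findAll(id='next_page_whatever'):
        div.extract()  # remove the navigation block from the page
    return soup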
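calibre passes the downloaded page to `postprocess_html` as a BeautifulSoup object, so the snippet above belongs inside the recipe class. To try the same find-and-remove step without calibre, here is a standard-library sketch using `xml.etree.ElementTree`. Unlike BeautifulSoup, it only parses well-formed XHTML; `remove_by_id` is a hypothetical helper, and 'next_page_whatever' is still a stand-in for the real id of the navigation block.

```python
import xml.etree.ElementTree as ET


def remove_by_id(root, element_id):
    """Detach every descendant of root whose id equals element_id,
    keeping any text that followed it (its ElementTree "tail")."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get('id') != element_id:
                continue
            siblings = list(parent)
            pos = siblings.index(child)
            tail = child.tail or ''
            if pos > 0:
                # Move the trailing text onto the previous sibling.
                prev = siblings[pos - 1]
                prev.tail = (prev.tail or '') + tail
            else:
                # No previous sibling: the text joins the parent's text.
                parent.text = (parent.text or '') + tail
            parent.remove(child)
    return root


page = ET.fromstring(
    '<div><p>Text</p><div id="next_page_whatever"><a href="p2.html">2</a></div>'
    'tail<p>More</p></div>')
remove_by_id(page, 'next_page_whatever')
print(ET.tostring(page, encoding='unicode'))  # prints <div><p>Text</p>tail<p>More</p></div>
```

Keeping the tail text mirrors BeautifulSoup's `extract()`, which removes only the element itself and leaves the text after it in place.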
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
FAZ-Net Update | Divingduck | Recipes | 14 | 05-29-2022 11:26 AM |
creating a recipe for faz.net's e-paper | MayJune | Recipes | 8 | 04-15-2016 05:26 AM |
FAZ.NET recipe fails due to website redesign | juco | Recipes | 7 | 10-07-2011 11:53 AM |
FAZ.NET: Website redesign makes the calibre recipe worthless | juco | Software | 1 | 10-05-2011 02:42 AM |
recipe for FAZ.net - german | schuster | Recipes | 10 | 05-28-2011 12:13 AM |