Old 03-10-2023, 05:21 AM   #1
Sushi5675
Device: kindle paperwhite
My SZ Recipe does not fetch all articles

Hi there,

I am very frustrated right now and really hope someone can help me out here.

I am trying to fetch news from www.sueddeutsche.de, which works fine for some articles but not for others. In the failing articles, the paragraph text is somehow hidden in the HTML code and doesn't get extracted.

I have a subscription, so the articles should be visible even when they are behind the paywall. But the fetch fails only on some articles, regardless of whether they are behind the paywall or not.

For example, this article does not work:

view-source:https://www.sueddeutsche.de/politik/...215?print=true

I use the print=true parameter because the page is much cleaner that way.

I am really looking forward to any idea or code example.
If we can figure this out here, I'd be happy to share the recipe via calibre, because the latest recipes I found are quite old.

Thank you!!

Code:
# -*- coding: utf-8 -*-
__license__ = 'GPL v3'

#import
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from calibre import strftime

##SZ
class Sueddeutsche(BasicNewsRecipe):
    title = u'SZ8'
    description = 'News from Germany'
    publisher = u'Süddeutsche Zeitung'
    category = 'news, politics'
    timefmt = ' [%a, %d %b %Y]'
    oldest_article = 1
    max_articles_per_feed = 10
    language = 'de'
    encoding = 'utf-8'
    publication_type = 'newspaper'
    remove_empty_feeds = True
    needs_subscription = True
    use_embedded_content = False
    no_stylesheets = True
    remove_javascript = False
    auto_cleanup = True
    #simultaneous_downloads = 1
    #articles_are_obfuscated = True

    
    #add login
    def get_browser(self):
        browser = BasicNewsRecipe.get_browser(self)
        # Login
        url = 'https://id.sueddeutsche.de/login'
        browser.open(url)
        browser.select_form(nr=0)  # first form
        browser['login'] = self.username
        browser['password'] = self.password
        browser.submit()
        return browser

    feeds = [
        (u'Politik', u'http://rss.sueddeutsche.de/rss/Politik'),
    ]
    
    
    
    def print_version(self, url):
        return url + '?print=true'
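
A side note on print_version: appending '?print=true' directly would probably produce a malformed URL if a feed link ever arrives with its own query string. This is likely unrelated to the missing paragraphs, but a more defensive variant could look like the sketch below. It only assumes the print=true parameter mentioned above; add_print_param is a hypothetical helper name, not part of calibre.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_print_param(url):
    """Return url with print=true set, keeping any existing query string.

    Hypothetical helper for illustration; not part of the calibre API.
    """
    parts = urlsplit(url)
    # Keep existing parameters, but drop any earlier 'print' value
    # so the parameter is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != 'print'
    ]
    query.append(('print', 'true'))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Inside the recipe, print_version could then simply return add_print_param(url).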