View Single Post
Old 05-26-2019, 06:29 AM   #28
Leonatus
Wizard
Leonatus ought to be getting tired of karma fortunes by now.Leonatus ought to be getting tired of karma fortunes by now.Leonatus ought to be getting tired of karma fortunes by now.Leonatus ought to be getting tired of karma fortunes by now.Leonatus ought to be getting tired of karma fortunes by now.Leonatus ought to be getting tired of karma fortunes by now.Leonatus ought to be getting tired of karma fortunes by now.Leonatus ought to be getting tired of karma fortunes by now.Leonatus ought to be getting tired of karma fortunes by now.Leonatus ought to be getting tired of karma fortunes by now.Leonatus ought to be getting tired of karma fortunes by now.
 
Leonatus's Avatar
 
Posts: 1,060
Karma: 11391181
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
Quote:
Originally Posted by kovidgoyal View Post
The folowing recipe works for me with quotes preserved:
Spoiler:

Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1295262156(BasicNewsRecipe):
    title = u'kath.net'
    __author__ = 'Bobus'
    description = u'Katholische Nachrichten'
    oldest_article = 7
    language = 'de'
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'cp1252'

    feeds = [(u'kath.net', u'https://www.kath.net/2005/xml/index.xml')]

    def print_version(self, url):
        return url + "/print/yes"

    def get_browser(self, *a, **kwargs):
        kwargs['verify_ssl_certificates'] = False
        return BasicNewsRecipe.get_browser(self, *a, **kwargs)

    extra_css = 'td.textb {font-size: medium;}'

Are you sure? There are various authors writing in various styles, some of them using the "classical" keyboard quotes - which are, indeed, preserved. I tried with three computers, and it's always the same result, even taking in account your proposal: The more "typographical" quotes are transformed into the replacement character, might they be single or double.
I played a lot, but found no resoöution, whereas, downloading another newspaper with similar structure (i. e. using "classical" and "typographic" quotes, everything is allright, the encoding beeing ISO-8859-1. The recipe of it is:
Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2016, Kovid Goyal <kovid at kovidgoyal.net>

from __future__ import (unicode_literals, division, absolute_import,
                        print_function)
from calibre.web.feeds.recipes import BasicNewsRecipe


def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(attrs={'class': lambda x: x and frozenset(x.split()).intersection(q)})


class BerlinerZeitung(BasicNewsRecipe):
    title = 'Berliner Zeitung'
    __author__ = 'Kovid Goyal'
    language = 'de'
    description = 'Berliner Zeitung RSS'
    timefmt = ' [%d.%m.%Y]'
    ignore_duplicate_articles = {'title', 'url'}
    remove_empty_feeds = True

    # oldest_article = 7.0
    no_stylesheets = True
    remove_javascript = True
    use_embedded_content = False
    publication_type = 'newspaper'

    keep_only_tags = [
        classes('dm_article_body dm_article_header'),
    ]
    remove_tags = [
        classes('dm_article_share'),
    ]

    feeds = [x.split() for x in [
        'Berlin http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23699382-asYahooFeed.xml',
        'Brandenburg http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23699570-asYahooFeed.xml',
        'Politik http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23699614-asYahooFeed.xml',
        'Wirtschaft http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23699644-asYahooFeed.xml',
        'Sport http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23699874-asYahooFeed.xml',
        'Kultur http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23700020-asYahooFeed.xml',
        'Panorama http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23700178-asYahooFeed.xml',
        'Wissen http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23700222-asYahooFeed.xml',
        'Digital http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23700594-asYahooFeed.xml',
        'Ratgeber http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23700190-asYahooFeed.xml',
    ]]
Leonatus is offline   Reply With Quote