05-16-2019, 02:12 AM | #16 |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
|
05-16-2019, 07:06 AM | #17 |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Code:
encoding = 'windows-1252' |
|
05-16-2019, 12:00 PM | #18 |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Sorry, but the problem itself intrigues me.
Could it be that there is something wrong with the preprocess_regexps? I tried to simply replace A with B, using:
Code:
preprocess_regexps = [ (re.compile(r'A', re.DOTALL), lambda match: 'B'), ]
|
05-16-2019, 09:30 PM | #19 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Make sure you have the right indentation for preprocess_regexps; it should be at the same level as title, for example.
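A minimal sketch of what "same level as title" means, runnable without calibre. SketchRecipe and apply_regexps are hypothetical names used only for illustration; a real recipe would subclass BasicNewsRecipe, and the loop is only an assumption about how the (pattern, function) pairs are applied, namely one after another to the downloaded HTML.

```python
import re


# Hypothetical stand-in for a recipe class; a real recipe would subclass
# calibre's BasicNewsRecipe instead of object.
class SketchRecipe(object):
    title = u'kath.net'
    encoding = 'cp1252'
    # Class attribute: indented exactly like title, not nested in a method.
    preprocess_regexps = [
        (re.compile(r'A', re.DOTALL), lambda match: 'B'),
    ]


def apply_regexps(recipe_cls, html):
    # Assumed behaviour: each (pattern, function) pair is applied in order.
    for pattern, func in recipe_cls.preprocess_regexps:
        html = pattern.sub(func, html)
    return html
```

If the list were indented under a method, or placed outside the class, it would not be a class attribute and the substitutions would silently never run.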
|
05-17-2019, 02:12 AM | #20 |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
It looks like this:
Code:
class AdvancedUserRecipe1295262156(BasicNewsRecipe):
    title = u'kath.net'
    __author__ = 'Bobus'
    description = u'Katholische Nachrichten'
    oldest_article = 7
    language = 'de'
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'iso-8859-1'
    preprocess_regexps = [ (re.compile(r'A', re.DOTALL), lambda match: 'B'),
Edit: The missing square bracket is, in fact, there (fault at copy/paste).
Last edited by Leonatus; 05-17-2019 at 09:42 AM.
|
|
05-17-2019, 10:30 AM | #21 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Post the actual recipe file as a zipped-up attachment; there is too much chance of things changing with copy and paste.
|
05-17-2019, 12:17 PM | #22 |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Here we are!
|
05-18-2019, 05:43 AM | #23 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The following recipe works for me with quotes preserved:
Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1295262156(BasicNewsRecipe):
    title = u'kath.net'
    __author__ = 'Bobus'
    description = u'Katholische Nachrichten'
    oldest_article = 7
    language = 'de'
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'cp1252'

    feeds = [(u'kath.net', u'https://www.kath.net/2005/xml/index.xml')]

    def print_version(self, url):
        return url + "/print/yes"

    def get_browser(self, *a, **kwargs):
        kwargs['verify_ssl_certificates'] = False
        return BasicNewsRecipe.get_browser(self, *a, **kwargs)

    extra_css = 'td.textb {font-size: medium;}'
|
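Why the choice between 'cp1252' and 'iso-8859-1' matters for typographic quotes can be shown with a few bytes. This is only a sketch: the sample bytes are made up for illustration and are not taken from the kath.net feed.

```python
# Made-up sample: typographic quotes as a Windows-1252 page would send them.
raw = b'\x84Zitat\x93 und \x93quote\x94'

# cp1252 maps 0x84, 0x93 and 0x94 to real quotation marks.
as_cp1252 = raw.decode('cp1252')

# iso-8859-1 maps the same bytes to invisible C1 control characters.
as_latin1 = raw.decode('iso-8859-1')

# Decoding as UTF-8 with replacement yields U+FFFD for each invalid byte.
as_utf8 = raw.decode('utf-8', 'replace')

print(repr(as_cp1252))
print(repr(as_latin1))
print(repr(as_utf8))
```

Plain keyboard quotes are ASCII and decode identically under all three encodings, which would be consistent with only the typographic quotes being affected. Web browsers treat a page labelled iso-8859-1 as windows-1252, so a page can look fine in a browser and still break under a strict iso-8859-1 decode.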
05-18-2019, 07:45 AM | #24 |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Nope, no success here. Same appearance as always.
|
05-20-2019, 05:30 AM | #25 |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Perhaps it is important to know that I receive the news via Joel Goguen's "KoboTouch-extended" plugin in the Kobo EPUB format (kepub), as usual with basically all my books.
|
05-20-2019, 10:26 AM | #26 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Look at the downloaded EPUB file using the calibre viewer first.
|
05-20-2019, 10:39 AM | #27 |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Yes, that's what I do, and that's why my last post is superfluous: I already said above that the appearance in calibre's e-book viewer is the same as on my e-reader. I apologize!
|
05-26-2019, 06:29 AM | #28 | |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Quote:
Are you sure? There are various authors writing in various styles, some of them using the "classical" keyboard quotes, which are indeed preserved. I tried on three computers, and the result is always the same, even taking your proposal into account: the more "typographical" quotes are transformed into the replacement character, whether single or double. I experimented a lot but found no solution, whereas when downloading another newspaper with a similar structure (i.e. using both "classical" and "typographic" quotes), everything is all right, the encoding being ISO-8859-1. Its recipe is:
Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2016, Kovid Goyal <kovid at kovidgoyal.net>

from __future__ import (unicode_literals, division, absolute_import, print_function)
from calibre.web.feeds.recipes import BasicNewsRecipe


def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(attrs={'class': lambda x: x and frozenset(x.split()).intersection(q)})


class BerlinerZeitung(BasicNewsRecipe):
    title = 'Berliner Zeitung'
    __author__ = 'Kovid Goyal'
    language = 'de'
    description = 'Berliner Zeitung RSS'
    timefmt = ' [%d.%m.%Y]'
    ignore_duplicate_articles = {'title', 'url'}
    remove_empty_feeds = True

    # oldest_article = 7.0
    no_stylesheets = True
    remove_javascript = True
    use_embedded_content = False
    publication_type = 'newspaper'

    keep_only_tags = [
        classes('dm_article_body dm_article_header'),
    ]
    remove_tags = [
        classes('dm_article_share'),
    ]

    feeds = [x.split() for x in [
        'Berlin http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23699382-asYahooFeed.xml',
        'Brandenburg http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23699570-asYahooFeed.xml',
        'Politik http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23699614-asYahooFeed.xml',
        'Wirtschaft http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23699644-asYahooFeed.xml',
        'Sport http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23699874-asYahooFeed.xml',
        'Kultur http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23700020-asYahooFeed.xml',
        'Panorama http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23700178-asYahooFeed.xml',
        'Wissen http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23700222-asYahooFeed.xml',
        'Digital http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23700594-asYahooFeed.xml',
        'Ratgeber http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23700190-asYahooFeed.xml',
    ]]
|
|
06-06-2019, 04:23 AM | #29 |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Eventually, I came to the conclusion that the problem cannot be resolved, as the characters in question are already replaced on the RSS page of the journal. Strangely enough, empty squares appear there instead of the quotes, for example. Why this happens, and why there are no problems when the RSS feed is subscribed to on a computer, will remain a secret of the deepest depths of the internet, at least for my simple mind.
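One way to check whether a source already contains replacement characters is to look at the raw bytes before any decoding takes place. The following is a rough sketch only: count_replacement_chars is a hypothetical helper, it assumes the replacement character arrives either as UTF-8 bytes or as a numeric character reference, and fetching the feed is left out.

```python
# Byte patterns under which U+FFFD (the replacement character) might arrive.
REPLACEMENT_MARKERS = (
    b'\xef\xbf\xbd',  # U+FFFD encoded as UTF-8
    b'&#65533;',      # decimal character reference
    b'&#xfffd;',      # hexadecimal character reference (input is lower-cased)
)


def count_replacement_chars(raw_bytes):
    # Lower-casing only affects ASCII letters, so '&#xFFFD;' also matches.
    lowered = raw_bytes.lower()
    return sum(lowered.count(marker) for marker in REPLACEMENT_MARKERS)


# Made-up sample data standing in for the undecoded feed contents.
damaged = u'\ufffdZitat\ufffd'.encode('utf-8')
intact = u'\u201eZitat\u201c'.encode('utf-8')
print(count_replacement_chars(damaged), count_replacement_chars(intact))
```

A non-zero count on the undecoded download would suggest the damage happened on the server side, in which case no encoding setting in the recipe could restore the original quotes.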
|
|