MobileRead Forums > E-Book Software > Calibre > Recipes
05-16-2019, 02:12 AM   #16
Leonatus
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
Quote:
Originally Posted by lui1
So any webpage which claims to be encoded with ISO-8859-1 should be treated as being encoded with Windows-1252.

Code:
encoding = 'windows-1252'
That's interesting! Does that also apply to downloading news in Calibre and transferring it to an e-book reader?
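A quick way to see what lui1 means, in plain Python (the byte values are an assumed example, not taken from kath.net): 0x84 and 0x93 are the German quotes „ and “ in Windows-1252, but in strict ISO-8859-1 the same bytes are invisible C1 control characters. Browsers follow the WHATWG Encoding Standard, which likewise maps the label "iso-8859-1" to windows-1252.

```python
# Assumed sample: the German quotes around "Hallo" as Windows-1252 bytes.
raw = b"\x84Hallo\x93"

as_cp1252 = raw.decode("windows-1252")  # 0x84 -> U+201E, 0x93 -> U+201C (typographic quotes)
as_latin1 = raw.decode("iso-8859-1")    # 0x84 -> U+0084, 0x93 -> U+0093 (C1 control characters)

# ascii() escapes non-ASCII characters, so this prints the same on any console.
print(ascii(as_cp1252))  # '\u201eHallo\u201c'
print(ascii(as_latin1))  # '\x84Hallo\x93'
```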
05-16-2019, 07:06 AM   #17
Leonatus
Wizard
Code:
encoding = 'windows-1252'
Alas, that didn't work either. So is it impossible to get rid of this replacement character?
05-16-2019, 12:00 PM   #18
Leonatus
Wizard
Sorry, but the problem itself intrigues me.
Could it be that something is wrong with the preprocess_regexps?
I tried simply replacing A with B, using:
Code:
preprocess_regexps = [
    (re.compile(r'A', re.DOTALL),
     lambda match: 'B'),
]
and it didn't work.
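For reference, each entry in preprocess_regexps is a (compiled pattern, function) pair that is meant to be run over the raw HTML of every downloaded page. The standalone sketch below imitates that outside calibre; apply_preprocess_regexps is a hypothetical helper written for illustration, not calibre's own code. If a rule behaves as expected here but not in a recipe, the recipe's structure (for example its indentation) is a more likely culprit than the regex itself.

```python
import re

# Same shape as a recipe's preprocess_regexps: (compiled pattern, function) pairs.
preprocess_regexps = [
    (re.compile(r'A', re.DOTALL), lambda match: 'B'),
]


def apply_preprocess_regexps(html, rules):
    """Hypothetical helper: apply each (pattern, function) pair in order,
    roughly what the recipe attribute is meant to do to the raw HTML."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


print(apply_preprocess_regexps('<p>A cat named ABBA</p>', preprocess_regexps))
# <p>B cat named BBBB</p>
```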
05-16-2019, 09:30 PM   #19
kovidgoyal
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Make sure you have the right indentation for preprocess_regexps: it should be at the same level as title, for example.
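Why the indentation matters, shown without calibre: a name only becomes a class attribute, which is where the recipe's base class will look for it, if it sits at class-body level next to title. Base below is a hypothetical stand-in for BasicNewsRecipe and is assumed, like the real class, to default preprocess_regexps to an empty list.

```python
import re


class Base:  # hypothetical stand-in for calibre's BasicNewsRecipe
    preprocess_regexps = []


class Good(Base):
    title = 'kath.net'
    # Same indentation as title: this is a class attribute and overrides the default.
    preprocess_regexps = [
        (re.compile(r'A', re.DOTALL), lambda match: 'B'),
    ]


class Bad(Base):
    title = 'kath.net'

    def print_version(self, url):
        # Indented one level too deep: this is only a local variable of the
        # method, so the class still sees the inherited empty default.
        preprocess_regexps = [
            (re.compile(r'A', re.DOTALL), lambda match: 'B'),
        ]
        return url


print(len(Good.preprocess_regexps))  # 1
print(len(Bad.preprocess_regexps))   # 0
```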
05-17-2019, 02:12 AM   #20
Leonatus
Wizard
It looks like this:
Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1295262156(BasicNewsRecipe):
    title = u'kath.net'
    __author__ = 'Bobus'
    description = u'Katholische Nachrichten'
    oldest_article = 7
    language = 'de'
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'iso-8859-1'
    # Same indentation level as title, so it is a class attribute
    preprocess_regexps = [
        (re.compile(r'A', re.DOTALL),
         lambda match: 'B'),
    ]
Edit: No, the last two lines were shifted two spaces to the right when I pasted them into the CODE block.
Edit 2: The closing square bracket that seems to be missing is in fact there (it was lost in copy/paste).

Last edited by Leonatus; 05-17-2019 at 09:42 AM.
05-17-2019, 10:30 AM   #21
kovidgoyal
creator of calibre
Post the actual recipe file as a zipped attachment; there is too much chance of things changing with copy and paste.
05-17-2019, 12:17 PM   #22
Leonatus
Wizard
Here we are!
Attached Files
File Type: zip Recipe.zip (639 Bytes, 134 views)
05-18-2019, 05:43 AM   #23
kovidgoyal
creator of calibre
The following recipe works for me, with the quotes preserved:

Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1295262156(BasicNewsRecipe):
    title = u'kath.net'
    __author__ = 'Bobus'
    description = u'Katholische Nachrichten'
    oldest_article = 7
    language = 'de'
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'cp1252'

    feeds = [(u'kath.net', u'https://www.kath.net/2005/xml/index.xml')]

    def print_version(self, url):
        return url + "/print/yes"

    def get_browser(self, *a, **kwargs):
        kwargs['verify_ssl_certificates'] = False
        return BasicNewsRecipe.get_browser(self, *a, **kwargs)

    extra_css = 'td.textb {font-size: medium;}'
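Worth noting: at least in Python's codec registry (calibre recipes are Python), 'cp1252' and 'windows-1252' are two labels for the same codec, so this recipe effectively repeats the 'windows-1252' setting already tried above, while 'iso-8859-1' is a different codec. A quick check:

```python
import codecs

# codecs.lookup() normalizes the label and resolves aliases to one codec.
cp1252 = codecs.lookup("cp1252")
windows_1252 = codecs.lookup("windows-1252")
latin1 = codecs.lookup("iso-8859-1")

print(cp1252.name == windows_1252.name)  # True: one codec, two labels
print(cp1252.name == latin1.name)        # False: ISO-8859-1 is a separate codec
```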
05-18-2019, 07:45 AM   #24
Leonatus
Wizard
Nope, no success here. Same appearance as always.
05-20-2019, 05:30 AM   #25
Leonatus
Wizard
Perhaps it is important to know that I receive the news via Joel Goguen's "KoboTouch-extended" plugin in the Kobo EPUB format (kepub), as I do with basically all my books.
05-20-2019, 10:26 AM   #26
kovidgoyal
creator of calibre
Look at the downloaded EPUB file in the calibre viewer first.
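Besides reading it in the viewer, one can count the replacement characters inside the downloaded EPUB directly, which rules out the device and the kepub conversion. An EPUB is a ZIP archive, so the standard library is enough. count_replacement_chars is a hypothetical helper; it assumes the content files are UTF-8 (X)HTML and only counts the literal UTF-8 bytes of U+FFFD, not character references such as &#65533;. The demo builds a tiny in-memory archive so the block runs on its own; for a real file you would pass its path instead (the file name in the comment is made up).

```python
import io
import zipfile

UFFFD_UTF8 = "\ufffd".encode("utf-8")  # b'\xef\xbf\xbd'


def count_replacement_chars(epub):
    """Hypothetical helper: count UTF-8-encoded U+FFFD characters in each
    (X)HTML file of an EPUB. `epub` may be a path or a binary file object."""
    counts = {}
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                n = archive.read(name).count(UFFFD_UTF8)
                if n:
                    counts[name] = n
    return counts


# Self-contained demo with a tiny in-memory "EPUB"; for a real file use e.g.
# count_replacement_chars("kath.net.epub") (hypothetical file name).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("mimetype", "application/epub+zip")
    z.writestr("OEBPS/article1.xhtml", "<p>\ufffdHallo\ufffd</p>".encode("utf-8"))
    z.writestr("OEBPS/article2.xhtml", "<p>no problems here</p>")
buf.seek(0)
demo_counts = count_replacement_chars(buf)
print(demo_counts)  # {'OEBPS/article1.xhtml': 2}
```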
05-20-2019, 10:39 AM   #27
Leonatus
Wizard
Yes, that's what I do, which makes my last post superfluous: I had already mentioned above that the text looks the same in Calibre's e-book viewer as on my e-reader. I apologize!
05-26-2019, 06:29 AM   #28
Leonatus
Wizard
Quote:
Originally Posted by kovidgoyal
The following recipe works for me, with the quotes preserved:

Are you sure? Various authors write in various styles, and some of them use the "classical" keyboard quotes, which are indeed preserved. I tried on three computers and always get the same result, even with your suggestion: the more "typographical" quotes, whether single or double, are turned into the replacement character.
I experimented a lot but found no solution. By contrast, when I download another newspaper with a similar structure (i.e. one using both "classical" and "typographic" quotes), everything is all right, with the encoding being ISO-8859-1. Its recipe is:
Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2016, Kovid Goyal <kovid at kovidgoyal.net>

from __future__ import (unicode_literals, division, absolute_import,
                        print_function)
from calibre.web.feeds.recipes import BasicNewsRecipe


def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(attrs={'class': lambda x: x and frozenset(x.split()).intersection(q)})


class BerlinerZeitung(BasicNewsRecipe):
    title = 'Berliner Zeitung'
    __author__ = 'Kovid Goyal'
    language = 'de'
    description = 'Berliner Zeitung RSS'
    timefmt = ' [%d.%m.%Y]'
    ignore_duplicate_articles = {'title', 'url'}
    remove_empty_feeds = True

    # oldest_article = 7.0
    no_stylesheets = True
    remove_javascript = True
    use_embedded_content = False
    publication_type = 'newspaper'

    keep_only_tags = [
        classes('dm_article_body dm_article_header'),
    ]
    remove_tags = [
        classes('dm_article_share'),
    ]

    feeds = [x.split() for x in [
        'Berlin http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23699382-asYahooFeed.xml',
        'Brandenburg http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23699570-asYahooFeed.xml',
        'Politik http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23699614-asYahooFeed.xml',
        'Wirtschaft http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23699644-asYahooFeed.xml',
        'Sport http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23699874-asYahooFeed.xml',
        'Kultur http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23700020-asYahooFeed.xml',
        'Panorama http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23700178-asYahooFeed.xml',
        'Wissen http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23700222-asYahooFeed.xml',
        'Digital http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23700594-asYahooFeed.xml',
        'Ratgeber http://www.berliner-zeitung.de/blueprint/servlet/xml/berliner-zeitung/23700190-asYahooFeed.xml',
    ]]
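For anyone adapting this recipe: its two helpers are plain Python and can be tried outside calibre. classes() returns a matcher whose 'class' predicate is truthy when a tag shares at least one class with the given space-separated list (the recipe passes these matchers to keep_only_tags and remove_tags), and the feeds comprehension splits each "Title URL" string into a [title, url] pair. The sample inputs and the example.com URL below are made up.

```python
def classes(classes):
    # Copied from the recipe above: match tags sharing at least one CSS class.
    q = frozenset(classes.split(' '))
    return dict(attrs={'class': lambda x: x and frozenset(x.split()).intersection(q)})


matcher = classes('dm_article_body dm_article_header')['attrs']['class']
print(bool(matcher('dm_article_body highlighted')))  # True: shares one class
print(bool(matcher('sidebar')))                      # False: no shared class
print(bool(matcher(None)))                           # False: tag has no class

# The feeds list: each "Title URL" string becomes [title, url].
feeds = [x.split() for x in ['Berlin http://example.com/berlin.xml']]
print(feeds)  # [['Berlin', 'http://example.com/berlin.xml']]
```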
06-06-2019, 04:23 AM   #29
Leonatus
Wizard
Eventually, I came to the conclusion that the problem cannot be solved, because the characters in question are already replaced in the journal's RSS feed. Strangely enough, empty boxes appear there instead of the quotes, for example. Why this happens, and why there are no problems when the feed is subscribed to on a computer, will remain a secret of the deepest depths of the internet, at least for my simple mind.
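That conclusion can be checked on the raw bytes of the feed, before calibre or a feed reader decodes anything. If the bytes are valid UTF-8 and already contain U+FFFD (EF BF BD), the quotes were lost on the publisher's side and no recipe setting can restore them; if the bytes are not valid UTF-8 but contain 0x84 or 0x91-0x94, they are most likely Windows-1252 quotes that a matching encoding would decode. diagnose_quotes is a hypothetical helper, the sample bytes are made up, and fetching the real feed is left out. The empty boxes could also be characters the reader's font lacks, a case these byte checks do not cover.

```python
def diagnose_quotes(raw: bytes) -> str:
    """Hypothetical helper: rough guess at what happened to typographic
    quotes, based only on the raw (undecoded) bytes of a feed or page."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: look for the Windows-1252 quote bytes
        # 0x84 (low double), 0x91/0x92 (single), 0x93/0x94 (double).
        if any(b in raw for b in (0x84, 0x91, 0x92, 0x93, 0x94)):
            return "windows-1252 quotes"
        return "unknown single-byte encoding"
    if "\ufffd" in text:
        # EF BF BD in the bytes: the quotes were already replaced upstream.
        return "replacement characters in source"
    return "clean utf-8"


samples = {
    "utf-8 quotes": b"<title>\xe2\x80\x9eHallo\xe2\x80\x9c</title>",
    "already damaged": b"<title>\xef\xbf\xbdHallo\xef\xbf\xbd</title>",
    "cp1252 quotes": b"<title>\x84Hallo\x93</title>",
}
for label, raw in samples.items():
    print(label, "->", diagnose_quotes(raw))
```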
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Should I go for a replacement? | n33raj18 | Amazon Kindle | 14 | 08-28-2014 07:18 AM
Replacement Character Frustration | amo48 | Sigil | 4 | 05-18-2012 12:43 PM
Touch Replacement Plan | PeterT | Kobo Reader | 3 | 06-18-2011 08:09 PM
regex for character replacement, em-dash questions | cybmole | Calibre | 3 | 10-18-2010 03:09 PM
PRS-600 So, should I ask for a replacement? | ziegl027 | Sony Reader | 8 | 01-25-2010 10:40 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.