Some books created with News Fetch don’t have text on Kobo Sage

PaulB223 · 01-22-2026, 02:04 PM

Hi everyone,

I’m having a problem with some ebooks created with News Fetch when I try to read them on my Kobo Sage. The biggest problem is that the article is there, but there is no text for it (though it appears fine if I use Calibre’s viewer; all the text is there). For example, in a Guardian recipe that I download, everything is fine when I open the KEPUB in the Calibre viewer, but after sending it to my Sage, practically nothing is there (see images attached). Can anyone help with this? (Unfortunately this website is telling me that KEPUB is an invalid file so I can't attach the file here.)

Thank you,
Paul

JSWolf · 01-22-2026, 02:26 PM

Quote:

Originally Posted by PaulB223

Hi everyone,

I’m having a problem with some ebooks created with News Fetch when I try to read them on my Kobo Sage. The biggest problem is that the article is there, but there is no text for it (though it appears fine if I use Calibre’s viewer; all the text is there). For example, in a Guardian recipe that I download, everything is fine when I open the KEPUB in the Calibre viewer, but after sending it to my Sage, practically nothing is there (see images attached). Can anyone help with this? (Unfortunately this website is telling me that KEPUB is an invalid file so I can't attach the file here.)

Thank you,
Paul

Are you reading this as ePub or KePub? If it's ePub, try KePub. If it's KePub, try ePub.

DNSB · 01-22-2026, 04:19 PM

Quote:

Originally Posted by PaulB223

(Unfortunately this website is telling me that KEPUB is an invalid file so I can't attach the file here.)

Thank you,
Paul

Rename it to filename.kepub.epub, which is how it will end up on a Kobo ereader.

PaulB223 · 01-23-2026, 09:02 AM

Quote:

Originally Posted by JSWolf

Are you reading this as ePub or KePub? If it's ePub, try KePub. If it's KePub, try ePub.

I've now tried with ePub too, and I get the same thing.

PaulB223 · 01-23-2026, 09:06 AM

Quote:

Rename it to filename.kepub.epub, which is how it will end up on a Kobo ereader.

Ok thanks, I was able to upload it now

JSWolf · 01-23-2026, 09:27 AM

Quote:

Originally Posted by PaulB223

Ok thanks, I was able to upload it now

This is a mess. It's chock full of errors. Plus it has many links to content outside of the eBook. I'd say just forget it.

kovidgoyal · 01-23-2026, 11:48 AM

That file looks like it was created using a custom recipe. Use the builtin guardian recipe and you should be fine.

PaulB223 · 01-24-2026, 06:33 AM

Quote:

Originally Posted by kovidgoyal

That file looks like it was created using a custom recipe. Use the builtin guardian recipe and you should be fine.

Ok thanks. This recipe had been working for many years. It was just the environment, world, and business RSS feeds from the Guardian website (all I wanted to read) that I made just using the basic "New recipe" button in Calibre, then adding the feed URL. I've noticed that many of these simple feeds have stopped downloading well in recent months from other websites too. Is there any workaround for this? Or does it require actually writing out advanced recipe code now?

PeterT · 01-24-2026, 11:12 AM

Why not just modify the built-in recipe to only include the sections you want?

PaulB223 · 01-25-2026, 06:30 AM

Quote:

Originally Posted by PeterT

Why not just modify the built-in recipe to only include the sections you want?

Well haha, I'm not good enough to be able to do that. Any tips as to where I should insert those feeds? TIA

Code:

#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'

'''
www.guardian.co.uk
'''
from datetime import date

from calibre import random_user_agent
from calibre.web.feeds.news import BasicNewsRecipe


def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})


class Guardian(BasicNewsRecipe):

    title = u'The Guardian and The Observer'
    is_observer = False
    base_url = 'https://www.theguardian.com/uk'
    if date.today().weekday() == 6:
        is_observer = True
        base_url = 'https://www.theguardian.com/observer'

    __author__ = 'Kovid Goyal'
    language = 'en_GB'

    oldest_article = 7
    max_articles_per_feed = 100
    remove_javascript = True
    encoding = 'utf-8'
    remove_empty_feeds = True
    no_stylesheets = True
    remove_attributes = ['style', 'width', 'height']
    ignore_duplicate_articles = {'title', 'url'}

    timefmt = ' [%a, %d %b %Y]'

    remove_tags = [
        dict(attrs={'class': lambda x: x and '--twitter' in x}),
        dict(attrs={'class': lambda x: x and 'submeta' in x.split()}),
        dict(name='gu-island'),
        dict(attrs={'data-component': ['share', 'social', 'nav', 'nav2', 'topbar']}),
        dict(attrs={'data-link-name': 'block share'}),
        dict(attrs={'data-print-layout': 'hide'}),
        dict(attrs={'data-spacefinder-type': 'model.dotcomrendering.pageElements.NewsletterSignupBlockElement'}),
        dict(id=['dfp-ad--survey', 'sub-nav-root', 'the-caption', 'bannerandheader']),
        {'for': 'the-checkbox'},
        dict(href=['#maincontent', '#navigation']),
        dict(role=['navigation', 'button']),
        dict(attrs={'class': lambda x: x and 'inline-expand-image' in x}),
        dict(name='a', attrs={'aria-label': lambda x: x and 'Share On' in x}),
        dict(name='a', attrs={'class': lambda x: x and 'social__action js-social__action--top' in x}),
        dict(name='div', attrs={'id': 'share-count-root'}),
        dict(attrs={'class': lambda x: x and 'modern-visible' in x.split()}),
        classes('badge-slot reveal-caption__checkbox mobile-only element-rich-link'),
        dict(name=['link', 'meta', 'style', 'svg', 'input', 'source', 'noscript', 'button']),
        dict(name='img', src=lambda x: x and 'https://sb.scorecardresearch.com/' in x),
    ]
    remove_tags_after = [
        classes('content__article-body js-bottom-marker article-body-commercial-selector'),
    ]

    extra_css = '''
            img {
                max-width: 100% !important;
                max-height: 100% !important;
            }

            a span {
                color: #E05E02;
            }

            figcaption span {
                font-size: 0.5em;
                color: #6B6B6B;
            }
        '''

    def get_browser(self, *a, **kw):
        # This site returns images in JPEG-XR format if the user agent is IE
        if not hasattr(self, 'non_ie_ua'):
            try:
                self.non_ie_ua = random_user_agent(allow_ie=False)
            except TypeError:
                self.non_ie_ua = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.111 Safari/537.36'
        kw['user_agent'] = self.non_ie_ua
        br = BasicNewsRecipe.get_browser(self, *a, **kw)
        return br

    def parse_section(self, section_url):
        soup = self.index_to_soup(section_url)
        for section in soup.findAll('section'):
            articles = []
            title = self.tag_to_string(section.find('h2'))
            if not title:
                continue
            self.log('Found section:', title)
            for li in section.findAll('li'):
                a = li.find('a', attrs={'href': True, 'aria-label': True})
                if a:
                    url = a['href']
                    if url.startswith('/'):
                        url = self.base_url.rpartition('/')[0] + url
                    self.log('\t', a['aria-label'], url)
                    articles.append({'title': a['aria-label'], 'url': url})
            if articles:
                yield title, articles

    def parse_index(self):
        # return [('Test', [{'url':
        #     'https://www.theguardian.com/environment/2025/nov/07/if-theres-a-free-alternative-ill-eat-healthily-how-sweden-devised-brilliant-school-meals',
        #     'title': 'test'}])]
        feeds = list(self.parse_section(self.base_url))
        feeds += list(self.parse_section('https://www.theguardian.com/uk/sport'))
        return feeds

    def preprocess_html(self, soup):
        for table in soup.findAll('table'):
            if len(table.findAll('tr')) > 20:
                table.decompose()
        for dateline in soup.findAll(attrs={'data-gu-name': 'dateline'}):
            for s in dateline.findAll('summary'):
                s.extract()
            dateline.name = 'div'
        return soup


calibre_most_common_ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'

PeterT · 01-26-2026, 04:33 PM

tried just replacing the code in parse_index by

Code:

    def parse_index(self):
        feeds = list(self.parse_section('https://www.theguardian.com/uk/environment'))
        feeds += list(self.parse_section('https://www.theguardian.com/world'))
        feeds += list(self.parse_section('https://www.theguardian.com/uk/business'))

        return feeds

and it seems to work (but I didn't send it to a Kobo).

the calibre editor does however report quite a few errors.
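
If more sections get added later, the same idea can be written as a loop. This is only a sketch: `SECTION_URLS` and `collect_feeds` are made-up names that are not part of the built-in recipe, and it assumes `parse_section` keeps yielding `(title, articles)` pairs the way the recipe posted above does.

```python
# Hypothetical section list; these are the three URLs tried above.
SECTION_URLS = [
    'https://www.theguardian.com/uk/environment',
    'https://www.theguardian.com/world',
    'https://www.theguardian.com/uk/business',
]


def collect_feeds(parse_section, section_urls):
    """Call parse_section on each URL and merge the (title, articles) pairs."""
    feeds = []
    for url in section_urls:
        # parse_section is a generator in the built-in recipe, so extend()
        # consumes it the same way list(...) does in the posted code.
        feeds.extend(parse_section(url))
    return feeds


# Inside the recipe class, parse_index would then shrink to:
#
#     def parse_index(self):
#         return collect_feeds(self.parse_section, SECTION_URLS)
```

The helper sits at module level, next to the existing `classes()` function, so changing the sections only means editing the list.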

JSWolf · 01-27-2026, 01:12 PM

Quote:

Originally Posted by PeterT

tried just replacing the code in parse_index by

Code:

    def parse_index(self):
        feeds = list(self.parse_section('https://www.theguardian.com/uk/environment'))
        feeds += list(self.parse_section('https://www.theguardian.com/world'))
        feeds += list(self.parse_section('https://www.theguardian.com/uk/business'))

        return feeds

and it seems to work (but I didn't send it to a Kobo).

the calibre editor does however report quite a few errors.

I think the problem is the really poor code. Sure, it may work on your average browser, but on programs made to display ePub, probably not.

PaulB223 · 01-28-2026, 11:34 AM

Quote:

Originally Posted by PeterT

tried just replacing the code in parse_index by

Code:

    def parse_index(self):
        feeds = list(self.parse_section('https://www.theguardian.com/uk/environment'))
        feeds += list(self.parse_section('https://www.theguardian.com/world'))
        feeds += list(self.parse_section('https://www.theguardian.com/uk/business'))

        return feeds

and it seems to work (but I didn't send it to a Kobo).

the calibre editor does however report quite a few errors.

Thanks, this seems to work. Any idea how I can adapt this to other websites where I'm having the same problem? For example:
http://truth-out.org/feed?format=feed
http://www.wsws.org/rss/en.xml

PeterT · 01-28-2026, 12:40 PM

Unfortunately, calibre-supplied recipes vary in how they work. Some have customizations directly available within the Advanced options of a recipe; for others, modifications need to be made in the recipe's code.

In this case I looked at the calibre-supplied recipe and saw a method called parse_index that looked promising. I then looked at the Guardian site to see where its tabs for the various sections went, and just tried adding those URLs directly to parse_index.

If you're basing your recipes on RSS feeds there are several RSS finder extensions available for Chrome that help you identify RSS feeds associated with a web site which might give you a hand.

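
For the RSS-based case, a feed-only recipe for the two URLs mentioned above might look roughly like the sketch below. It uses the standard `BasicNewsRecipe` attributes `feeds`, `oldest_article`, `max_articles_per_feed` and `auto_cleanup`; the class name and title are placeholders, and the `try/except` fallback is an assumption added only so the file can be imported outside calibre. Whether these particular feeds then download cleanly depends on the sites themselves, so treat it as a starting point, not a fix.

```python
try:
    # Available when the recipe runs inside calibre.
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback for this sketch only, so the file can be imported
    # outside calibre for a quick sanity check.
    BasicNewsRecipe = object


class FeedSketch(BasicNewsRecipe):
    # Hypothetical class name and title; pick your own.
    title = 'Truthout and WSWS (sketch)'
    oldest_article = 7           # skip articles older than this many days
    max_articles_per_feed = 100
    auto_cleanup = True          # let calibre try to detect the article body

    # (section title, feed URL) pairs; the URLs are the two from this thread.
    feeds = [
        ('Truthout', 'http://truth-out.org/feed?format=feed'),
        ('WSWS', 'http://www.wsws.org/rss/en.xml'),
    ]
```

If a site's pages come through with missing text, the next step is usually to look at whether a built-in recipe exists for it and adapt that, as was done for the Guardian above.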

PaulB223 · 01-29-2026, 05:47 AM

Thanks yeah I have gotten all my RSS feeds from Feedly, and for years and years they were all working fine with the basic Calibre "New recipe" function where you just add the url of the RSS feed, without having to finagle the code inside. Recently though it has been getting more problematic unfortunately
