|
|
#1 |
|
Member
Posts: 11
Karma: 10
Join Date: Aug 2022
Device: Kobo Sage
|
Some books created with News Fetch don’t have text on Kobo Sage
Hi everyone,
I'm having a problem with some ebooks created with News Fetch when I try to read them on my Kobo Sage. The biggest problem is that the article is there, but there is no text for it (though it appears fine in Calibre's viewer; all the text is there).
For example, with a Guardian recipe that I download, everything is fine when I open the KEPUB in the Calibre viewer, but after sending it to my Sage, practically nothing is there (see images attached).
Can anyone help with this? (Unfortunately this website is telling me that KEPUB is an invalid file type, so I can't attach the file here.)
Thank you,
Paul
|
|
|
|
|
#2 | |
|
Resident Curmudgeon
Posts: 81,796
Karma: 150265991
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
|
|
|
|
|
|
|
#3 |
|
Bibliophagist
Posts: 49,826
Karma: 176799834
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
|
|
|
|
|
|
#4 |
|
Member
Posts: 11
Karma: 10
Join Date: Aug 2022
Device: Kobo Sage
|
|
|
|
|
|
|
#5 | |
|
Member
Posts: 11
Karma: 10
Join Date: Aug 2022
Device: Kobo Sage
|
Quote:
|
|
|
|
|
|
|
|
|
#6 |
|
Resident Curmudgeon
Posts: 81,796
Karma: 150265991
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
|
|
|
|
|
|
#7 |
|
creator of calibre
Posts: 45,911
Karma: 29228280
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That file looks like it was created using a custom recipe. Use the built-in Guardian recipe and you should be fine.
|
|
|
|
|
|
#8 |
|
Member
Posts: 11
Karma: 10
Join Date: Aug 2022
Device: Kobo Sage
|
OK, thanks. This recipe had been working for many years. It only pulled the environment, world, and business RSS feeds from the Guardian website (all I wanted to read); I made it with the basic "New recipe" button in Calibre and then added the feed URLs. I've noticed that many of these simple feed-based recipes for other websites have also stopped downloading well in recent months. Is there any workaround for this? Or does it now require writing out advanced recipe code?
Last edited by PaulB223; 01-24-2026 at 07:36 AM. |
|
|
|
|
|
#9 |
|
Grand Sorcerer
Posts: 13,784
Karma: 80104644
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
|
Why not just modify the built-in recipe to only include the sections you want?
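A rough sketch of what that could look like, assuming the built-in recipe's `parse_index` returns a list of `(section title, articles)` pairs, as it does in the copy quoted later in this thread. The helper name `keep_sections` and the section titles below are hypothetical; the real titles come from the `h2` headings on the Guardian pages and may differ.

```python
def keep_sections(feeds, wanted):
    """Return only the (title, articles) pairs whose title is in `wanted`.

    Titles are compared case-insensitively, ignoring surrounding whitespace.
    """
    wanted_norm = {w.strip().lower() for w in wanted}
    return [(title, articles) for title, articles in feeds
            if title.strip().lower() in wanted_norm]


# Made-up data in the shape that parse_index() returns.
feeds = [
    ('Headlines', [{'title': 'A', 'url': 'https://example.com/a'}]),
    ('Sport', [{'title': 'B', 'url': 'https://example.com/b'}]),
    ('Environment', [{'title': 'C', 'url': 'https://example.com/c'}]),
]
filtered = keep_sections(feeds, ['environment', 'headlines'])
print([title for title, _ in filtered])
```

In a customized copy of the recipe, `parse_index` could then end with something like `return keep_sections(feeds, [...])`, using whichever section titles show up in the download log.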
|
|
|
|
|
|
#10 | |
|
Member
Posts: 11
Karma: 10
Join Date: Aug 2022
Device: Kobo Sage
|
Quote:
Code:
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'

'''
www.guardian.co.uk
'''
from datetime import date

from calibre import random_user_agent
from calibre.web.feeds.news import BasicNewsRecipe


def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})


class Guardian(BasicNewsRecipe):

    title = u'The Guardian and The Observer'
    is_observer = False

    base_url = 'https://www.theguardian.com/uk'
    if date.today().weekday() == 6:
        is_observer = True
        base_url = 'https://www.theguardian.com/observer'

    __author__ = 'Kovid Goyal'
    language = 'en_GB'

    oldest_article = 7
    max_articles_per_feed = 100
    remove_javascript = True
    encoding = 'utf-8'
    remove_empty_feeds = True
    no_stylesheets = True
    remove_attributes = ['style', 'width', 'height']
    ignore_duplicate_articles = {'title', 'url'}

    timefmt = ' [%a, %d %b %Y]'

    remove_tags = [
        dict(attrs={'class': lambda x: x and '--twitter' in x}),
        dict(attrs={'class': lambda x: x and 'submeta' in x.split()}),
        dict(name='gu-island'),
        dict(attrs={'data-component': ['share', 'social', 'nav', 'nav2', 'topbar']}),
        dict(attrs={'data-link-name': 'block share'}),
        dict(attrs={'data-print-layout': 'hide'}),
        dict(attrs={'data-spacefinder-type': 'model.dotcomrendering.pageElements.NewsletterSignupBlockElement'}),
        dict(id=['dfp-ad--survey', 'sub-nav-root', 'the-caption', 'bannerandheader']),
        {'for': 'the-checkbox'},
        dict(href=['#maincontent', '#navigation']),
        dict(role=['navigation', 'button']),
        dict(attrs={'class': lambda x: x and 'inline-expand-image' in x}),
        dict(name='a', attrs={'aria-label': lambda x: x and 'Share On' in x}),
        dict(name='a', attrs={'class': lambda x: x and 'social__action js-social__action--top' in x}),
        dict(name='div', attrs={'id': 'share-count-root'}),
        dict(attrs={'class': lambda x: x and 'modern-visible' in x.split()}),
        classes('badge-slot reveal-caption__checkbox mobile-only element-rich-link'),
        dict(name=['link', 'meta', 'style', 'svg', 'input', 'source', 'noscript', 'button']),
        dict(name='img', src=lambda x: x and 'https://sb.scorecardresearch.com/' in x),
    ]
    remove_tags_after = [
        classes('content__article-body js-bottom-marker article-body-commercial-selector'),
    ]

    extra_css = '''
        img {
            max-width: 100% !important;
            max-height: 100% !important;
        }
        a span {
            color: #E05E02;
        }
        figcaption span {
            font-size: 0.5em;
            color: #6B6B6B;
        }
    '''

    def get_browser(self, *a, **kw):
        # This site returns images in JPEG-XR format if the user agent is IE
        if not hasattr(self, 'non_ie_ua'):
            try:
                self.non_ie_ua = random_user_agent(allow_ie=False)
            except TypeError:
                self.non_ie_ua = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.111 Safari/537.36'
        kw['user_agent'] = self.non_ie_ua
        br = BasicNewsRecipe.get_browser(self, *a, **kw)
        return br

    def parse_section(self, section_url):
        soup = self.index_to_soup(section_url)
        for section in soup.findAll('section'):
            articles = []
            title = self.tag_to_string(section.find('h2'))
            if not title:
                continue
            self.log('Found section:', title)
            for li in section.findAll('li'):
                a = li.find('a', attrs={'href': True, 'aria-label': True})
                if a:
                    url = a['href']
                    if url.startswith('/'):
                        url = self.base_url.rpartition('/')[0] + url
                    self.log('\t', a['aria-label'], url)
                    articles.append({'title': a['aria-label'], 'url': url})
            if articles:
                yield title, articles

    def parse_index(self):
        # return [('Test', [{'url':
        #     'https://www.theguardian.com/environment/2025/nov/07/if-theres-a-free-alternative-ill-eat-healthily-how-sweden-devised-brilliant-school-meals',
        #     'title': 'test'}])]
        feeds = list(self.parse_section(self.base_url))
        feeds += list(self.parse_section('https://www.theguardian.com/uk/sport'))
        return feeds

    def preprocess_html(self, soup):
        for table in soup.findAll('table'):
            if len(table.findAll('tr')) > 20:
                table.decompose()
        for dateline in soup.findAll(attrs={'data-gu-name': 'dateline'}):
            for s in dateline.findAll('summary'):
                s.extract()
            dateline.name = 'div'
        return soup


calibre_most_common_ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'
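
Two small pieces of the recipe above can be tried outside calibre, which may help when adapting it. The sketch below copies the `classes` helper verbatim and mirrors the relative-URL handling from `parse_section`; the function name `absolutize` and the sample class names and path are made up for illustration, and nothing here touches the network.

```python
def classes(classes):
    # Copied from the recipe: builds an attribute filter that matches any
    # tag sharing at least one class name with the given space-separated list.
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})


def absolutize(base_url, href):
    # Hypothetical helper mirroring parse_section(): site-relative links
    # are prefixed with everything before the last '/' of base_url.
    if href.startswith('/'):
        return base_url.rpartition('/')[0] + href
    return href


matcher = classes('badge-slot element-rich-link')['attrs']['class']

print(bool(matcher('element element-rich-link')))  # shares one class name
print(bool(matcher('article-body')))               # no class in common
print(bool(matcher(None)))                         # tag has no class attribute

print(absolutize('https://www.theguardian.com/uk', '/environment'))
```

The first call prints `True` and the next two print `False`, so only tags carrying one of the listed classes are removed. The last line prints `https://www.theguardian.com/environment`, showing that the `/uk` suffix of `base_url` is dropped before the link path is appended.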
|
|
|
|
|