No more images downloaded

Leonatus · 03-01-2023, 06:10 AM

I have a recipe for the German journal "Tagespost"; here it is:

Code:

#!/usr/bin/env python
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
__license__ = 'GPL v3'
__copyright__ = '2020, Pat Stapleton <pat.stapleton at gmail.com>'
'''
Recipe for Die Tagespost
'''
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1589629735(BasicNewsRecipe):
    title          = 'Tagespost'
    language       = 'de'
    __author__     = 'Pat Stapleton'
    description = ('Die Tagespost trägt den Untertitel Wochenzeitung für Politik, Gesellschaft'
        ' und Kultur und ist eine überregionale, wöchentlich im Johann Wilhelm Naumann Verlag in Würzburg erscheinende Zeitung.')
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup   = True
    use_embedded_content = False

    feeds          = [
        ('Tagespost', 'https://www.die-tagespost.de/storage/rss/rss/die-tagespost-komplett.xml'),
    ]
    extra_css = 'td.textb {font-size: medium;} * { text-align: justify !important; text-decoration: none !important}'
    remove_attributes = ['href']

calibre_most_common_ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36'

Everything has worked well so far. But when I started using it, some two years ago, lots of images were downloaded, too. Over time, the images became fewer; eventually only one image was downloaded, and for the past few months, none at all.
I'm quite satisfied with the recipe, so my remark is more a question out of interest than a request for support: Are there technical reasons for this behaviour, or am I doing something wrong? I admit that, over time, the content of the journal has grown considerably: from about 100 pages (on my e-reader) to 200 and more.

unkn0wn · 03-03-2023, 09:19 AM

You have auto_cleanup = True in the recipe.
https://manual.calibre-ebook.com/new...Recipe.cleanup

You can use keep_only_tags & remove_tags to keep only those HTML tags that you think contain text and images, and to remove unnecessary tags, to fix it.

https://manual.calibre-ebook.com/new...keep_only_tags

Leonatus · 03-03-2023, 12:00 PM

@unkn0wn: Thank you, so far! auto_cleanup = True is already in the recipe. Do you mean that I have to change that?
Sorry, I'm not a technical person at all!

unkn0wn · 03-03-2023, 02:31 PM

Quote:

Originally Posted by Leonatus

I'm quite satisfied with the recipe, so my remark is more a question out of interest than a request for support: Are there technical reasons for this behaviour, or am I doing something wrong?

Yes, auto_cleanup is True, which does its own thing; it might ignore images or other tags. You can remove that and use keep_only_tags to fix it.

I thought you didn't want support, so I pointed you towards the documentation.

Leonatus · 03-04-2023, 05:02 AM

Oh, I would very much enjoy having my images back!
So, as I understand, I should replace:

Code:

auto_cleanup   = True

by:

Code:

keep_only_tags

I'll give it a try!

Edit: I tried, but I get an error message: "keep_only_tags should be defined".
Would it be correct to write:

Code:

keep_only_tags = True

?
Sorry again for my ignorance!

unkn0wn · 03-04-2023, 01:30 PM

Code:

'''
Recipe for Die Tagespost
'''
from calibre.web.feeds.news import BasicNewsRecipe

class tagespost(BasicNewsRecipe):
    title          = 'Tagespost'
    language       = 'de'
    __author__     = 'unkn0wn'
    description = ('Die Tagespost trägt den Untertitel Wochenzeitung für Politik, Gesellschaft'
        ' und Kultur und ist eine überregionale, wöchentlich im Johann Wilhelm Naumann Verlag in Würzburg erscheinende Zeitung.')
    oldest_article = 7
    max_articles_per_feed = 100
    use_embedded_content = False
    
    keep_only_tags = [
        dict(name='article', attrs={'class':'art-detail'})
    ]

    feeds          = [
        ('Tagespost', 'https://www.die-tagespost.de/storage/rss/rss/die-tagespost-komplett.xml'),
    ]
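
A note on the earlier question about keep_only_tags = True: the option is not a switch. As in the recipe above, it expects a list of tag specifications (dicts with a name and attrs). The following is only a rough, stand-alone sketch of the idea using Python's standard library. It is not calibre's actual implementation, and the sample page, the KeepOnly class and the spec_matches helper are made up for illustration.

```python
from html.parser import HTMLParser

# Same shape as the recipe option: a list of dicts, not True/False.
keep_only_tags = [
    dict(name='article', attrs={'class': 'art-detail'}),
]

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {'img', 'br', 'hr', 'meta', 'link', 'input'}


def spec_matches(spec, tag, attrs):
    """Return True if a start tag satisfies one keep_only_tags-style spec."""
    if spec.get('name') not in (None, tag):
        return False
    for key, wanted in spec.get('attrs', {}).items():
        value = attrs.get(key) or ''
        # Simplification: treat the attribute as a space-separated token list.
        if wanted not in value.split():
            return False
    return True


class KeepOnly(HTMLParser):
    """Collect image URLs and text found inside matching elements only."""

    def __init__(self, specs):
        super().__init__()
        self.specs = specs
        self.depth = 0      # > 0 while we are inside a kept element
        self.images = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag == 'img' and attrs.get('src'):
                self.images.append(attrs['src'])
            if tag not in VOID_TAGS:
                self.depth += 1
        elif any(spec_matches(s, tag, attrs) for s in self.specs):
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.text.append(data.strip())


# Hypothetical page: only the <article class="art-detail ..."> part is kept.
SAMPLE_PAGE = '''
<html><body>
  <nav><img src="logo.png">Menu</nav>
  <article class="art-detail wide">
    <h1>Headline</h1>
    <p>Body text <img src="photo.jpg"></p>
  </article>
  <footer>Imprint</footer>
</body></html>
'''

parser = KeepOnly(keep_only_tags)
parser.feed(SAMPLE_PAGE)
print(parser.images)
print(parser.text)
```

In this toy page the logo in the navigation and the footer text are dropped, while the headline, the body text and the article image are kept. That is the effect the keep_only_tags line has in the recipe, provided the site still wraps its articles in an element with that class.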

Leonatus · 03-05-2023, 05:11 AM

Wonderful! Not only do the images appear again, but the entire layout looks prettier, too!
So great! Thank you!



