Old 10-15-2020, 12:24 AM   #1
gourav
Member
Posts: 14
Karma: 132
Join Date: Aug 2014
Device: Kindle Paperwhite 7th Gen
Creating a recipe for theprint.in

I'm creating a recipe for theprint.in, a relatively new but high-quality news website from India. I started with a fully automated recipe and then began customizing it.

What I primarily need are the remove_tags and auto_cleanup_keep options. remove_tags worked right away, but I can't get auto_cleanup_keep to work.

What I'm trying to do is keep the author's name, the publication time, and the subtitle of the post, all of which the auto cleanup algorithm removes by default. Can anyone help me make this work?

I'm using Calibre version 5.2.0. Here's the recipe:
Code:
#!/usr/bin/env python3
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
import time  # used for the date in the title below

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1600702839(BasicNewsRecipe):
    title          = f'The Print - {time.strftime("%d %b, %Y", time.localtime())}'
    description    = "News from The Print, an independent, digital only media outlet"
    publication_type = 'newspaper'
    language       = 'en'
    oldest_article = 2
    max_articles_per_feed = 15
    auto_cleanup_keep = '//div[@class="td-module-meta-info"]|'\
                        '//h2[@class="td-post-sub-title"]|'\
                        '//a[@class="author url fn"]|'\
                        '//span[@class="update_date"]'
    auto_cleanup   = True
    ignore_duplicate_articles = {'url'}
    
    remove_tags    = [dict(name='div', attrs={'class':['post_contribute', 'code-block code-block-11']}),
                     dict(attrs={'class': 'fontsize_Btn'}),
                     dict(name='p', attrs={'class': 'postBtm'}),
                     dict(name='em'), dict(name='hr'), dict(name='button')]

    feeds          = [
        ('Politics', 'https://theprint.in/category/politics/feed/'),
        ('Governance', 'https://theprint.in/category/india/governance/feed/'),
        ('Economy', 'https://theprint.in/category/economy/feed'),
        ('India', 'https://theprint.in/category/india/feed'),
        ('Opinion', 'https://theprint.in/category/opinion/feed'),
        ('Defence', 'https://theprint.in/category/defence/feed'),
        ('Science', 'https://theprint.in/category/science/feed/'),
        ('Tech', 'https://theprint.in/category/tech/feed/'),
        ('Education', 'https://theprint.in/category/india/education/feed/'),
        ('National Interest', 'https://theprint.in/category/national-interest/feed/'),
        ('50-word Edit', 'https://theprint.in/category/50-word-edit/feed/'),
        ('Ilanomics', 'https://theprint.in/ilanomics/feed/'),
        ('Diplomacy', 'https://theprint.in/category/diplomacy/feed/'),
        ('Features', 'https://theprint.in/category/features/feed/')
    ]
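
One thing worth checking, offered as a guess rather than a confirmed diagnosis: in XPath, a test such as @class="author url fn" compares the whole attribute string, so it only matches when the class attribute is exactly that text. If the live page adds or reorders classes, that branch of auto_cleanup_keep selects nothing. The sketch below shows the effect using Python's standard library instead of calibre. The HTML snippet is hypothetical (not copied from theprint.in), and count_matches and has_class are made-up helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, not copied from theprint.in: the <a> has one extra class.
SNIPPET = """
<div>
  <h2 class="td-post-sub-title">Subtitle</h2>
  <a class="author url fn vcard">Author Name</a>
  <span class="update_date">15 October, 2020</span>
</div>
"""

KEEP = ('//h2[@class="td-post-sub-title"]|'
        '//a[@class="author url fn"]|'
        '//span[@class="update_date"]')


def count_matches(root, xpath_union):
    """Count the elements matched by each '|'-separated branch."""
    counts = {}
    for branch in xpath_union.split('|'):
        # ElementTree only accepts relative paths, so '//x' becomes './/x'.
        counts[branch] = len(root.findall('.' + branch))
    return counts


def has_class(element, name):
    """Token-based test: true if `name` is one of the element's classes."""
    return name in element.get('class', '').split()


root = ET.fromstring(SNIPPET)
exact = count_matches(root, KEEP)
authors = [el for el in root.iter('a') if has_class(el, 'author')]

for branch, count in exact.items():
    print(count, branch)
print(len(authors), 'author link(s) found by the token-based test')
```

ElementTree does not support the "|" union itself, which is why the sketch splits the expression and runs each branch separately. With a full XPath 1.0 engine, the usual looser form is something like //a[contains(@class, "author")]. Whether that helps here depends on the site's actual markup, which is not shown in this thread.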
Old 10-15-2020, 03:36 AM   #2
kovidgoyal
creator of calibre
Posts: 43,776
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
There is already a built-in recipe for theprint. Why not use it?
Old 10-15-2020, 07:16 AM   #3
gourav
I'm trying to get a shorter magazine and cleaner pages. The custom recipe lets me limit the number of articles I fetch and drop some sections I'm not interested in.

Also, the built-in recipe leaves in the "Also read" and "Subscribe" blocks, which appear multiple times on each page. That's why I'm using remove_tags.
Old 10-16-2020, 02:27 AM   #4
kovidgoyal
Sorry, I almost never use auto_cleanup.
Old 10-16-2020, 12:50 PM   #5
gourav
I was able to customize the built-in recipe slightly to get the desired output. The only drawback is that it includes all the sections.
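
A minimal sketch of one way to trim the sections, assuming the customized recipe exposes them through a feeds list of (title, URL) pairs like the one in post #1. Whether the built-in recipe is structured that way is not stated in this thread. ALL_FEEDS, WANTED_SECTIONS and select_feeds are made-up names, and the section choice is only an example.

```python
# A few of the (title, URL) pairs from the recipe in post #1.
ALL_FEEDS = [
    ('Politics', 'https://theprint.in/category/politics/feed/'),
    ('Economy', 'https://theprint.in/category/economy/feed'),
    ('Opinion', 'https://theprint.in/category/opinion/feed'),
    ('Features', 'https://theprint.in/category/features/feed/'),
]

# Hypothetical choice of sections to keep.
WANTED_SECTIONS = {'Politics', 'Economy'}


def select_feeds(all_feeds, wanted):
    """Return only the feeds whose title is in `wanted`, in original order."""
    return [(title, url) for title, url in all_feeds if title in wanted]


# In a recipe class, this list would be assigned to the `feeds` attribute.
feeds = select_feeds(ALL_FEEDS, WANTED_SECTIONS)

for title, url in feeds:
    print(title, url)
```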