10-15-2020, 12:24 AM | #1 |
Member
Posts: 14
Karma: 132
Join Date: Aug 2014
Device: Kindle Paperwhite 7th Gen
|
Creating a recipe for theprint.in
I am creating a recipe for theprint.in, a relatively new but high-quality news website from India. I started with a fully automated recipe and then began customizing it.
What I primarily need is the remove_tags and auto_cleanup_keep functionality. remove_tags worked right away, but I can't get auto_cleanup_keep to work. I want to keep the author's name, the publication time, and the subtitle of the post, all of which the auto cleanup algorithm removes by default. Can anyone help me make this work? I'm using Calibre version 5.2.0. Here's the recipe: Code:
#!/usr/bin/env python3
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function

import time  # used by the dated title below

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1600702839(BasicNewsRecipe):
    # The title carries the download date, e.g. "The Print - 15 Oct, 2020"
    title = f'The Print - {time.strftime("%d %b, %Y", time.localtime())}'
    description = "News from The Print, an independent, digital only media outlet"
    publication_type = 'newspaper'
    language = 'en'
    oldest_article = 2
    max_articles_per_feed = 15

    # XPath union of the elements auto cleanup should keep
    auto_cleanup_keep = '//div[@class="td-module-meta-info"]|'\
                        '//h2[@class="td-post-sub-title"]|'\
                        '//a[@class="author url fn"]|'\
                        '//span[@class="update_date"]'
    auto_cleanup = True
    ignore_duplicate_articles = {'url'}

    # Elements stripped from every article page
    remove_tags = [
        dict(name='div', attrs={'class': ['post_contribute', 'code-block code-block-11']}),
        dict(attrs={'class': 'fontsize_Btn'}),
        dict(name='p', attrs={'class': 'postBtm'}),
        dict(name='em'),
        dict(name='hr'),
        dict(name='button'),
    ]

    feeds = [
        ('Politics', 'https://theprint.in/category/politics/feed/'),
        ('Governance', 'https://theprint.in/category/india/governance/feed/'),
        ('Economy', 'https://theprint.in/category/economy/feed'),
        ('India', 'https://theprint.in/category/india/feed'),
        ('Opinion', 'https://theprint.in/category/opinion/feed'),
        ('Defence', 'https://theprint.in/category/defence/feed'),
        ('Science', 'https://theprint.in/category/science/feed/'),
        ('Tech', 'https://theprint.in/category/tech/feed/'),
        ('Education', 'https://theprint.in/category/india/education/feed/'),
        ('National Interest', 'https://theprint.in/category/national-interest/feed/'),
        ('50-word Edit', 'https://theprint.in/category/50-word-edit/feed/'),
        ('Ilanomics', 'https://theprint.in/ilanomics/feed/'),
        ('Diplomacy', 'https://theprint.in/category/diplomacy/feed/'),
        ('Features', 'https://theprint.in/category/features/feed/'),
    ]
|
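One thing that may be worth checking (an assumption, not something confirmed anywhere in this thread) is that an XPath test such as `@class="author url fn"` compares the whole attribute string. If the page lists the classes in another order, or adds one more class, the expression matches nothing. The sketch below shows that behaviour with the standard library's ElementTree on hypothetical markup; the HTML snippet and the author names are made up for illustration and are not taken from theprint.in.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, NOT copied from theprint.in: the second link carries
# the same classes in a different order plus one extra class.
SAMPLE = """
<div>
  <a class="author url fn" href="#">Writer One</a>
  <a class="url fn author extra" href="#">Writer Two</a>
</div>
"""

root = ET.fromstring(SAMPLE)

# Exact comparison: the attribute value must equal the whole string.
exact = root.findall('.//a[@class="author url fn"]')

# Token-based comparison done in Python: look for 'author' among the
# whitespace-separated class names.
by_token = [a for a in root.iter('a') if 'author' in a.get('class', '').split()]

exact_names = [a.text for a in exact]
token_names = [a.text for a in by_token]
print(exact_names)   # only the link whose class string matches exactly
print(token_names)   # both links
```

In a full XPath 1.0 engine the token-based test is commonly written as `contains(concat(" ", normalize-space(@class), " "), " author ")`. Whether that would change what auto_cleanup_keep retains here is untested.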
10-15-2020, 03:36 AM | #2 |
creator of calibre
Posts: 43,776
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There is already a builtin recipe for theprint; why not use it?
|
10-15-2020, 07:16 AM | #3 |
Member
Posts: 14
Karma: 132
Join Date: Aug 2014
Device: Kindle Paperwhite 7th Gen
|
I'm trying to get a shorter magazine and cleaner pages. The custom recipe lets me limit the number of articles I fetch and drop some sections I'm not interested in.
Also, the built-in recipe leaves in the "Also read" and "Subscribe" blocks, which appear multiple times on each page. That's why I'm using remove_tags. |
10-16-2020, 02:27 AM | #4 |
creator of calibre
Posts: 43,776
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Sorry, I almost never use auto_cleanup.
|
10-16-2020, 12:50 PM | #5 |
Member
Posts: 14
Karma: 132
Join Date: Aug 2014
Device: Kindle Paperwhite 7th Gen
|
I was able to customize the built-in recipe slightly to get the desired output. The only drawback is that it includes all the sections.
|