Old 07-26-2022, 09:28 AM   #13
bugmen00t
Connoisseur
Posts: 82
Karma: 100000
Join Date: Aug 2015
Device: Kindle Keyboard 3G + Kindle Voyage WiFi + Kindle PW11 Kids WiFi
New recipes (part 04 of ??)

NEW RUSSIAN RECIPES

Черта: media about the prevention of discrimination, social inequality & violence. Favicon.
Spoiler:
Code:
#!/usr/bin/env python
# vim:fileencoding=utf-8

from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe

class Cherta(BasicNewsRecipe):
    title                 = '\u0427\u0435\u0440\u0442\u0430'
    __author__            = 'bugmen00t'
    description           = '\u0418\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B\u0435, \u0432\u0430\u0436\u043D\u044B\u0435 \u0438 \u0433\u043B\u0443\u0431\u043E\u043A\u0438\u0435 \u0442\u0435\u043A\u0441\u0442\u044B \u043F\u0440\u043E \u043D\u0430\u0441\u0438\u043B\u0438\u0435 \u0438 \u043D\u0435\u0440\u0430\u0432\u0435\u043D\u0441\u0442\u0432\u043E \u0432 \u0420\u043E\u0441\u0441\u0438\u0438.'
    publisher             = 'cherta.media'
    category              = 'blog'
    cover_url = u'https://cherta.media/wp-content/uploads/2022/01/cherta_snippet2.png'
    language              = 'ru'
    no_stylesheets        = False
    remove_javascript = False
    auto_cleanup   = False
    oldest_article = 30
    max_articles_per_feed = 30

    remove_tags_before = dict(name='div', attrs={'class':'single-story'})
    
    remove_tags_after = dict(name='div', attrs={'class':'single-page__footer-info'})

    remove_tags =   [
        dict(name='div', attrs={'class': 'single-content-link'}),
        dict(name='div', attrs={'class': 'single-page__footer-info_links clearfix'}),
        dict(name='div', attrs={'class': 'single-article-tags-wrapper'})
        ] 

    feeds = [
        ('\u0418\u0441\u0442\u043E\u0440\u0438\u0438', 'https://cherta.media/story/feed/'),
        ('\u0418\u043D\u0442\u0435\u0440\u0432\u044C\u044E', 'https://cherta.media/interview/feed/')
    ]
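
The titles, descriptions and feed names in these recipes are written as \uXXXX escapes, which are ordinary Cyrillic text. A quick way to check what they say, as a minimal sketch in plain Python 3 outside calibre (the to_escapes helper is hypothetical, only for going the other way, and assumes characters from the Basic Multilingual Plane):

```python
# Escaped strings copied verbatim from the Cherta recipe.
title = '\u0427\u0435\u0440\u0442\u0430'
feed_names = [
    '\u0418\u0441\u0442\u043E\u0440\u0438\u0438',
    '\u0418\u043D\u0442\u0435\u0440\u0432\u044C\u044E',
]


def to_escapes(text):
    # Hypothetical helper: re-encode non-ASCII characters as backslash-u
    # escapes, leaving ASCII characters untouched.
    return ''.join(
        '\\u{:04X}'.format(ord(ch)) if ord(ch) > 127 else ch
        for ch in text
    )


print(title)              # readable recipe title
print(feed_names)         # readable feed names
print(to_escapes(title))  # round trip back to the escaped form
```

Python treats both spellings as the same string, so the escaped form only matters for keeping the recipe file ASCII-safe.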


Горький: collective blog with book reviews, essays, author interviews etc. Favicon.
Fixes needed:
  • No header images in some articles
Spoiler:
Code:
#!/usr/bin/env python
# vim:fileencoding=utf-8

from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe

class Gorky(BasicNewsRecipe):
    title                 = '\u0413\u043E\u0440\u044C\u043A\u0438\u0439'
    __author__            = 'bugmen00t'
    description           = '\u041D\u0435\u043A\u043E\u043C\u043C\u0435\u0440\u0447\u0435\u0441\u043A\u0438\u0439 \u043F\u0440\u043E\u0435\u043A\u0442 \u043E \u043A\u043D\u0438\u0433\u0430\u0445 \u0438 \u0447\u0442\u0435\u043D\u0438\u0438.'
    publisher             = '\u0410\u041D\u041E "\u0426\u0435\u043D\u0442\u0440 \u043F\u043E \u0441\u043E\u0434\u0435\u0439\u0441\u0442\u0432\u0438\u044E \u0440\u0430\u0437\u0432\u0438\u0442\u0438\u044F \u043A\u0443\u043B\u044C\u0442\u0443\u0440\u044B \u0447\u0442\u0435\u043D\u0438\u044F \u0438 \u043A\u043D\u0438\u0433\u043E\u0438\u0437\u0434\u0430\u043D\u0438\u044F \u00AB\u0413\u043E\u0440\u044C\u043A\u0438\u0439 \u041C\u0435\u0434\u0438\u0430\u00BB"'
    category              = 'blog'
    cover_url = u'https://gorky.media/wp-content/uploads/2016/09/gorky.png'
    language              = 'ru'
    no_stylesheets        = False
    remove_javascript = False
    auto_cleanup   = False
    oldest_article = 30
    max_articles_per_feed = 30

    remove_tags_before = dict(name='div', attrs={'id': 'td-outer-wrap'})
    
    remove_tags_after = dict(name='footer')

    remove_tags =   [
        dict(name='footer'),
        dict(name='nav', attrs={'class': 'navbar'}),
        dict(name='div', attrs={'class': 'hide'}),
        dict(name='div', attrs={'class': 'nav-new'}),
        dict(name='div', attrs={'class': 'top-panel'}),
        dict(name='div', attrs={'class': 'panel-nav'}),
        dict(name='div', attrs={'class': 'panel-nav _hide'}),
        dict(name='ul', attrs={'class': 'top-panel__bottom buttons-list _share'}),
        dict(name='ul', attrs={'class': 'buttons-list _share d_lg-none'})
        ] 

    feeds = [
        ('\u0420\u0435\u0446\u0435\u043D\u0437\u0438\u0438', 'https://gorky.media/reviews/feed/'),
        ('\u0424\u0440\u0430\u0433\u043C\u0435\u043D\u0442\u044B', 'https://gorky.media/fragments/feed/'),
        ('\u041A\u043E\u043D\u0442\u0435\u043A\u0441\u0442', 'https://gorky.media/context/feed/'),
        ('\u041F\u043E\u0434\u0431\u043E\u0440\u043A\u0438', 'https://gorky.media/books-collection/feed/')
    ]


The New Times: weekly socio-political online magazine. Favicon.
Spoiler:
Code:
#!/usr/bin/env python
# vim:fileencoding=utf-8

from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe

class NewTimes(BasicNewsRecipe):
    title                 = 'The New Times'
    __author__            = 'bugmen00t'
    description           = '\u0415\u0436\u0435\u043D\u0435\u0434\u0435\u043B\u044C\u043D\u044B\u0439 \u043E\u0431\u0449\u0435\u0441\u0442\u0432\u0435\u043D\u043D\u043E-\u043F\u043E\u043B\u0438\u0442\u0438\u0447\u0435\u0441\u043A\u0438\u0439 \u0436\u0443\u0440\u043D\u0430\u043B'
    publisher             = 'The New Times'
    category              = 'newspaper'
    cover_url = u'https://newtimes.ru/img/ogimage.png'
    language              = 'ru'
    no_stylesheets        = False
    remove_javascript = False
    auto_cleanup   = False
    oldest_article = 14
    max_articles_per_feed = 150

    remove_tags_before = dict(name='h1')
    
    remove_tags_after = dict(name='div', attrs={'id':'full'})

    feeds = [
        ('\u0421\u0442\u0430\u0442\u044C\u0438', 'https://newtimes.ru/rss/')
    ]


SOVA: Tbilisi-based media about life & society in Georgia. Favicon replacement.
Fixes needed:
  • Secondary lead text and header image are missing in some articles
Spoiler:
Code:
#!/usr/bin/env python
# vim:fileencoding=utf-8

from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe

class Sova(BasicNewsRecipe):
    title                 = 'SOVA'
    __author__            = 'bugmen00t'
    description           = '\u0420\u0443\u0441\u0441\u043A\u043E\u044F\u0437\u044B\u0447\u043D\u043E\u0435 \u043D\u0435\u0437\u0430\u0432\u0438\u0441\u0438\u043C\u043E\u0435 \u043E\u043D\u043B\u0430\u0439\u043D-\u0438\u0437\u0434\u0430\u043D\u0438\u0435, \u043E\u0441\u043D\u043E\u0432\u0430\u043D\u043D\u043E\u0435 \u0432 \u0422\u0431\u0438\u043B\u0438\u0441\u0438 \u0432 2016 \u0433\u043E\u0434\u0443 \u0433\u0440\u0443\u043F\u043F\u043E\u0439 \u043F\u0440\u043E\u0444\u0435\u0441\u0441\u0438\u043E\u043D\u0430\u043B\u044C\u043D\u044B\u0445 \u0436\u0443\u0440\u043D\u0430\u043B\u0438\u0441\u0442\u043E\u0432, \u043F\u0440\u0435\u0434\u043E\u0441\u0442\u0430\u0432\u043B\u044F\u044E\u0449\u0435\u0435 \u043A\u0430\u0447\u0435\u0441\u0442\u0432\u0435\u043D\u043D\u0443\u044E \u0438\u043D\u0444\u043E\u0440\u043C\u0430\u0446\u0438\u044E \u043E \u043F\u043E\u043B\u0438\u0442\u0438\u043A\u0435, \u044D\u043A\u043E\u043D\u043E\u043C\u0438\u043A\u0435 \u0438 \u0434\u0440\u0443\u0433\u0438\u0445 \u043D\u0435\u043E\u0442\u044A\u0435\u043C\u043B\u0435\u043C\u044B\u0445 \u0430\u0441\u043F\u0435\u043A\u0442\u0430\u0445 \u0436\u0438\u0437\u043D\u0438 \u0441\u043E\u0432\u0440\u0435\u043C\u0435\u043D\u043D\u043E\u0433\u043E \u0447\u0435\u043B\u043E\u0432\u0435\u043A\u0430 \u0432 \u0413\u0440\u0443\u0437\u0438\u0438 \u0438 \u0440\u0435\u0433\u0438\u043E\u043D\u0435 \u0432 \u0446\u0435\u043B\u043E\u043C.'
    publisher             = '\u041D\u0435\u043F\u0440\u0430\u0432\u0438\u0442\u0435\u043B\u044C\u0441\u0442\u0432\u0435\u043D\u043D\u0430\u044F \u043E\u0440\u0433\u0430\u043D\u0438\u0437\u0430\u0446\u0438\u044F Sova News'
    category              = 'blog'
    cover_url = u'https://i0.wp.com/sova.news/wp-content/uploads/2021/08/sova@512.png'
    language              = 'ru'
    no_stylesheets        = False
    remove_javascript = False
    auto_cleanup   = False
    oldest_article = 60
    max_articles_per_feed = 30

    remove_tags_before = dict(name='div', attrs={'class':'site-wrapper header-7'})
    
    remove_tags_after = dict(name='div', attrs={'class':'single-body entry-content typography-copy'})

    remove_tags =   [
        dict(name='nav', attrs={'aria-label': 'breadcrumbs'}),
        dict(name='header', attrs={'class': 'site-header site-header--skin-5'}),
        dict(name='footer'),
        dict(name='ins'),
        dict(name='div', attrs={'class': 'entry-interaction__left'}),
        dict(name='div', attrs={'class': 'entry-interaction__right'}),
        dict(name='div', attrs={'id': 'mnmd-sticky-header'}),
        dict(name='div', attrs={'id': 'mnmd-offcanvas-primary'}),
        dict(name='div', attrs={'id': 'mnmd-offcanvas-mobile'}),
        dict(name='blockquote', attrs={'class': 'wp-embedded-content'})
        ] 

    feeds = [
        ('\u041D\u043E\u0432\u043E\u0441\u0442\u0438', 'https://sova.news/news/feed/'),
        ('\u041F\u043E\u043B\u0438\u0442\u0438\u043A\u0430', 'https://sova.news/analytics/politics/feed/'),
        ('\u042D\u043A\u043E\u043D\u043E\u043C\u0438\u043A\u0430', 'https://sova.news/analytics/economy/feed/'),
        ('\u041E\u0431\u0449\u0435\u0441\u0442\u0432\u043E', 'https://sova.news/analytics/society/feed/'),
        ('\u0418\u043D\u0442\u0435\u0440\u0432\u044C\u044E', 'https://sova.news/interview/feed/'),
        ('Unfake', 'https://sova.news/unfake/feed/'),
        ('\u0414\u0440\u0443\u0433\u0430\u044F \u0421\u043E\u0432\u0430', 'https://sova.news/sova-other/feed/'),
        ('\u0418\u0441\u043A\u0443\u0441\u0441\u0442\u0432\u043E', 'https://sova.news/sova-other/art/feed/'),
        ('\u0422\u0443\u0440\u0438\u0437\u043C', 'https://sova.news/sova-other/tourism/feed/'),
        ('#weekendnavigator', 'https://sova.news/weekendnavigator/feed/'),
        ('\u0421\u043E\u0441\u0435\u0434\u0438', 'https://sova.news/sova-other/neighbours/feed/'),
        ('\u041D\u0435\u0434\u0435\u043B\u044F \u0432 \u0433\u043E\u0440\u043E\u0434\u0435', 'https://sova.news/week-in-the-city/feed/'),
        ('\u0424\u043E\u0442\u043E\u043F\u0440\u043E\u0433\u0443\u043B\u043A\u0438', 'https://sova.news/photowalks/feed/'),
        ('\u0424\u043E\u0442\u043E', 'https://sova.news/photo/feed/')
    ]
    
    def preprocess_html(self, soup):
        for img in soup.findAll('img', attrs={'data-src': True}):
            img['src'] = img['data-src']
        return soup
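
The preprocess_html hook above copies each image's data-src attribute into src, presumably so that lazy-loaded images point at the real file rather than a placeholder. A minimal standalone sketch of the same idea, using the standard library's ElementTree as a stand-in for the soup object calibre passes in (the sample markup and the promote_lazy_images name are made up for illustration):

```python
import xml.etree.ElementTree as ET


def promote_lazy_images(root):
    # For every <img> that carries a data-src attribute, copy its value
    # into src so the real image URL is the one that gets downloaded.
    for img in root.iter('img'):
        lazy_url = img.get('data-src')
        if lazy_url:
            img.set('src', lazy_url)
    return root


# Hypothetical article fragment: one lazy-loaded image, one ordinary image.
snippet = (
    '<div>'
    '<img src="placeholder.gif" data-src="https://example.com/photo.jpg"/>'
    '<img src="logo.png"/>'
    '</div>'
)

root = promote_lazy_images(ET.fromstring(snippet))
srcs = [img.get('src') for img in root.iter('img')]
print(srcs)
```

Images without data-src are left alone, which matches the attrs={'data-src': True} filter in the recipe.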


StopGame: gaming community portal with news, reviews, streams and blogs. Favicon.
Spoiler:
Code:
#!/usr/bin/env python
# vim:fileencoding=utf-8

from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe

class StopGame(BasicNewsRecipe):
    title                 = 'StopGame'
    __author__            = 'bugmen00t'
    description           = '\u0420\u043E\u0441\u0441\u0438\u0439\u0441\u043A\u0438\u0439 \u0438\u0433\u0440\u043E\u0432\u043E\u0439 \u0438\u043D\u0442\u0435\u0440\u043D\u0435\u0442-\u043F\u043E\u0440\u0442\u0430\u043B, \u043D\u0430 \u043A\u043E\u0442\u043E\u0440\u043E\u043C \u043A\u043E\u043B\u043B\u0435\u043A\u0442\u0438\u0432 \u0430\u0432\u0442\u043E\u0440\u043E\u0432 \u0440\u0430\u0441\u0441\u043A\u0430\u0437\u044B\u0432\u0430\u0435\u0442 \u0432\u0441\u0435\u043C \u0436\u0435\u043B\u0430\u044E\u0449\u0438\u043C \u043E \u0432\u0438\u0434\u0435\u043E\u0438\u0433\u0440\u0430\u0445.'
    publisher             = 'StopGame.ru'
    category              = 'blog'
    cover_url = u'https://images.stopgame.ru/blogs/2020/01/29/U7R7t5rQ.jpg'
    language              = 'ru'
    no_stylesheets        = False
    remove_javascript = False
    auto_cleanup   = False
    oldest_article = 7
    max_articles_per_feed = 50

    remove_tags_before = dict(name='h1')
    
    remove_tags_after = dict(name='div', attrs={'class': '_end-info_zp673_1113'})

    remove_tags =   [
        dict(name='section', attrs={'id': 'comments'}),
        dict(name='footer'),
        dict(name='section', attrs={'class': '_page-section_xdzdd_387 _additional-reads_zp673_1348'})
        ] 

    feeds = [
        ('\u0412\u0441\u0435 \u0440\u0430\u0437\u0434\u0435\u043B\u044B', 'https://rss.stopgame.ru/rss_all.xml'),
        ('\u041D\u043E\u0432\u043E\u0441\u0442\u0438', 'https://rss.stopgame.ru/rss_news.xml'),
        ('\u0421\u0442\u0430\u0442\u044C\u0438', 'https://rss.stopgame.ru/articles.xml'),
        ('\u0412\u0438\u0434\u0435\u043E', 'https://rss.stopgame.ru/videos.xml'),
        ('\u0411\u043B\u043E\u0433\u0438', 'https://rss.stopgame.ru/all_topics.xml')
    ]



I am also uploading the recipes' code and icons as separate files; that may make it easier for Dr. Goyal to add them to Calibre.
Attached Files
File Type: recipe kholod_en.recipe (1.2 KB, 468 views)
File Type: recipe interfax_ua.recipe (2.1 KB, 488 views)
File Type: recipe cherta.recipe (1.7 KB, 475 views)
File Type: recipe gorky.recipe (2.4 KB, 488 views)
File Type: recipe newtimes.recipe (1.2 KB, 481 views)
File Type: recipe sova.recipe (4.6 KB, 480 views)
File Type: recipe stopgame.recipe (2.0 KB, 487 views)
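
One way to try any of the attached .recipe files locally is calibre's ebook-convert command with its --test option, which limits the download to a couple of feeds and articles. A small sketch that only builds that command in Python; the build_test_command helper and the output file name are hypothetical, and nothing is downloaded unless you run the printed command yourself:

```python
def build_test_command(recipe_path, output_path):
    # ebook-convert ships with calibre. --test keeps the download small,
    # and -vv makes the log verbose enough to spot missing images or text.
    return ['ebook-convert', recipe_path, output_path, '--test', '-vv']


# File name taken from the attachment list; the output name is an assumption.
cmd = build_test_command('cherta.recipe', 'cherta.epub')
print(' '.join(cmd))
```

The same call works for the other attachments by swapping the recipe file name.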