03-02-2011, 11:21 AM   #1
chewi
RBC.ru recipe

Hello.

I've written a recipe for RBC.ru:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1286819935(BasicNewsRecipe):
    title = u'RBC.ru'
    __author__ = 'A. Chewi'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    conversion_options = {'linearize_tables': True}
    remove_attributes = ['style']
    language = 'ru'
    timefmt = ' [%a, %d %b, %Y]'

    # Keep only the headline and article-body containers.
    keep_only_tags = [dict(name='h2', attrs={}),
                      dict(name='div', attrs={'class': 'box _ga1_on_'}),
                      dict(name='h1', attrs={'class': 'news_section'}),
                      dict(name='div', attrs={'class': 'news_body dotted_border_bottom'}),
                      dict(name='table', attrs={'class': 'newsBody'}),
                      dict(name='h2', attrs={'class': 'black'})]

    # Feeds: top news, politics, economy, society, incidents,
    # financial news (Quote.rbc.ru).
    feeds = [(u'Главные новости', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/mainnews.rss'),
             (u'Политика', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/politics.rss'),
             (u'Экономика', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/economics.rss'),
             (u'Общество', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/society.rss'),
             (u'Происшествия', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/incidents.rss'),
             (u'Финансовые новости Quote.rbc.ru', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/quote.ru/mainnews.rss')]

    # Drop video players, photo reports, notes and print/navigation links.
    remove_tags = [dict(name='div', attrs={'class': "video-frame"}),
                   dict(name='div', attrs={'class': "photo-container videoContainer videoSWFLinks videoPreviewSlideContainer notes"}),
                   dict(name='div', attrs={'class': "notes"}),
                   dict(name='div', attrs={'class': "publinks"}),
                   dict(name='a', attrs={'class': "print"}),
                   dict(name='div', attrs={'class': "photo-report_new notes newslider"}),
                   dict(name='div', attrs={'class': "videoContainer"}),
                   dict(name='div', attrs={'class': "videoPreviewSlideContainer"}),
                   dict(name='a', attrs={'class': "videoPreviewContainer"}),
                   dict(name='a', attrs={'class': "red"}),]

    def preprocess_html(self, soup):
        # Replace every link that contains only text with that plain text.
        for alink in soup.findAll('a'):
            if alink.string is not None:
                tstr = alink.string
                alink.replaceWith(tstr)
        return soup

    def print_version(self, url):
        # Fetch the print-friendly version of each article.
        return url + '?print=true'

It works well enough, but perhaps the experts here have comments or suggestions about the code?
Thanks.
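
One possible remark on print_version: appending '?print=true' directly assumes the article URLs from the feeds never already carry a query string or a fragment. Whether RBC.ru feed URLs ever do is not stated in this thread, so the following is only a minimal standalone sketch of a more defensive variant. It uses the Python 3 standard library rather than any calibre API, and the example.com URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def print_version(url):
    """Return url with print=true set, keeping any existing query and fragment."""
    parts = urlsplit(url)
    # Keep existing parameters, but drop an old 'print' value so it is not duplicated.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != 'print']
    query.append(('print', 'true'))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URLs, for illustration only.
print(print_version('http://example.com/news/1.shtml'))
# http://example.com/news/1.shtml?print=true
print(print_version('http://example.com/news/1.shtml?id=5'))
# http://example.com/news/1.shtml?id=5&print=true
```

Inside the recipe, the same body could go into the print_version(self, url) method. If the feed URLs are always plain paths, the original one-liner behaves identically.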
03-14-2011, 08:05 AM   #2
chewi

Here's an updated recipe for RBC.ru (added a cover image, a description, and a few other small changes):


# -*- coding: utf-8 -*-

from calibre.web.feeds.news import BasicNewsRecipe


class RBC_ru(BasicNewsRecipe):
    title = u'RBC.ru'
    __author__ = 'A. Chewi'
    # Description (in Russian): the Russian news agency RosBusinessConsulting
    # (RBC): politics, economics and finance news feeds, analysis, commentary
    # and forecasts, feature articles.
    description = 'Российское информационное агентство «РосБизнесКонсалтинг» (РБК) - ленты новостей политики, экономики и финансов, аналитические материалы, комментарии и прогнозы, тематические статьи'
    needs_subscription = False
    cover_url = 'http://pics.rbc.ru/img/fp_v4/skin/img/logo.gif'
    cover_margins = (80, 160, '#ffffff')
    oldest_article = 10
    max_articles_per_feed = 50
    summary_length = 200
    remove_empty_feeds = True
    no_stylesheets = True
    remove_javascript = True
    use_embedded_content = False
    conversion_options = {'linearize_tables': True}
    language = 'ru'
    timefmt = ' [%a, %d %b, %Y]'

    # Feeds: top news, politics, economy, society, incidents,
    # financial news (Quote.rbc.ru).
    feeds = [(u'Главные новости', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/mainnews.rss'),
             (u'Политика', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/politics.rss'),
             (u'Экономика', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/economics.rss'),
             (u'Общество', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/society.rss'),
             (u'Происшествия', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/incidents.rss'),
             (u'Финансовые новости Quote.rbc.ru', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/quote.ru/mainnews.rss')]

    # Keep only the headline and article-body containers.
    keep_only_tags = [dict(name='h2', attrs={}),
                      dict(name='div', attrs={'class': 'box _ga1_on_'}),
                      dict(name='h1', attrs={'class': 'news_section'}),
                      dict(name='div', attrs={'class': 'news_body dotted_border_bottom'}),
                      dict(name='table', attrs={'class': 'newsBody'}),
                      dict(name='h2', attrs={'class': 'black'})]

    # Drop video players, photo reports, notes and print/navigation links.
    remove_tags = [dict(name='div', attrs={'class': "video-frame"}),
                   dict(name='div', attrs={'class': "photo-container videoContainer videoSWFLinks videoPreviewSlideContainer notes"}),
                   dict(name='div', attrs={'class': "notes"}),
                   dict(name='div', attrs={'class': "publinks"}),
                   dict(name='a', attrs={'class': "print"}),
                   dict(name='div', attrs={'class': "photo-report_new notes newslider"}),
                   dict(name='div', attrs={'class': "videoContainer"}),
                   dict(name='div', attrs={'class': "videoPreviewSlideContainer"}),
                   dict(name='a', attrs={'class': "videoPreviewContainer"}),
                   dict(name='a', attrs={'class': "red"}),]

    def preprocess_html(self, soup):
        # Replace every link that contains only text with that plain text.
        for alink in soup.findAll('a'):
            if alink.string is not None:
                tstr = alink.string
                alink.replaceWith(tstr)
        return soup

    def print_version(self, url):
        # Fetch the print-friendly version of each article.
        return url + '?print=true'
Attached Files
File Type: zip rbc_ru.zip (1.6 KB, 129 views)
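
For readers who want to see what the preprocess_html step does to article markup without running calibre, here is a rough standalone sketch. It is not the recipe's code: it uses Python 3's standard html.parser instead of calibre's BeautifulSoup, the names LinkStripper and strip_links are made up for this illustration, and it unwraps every link, whereas the recipe only replaces links whose sole content is text (links wrapping images are left alone there). Comments and doctype declarations are also dropped by this sketch.

```python
from html.parser import HTMLParser


class LinkStripper(HTMLParser):
    """Re-emit HTML unchanged except that <a> and </a> tags are dropped."""

    def __init__(self):
        # Keep entities as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through verbatim.
        if tag != 'a':
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != 'a':
            self.out.append('</%s>' % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)


def strip_links(html):
    """Return html with all anchor tags removed but their contents kept."""
    parser = LinkStripper()
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


# Hypothetical markup, for illustration only.
print(strip_links('<p>See <a href="http://example.com/x">the report</a> for details.</p>'))
# <p>See the report for details.</p>
```

On an e-reader this kind of flattening keeps the article text readable while removing links that would otherwise point back to the website.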
All times are GMT -4.

