Old 10-27-2010, 03:35 AM   #1
BlonG
Member
Posts: 15
Karma: 10
Join Date: Oct 2010
Location: Slovenia
Device: Kindle 3G
Please help to clean up a recipe

As a newbie, I am trying to learn how to create recipes by following the examples in the Calibre User Manual.

To create a recipe from RSS that fetches the full article and not just the summary, I should use the print-version URL (the manual has an example for “bbc.co.uk”). My problem is that I can’t get a URL for the full article, because the print link is “javascript:window.print()”.

So I tried a different approach: removing and keeping certain tags.
The problem is that now I don’t get the articles that belong to each specific section (each section has its own RSS URL). The articles are divided into sections, but every section contains the same articles.
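To check whether the feeds themselves return the same items (rather than the recipe mixing them up), the article links of each feed can be compared. The following is only a rough sketch in standalone Python 3, not part of the recipe. The helper names `links_in_feed` and `shared_links` and the sample feeds are made up for illustration, and it assumes the feeds are plain RSS 2.0 with `<item><link>` elements.

```python
import xml.etree.ElementTree as ET


def links_in_feed(rss_text):
    """Return the set of <item><link> URLs found in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    links = set()
    for item in root.iter('item'):
        link = item.findtext('link')
        if link:
            links.add(link.strip())
    return links


def shared_links(feeds):
    """Map each pair of section names to the article links they share."""
    names = sorted(feeds)
    overlap = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            common = links_in_feed(feeds[first]) & links_in_feed(feeds[second])
            if common:
                overlap[(first, second)] = common
    return overlap


# Made-up sample data standing in for the text of two downloaded feeds.
SAMPLE_A = """<rss version="2.0"><channel><title>A</title>
<item><title>One</title><link>http://example.com/1</link></item>
<item><title>Two</title><link>http://example.com/2</link></item>
</channel></rss>"""

SAMPLE_B = """<rss version="2.0"><channel><title>B</title>
<item><title>Two</title><link>http://example.com/2</link></item>
<item><title>Three</title><link>http://example.com/3</link></item>
</channel></rss>"""

if __name__ == '__main__':
    print(shared_links({'Svet': SAMPLE_A, 'Kronika': SAMPLE_B}))
```

If every pair of sections shares all of its links, the feed URLs are probably returning the same feed, and the problem would be in the URLs rather than in the keep/remove tags.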

The recipe is here:
Spoiler:
#!/usr/bin/env python

__license__ = 'GPL v3'
__copyright__ = '2010'
'''
dnevnik.si
'''

from calibre.ebooks.BeautifulSoup import BeautifulSoup
from calibre.web.feeds.news import BasicNewsRecipe

class Dnevnik(BasicNewsRecipe):
    title = u'Dnevnik.si'
    __author__ = 'Test'
    description = 'News'
    oldest_article = 5
    max_articles_per_feed = 20
    no_stylesheets = True
    use_embedded_content = False

    cover_url = 'http://www.dnevnik.si/dsg/dnevnik.si.gif'

    keep_only_tags = [dict(name='div', attrs={'id':['content', 'heading']})]

    remove_tags = [
         dict(name='div', attrs={'id':'header'})
        ,dict(name='div', attrs={'class':['related', 'tools', 'inside']})
        ,dict(name='dl', attrs={'class':'ad'})
    ]

    remove_tags_after = [dict(id='_iprom_inStream')]


    feeds = [
         (u'Izpostavljene novice', u'http://www.dnevnik.si/rss/?articleType=9')
        ,(u'Slovenija', u'http://www.dnevnik.si/rss/?articleType=13')
        ,(u'Svet', u'http://www.dnevnik.si/rss/?articleType=14')
        ,(u'Kronika', u'http://www.dnevnik.si/rss/?articleType=15')
        ,(u'Pop/kultura', u'http://www.dnevnik.si/rss/?articleType=17')
        ,(u'Zdravje', u'http://www.dnevnik.si/rss/?articleType=18')
    ]


Link to the sections (RSS URLs): http://www.dnevnik.si/kaj_je_rss
RSS link to specific section: http://www.dnevnik.si/rss/?articleTy...icleSection=14
Article link: http://www.dnevnik.si/novice/svet/1042398632
Print link (label “Natisni”): javascript:window.print()
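
A note on the print link: "javascript:window.print()" is not an address that can be downloaded. It only opens the browser's print dialog for the page that is already loaded, so it does not point to a separate print page. The small sketch below (standalone Python 3, not part of the recipe; the helper name `is_fetchable` is made up for illustration) shows how such a link differs from a normal article link.

```python
from urllib.parse import urlparse


def is_fetchable(url):
    """Return True only for links that use the http or https scheme."""
    scheme = urlparse(url).scheme
    return scheme in ('http', 'https')


if __name__ == '__main__':
    for link in ('http://www.dnevnik.si/novice/svet/1042398632',
                 'javascript:window.print()'):
        print(link, is_fetchable(link))
```

So for this site the article link itself seems to be the only page available to fetch, which is why I ended up with the keep/remove tags approach.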

Well, if this can be done without the "remove" and "keep" tags, by using the full article URL from the "javascript" command, that would perhaps be better (and easier).

Another thing: I am still looking for a kind expert to create a recipe for a magazine.