Old 02-03-2014, 03:16 AM   #3
jennie
Member
Posts: 14
Karma: 10
Join Date: Jun 2010
Device: kindle 3
Hi Kovid, thanks a lot for your reply.

I still can't get the cover to work. Here's my latest code:
Code:
from calibre.web.feeds.recipes import BasicNewsRecipe

class Kathimerini(BasicNewsRecipe):
    title                  = 'Kathimerini'
    __author__             = 'jenniepet'
    description            = 'News from Greece'
    max_articles_per_feed  = 100
    oldest_article         = 2
    publisher              = 'Kathimerini'
    category               = 'news, GR'
    language               = 'el'
    encoding               = 'utf-8'
    conversion_options     = { 'linearize_tables': True}
    no_stylesheets         = True
    remove_tags_before     = dict(id='site-body')
    remove_tags_after      = [dict(id='social')]
    remove_tags            = [dict(attrs={'class':['clearing-featured-img', 'post-tools', 'edition edition_PRINT']})]
    feeds = [
        (u'1', 'http://www.kathimerini.gr/rss?i=news.el.search&q=&t=0&w=&c=&s=p&type=&edition=PRINT&author=0&fromDate=0&toDate=0'),
        (u'2', 'http://www.kathimerini.gr/rss?i=news.el.search&q=&t=0&w=&c=&s=p&type=&edition=PRINT&author=0&fromDate=0&toDate=0&page=1'),
        (u'3', 'http://www.kathimerini.gr/rss?i=news.el.search&q=&t=0&w=&c=&s=p&type=&edition=PRINT&author=0&fromDate=0&toDate=0&page=2'),
        (u'4', 'http://www.kathimerini.gr/rss?i=news.el.search&q=&t=0&w=&c=&s=p&type=&edition=PRINT&author=0&fromDate=0&toDate=0&page=3'),
        (u'5', 'http://www.kathimerini.gr/rss?i=news.el.search&q=&t=0&w=&c=&s=p&type=&edition=PRINT&author=0&fromDate=0&toDate=0&page=4'),
    ]

def get_cover_url(self):
     import time
     return 'http://s.kathimerini.gr/resources/issue-cover/02-%s.jpg' %time.strftime('%m-%Y')

I'm using 02 instead of %d for testing purposes, because there is no issue today.
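
One thing that may be worth checking, assuming the indentation in the recipe file matches the listing above: get_cover_url starts at column 0, so Python treats it as a module-level function rather than a method of the Kathimerini class, and calibre would presumably never call it. Below is a minimal sketch of the intended shape. CoverSketch and cover_url_for are hypothetical names used only for illustration, and the stand-in class deliberately does not inherit from BasicNewsRecipe so that the snippet runs without calibre installed.

```python
import datetime

# URL pattern taken from the recipe above; covers appear to be named DD-MM-YYYY.jpg
COVER_TEMPLATE = 'http://s.kathimerini.gr/resources/issue-cover/%s.jpg'


def cover_url_for(day):
    # Hypothetical helper: build the cover URL for a given datetime.date
    return COVER_TEMPLATE % day.strftime('%d-%m-%Y')


class CoverSketch(object):
    # Stand-in for the recipe class (assumption: the real one subclasses BasicNewsRecipe)
    title = 'Kathimerini'

    # Indented one level, so this is a method of the class, not a module-level function
    def get_cover_url(self):
        return cover_url_for(datetime.date.today())
```

Passing a fixed date, for example cover_url_for(datetime.date(2014, 2, 2)), gives the same kind of test URL as hard-coding 02 on a day with no issue.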

I don't exactly know how to program in any language, so I'm having trouble using the rest of your advice. I don't think I want to try implementing parse_feed at this point, but I did try to read up on preprocess_raw_html, with no tangible results yet.
I'd really appreciate it if you could give me the complete fixed code.
I guess what I need to implement is something along the lines of:
Code:
if the body contains the class "article_SKETCH" (among others):
    replace 'class':['clearing-featured-img'] with 'class':['do-not-remove']
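
The idea above could be sketched roughly as follows, under the assumption that renaming the class in the downloaded HTML is enough to stop the remove_tags rule from matching the image. keep_sketch_image is a hypothetical helper name. preprocess_raw_html(self, raw_html, url) is the BasicNewsRecipe hook mentioned earlier, and it is shown only in a comment here so that the snippet runs without calibre.

```python
def keep_sketch_image(raw_html):
    # Hypothetical helper: on sketch articles, rename the image class so that
    # a remove_tags rule for 'clearing-featured-img' no longer matches it.
    if 'article_SKETCH' in raw_html:
        return raw_html.replace('clearing-featured-img', 'do-not-remove')
    # Every other page is returned unchanged
    return raw_html


# Inside the recipe class this might be wired up as (untested assumption):
#
#     def preprocess_raw_html(self, raw_html, url):
#         return keep_sketch_image(raw_html)
```

This is a plain string replacement, so it renames every occurrence of clearing-featured-img on a sketch page, including any that appear in inline scripts or styles.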