Old 01-15-2011, 08:01 PM   #1
Tegan
Connoisseur
Tegan began at the beginning.
 
Posts: 59
Karma: 10
Join Date: Jan 2011
Device: Kindle 1st Gen, Kindle 3 SO
Classes in CSS in recipes?

I'm playing around with the Huffington Post recipe, trying to get rid of some extra junk I don't want, and I've noticed something that I'm not certain I understand.

When you assign classes in HTML, a space between the words means both classes apply to the element. So class="read_more with_verticals" is effectively the same as applying class="read_more" and also applying class="with_verticals".

So if I use remove_tags with "read_more", it should block out class="read_more with_verticals". But what I think I'm seeing is that I have to have the whole thing, including the space, to remove it. Is this correct, or am I screwing up something?
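
To make the difference concrete, here is a minimal sketch in plain Python (no calibre or BeautifulSoup involved). It assumes, as the behaviour described above suggests, that a plain string in attrs is compared against the whole class attribute rather than against each class name. The helper names exact_match and token_match are made up for illustration.

```python
# Hypothetical helpers, not part of calibre: they only mimic the two
# ways a class attribute could be compared.

def exact_match(attr_value, wanted):
    """Compare against the whole attribute string (what seems to happen)."""
    return attr_value == wanted

def token_match(attr_value, wanted):
    """Compare against each space-separated class name (the CSS view)."""
    return attr_value is not None and wanted in attr_value.split()

attr = "read_more with_verticals"
whole_string_hit = exact_match(attr, "read_more")
per_class_hit = token_match(attr, "read_more")
print(whole_string_hit, per_class_hit)  # prints: False True
```

If the matcher really does compare whole strings, that would explain why "read_more" alone does not catch the combined value.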
Old 01-16-2011, 01:20 PM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 25,425
Karma: 4961459
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Use

{'class':lambda x and 'with_verticals' in x}
Old 01-17-2011, 05:36 PM   #3
Tegan
Ok, here's an example:

Code:
 dict(name='div', attrs={'class':['reaction_pannel_v3 facebookvote_v2 business_vertical_bg_link','reaction_pannel_v3 facebookvote_v2 chicago_vertical_bg_link','reaction_pannel_v3 facebookvote_v2 comedy_vertical_bg_link','reaction_pannel_v3 facebookvote_v2 denver_vertical_bg_link','reaction_pannel_v3 facebookvote_v2 green_vertical_bg_link','reaction_pannel_v3 facebookvote_v2 media_vertical_bg_link','reaction_pannel_v3 facebookvote_v2 politics_vertical_bg_link','reaction_pannel_v3 facebookvote_v2 sports_vertical_bg_link','reaction_pannel_v3 facebookvote_v2 world_vertical_bg_link']}),
I don't know if I've gotten all the variations on this one, and I suspect I haven't. So I want to be able to write just:

Code:
 dict(name='div', attrs={'class':['reaction_pannel_v3']}),
And have it block out all of the above. In the CSS, all of those items are styled by all three classes. So, is it possible to block out a class that is always joined with other classes, even if you don't know all of the other classes it might be joined with?

This is a common problem on the Huffington Post recipe. I've nearly got it trimmed, but new variations keep popping up and my recipe keeps getting longer and longer.
Old 01-17-2011, 06:04 PM   #4
kovidgoyal
Use the code I posted; it will work in your case.
Old 01-17-2011, 06:09 PM   #5
Tegan
Quote:
Originally Posted by kovidgoyal
Use the code I posted; it will work in your case.
I can't make the logical jump.

Would it be

Code:
{'class':lambda x and 'reaction_pannel_v3' in x}
or something else?
Old 01-17-2011, 06:10 PM   #6
kovidgoyal
Yes, that is correct
Old 01-17-2011, 06:23 PM   #7
Tegan
I'm getting invalid syntax when I try to put them into my recipe.

Code:
remove_tags = []
    remove_tags.append(dict(name='div', attrs={'id':['liveblog_heading','liveblog_container','chicklets','sidebar_digg_block']}))
    remove_tags.append(dict(name='div', attrs={'class':lambda x and 'reaction_pannel_v3' in x}))
    remove_tags.append(dict(name='div', attrs={'class':lambda x and 'facebookvote_reaction' in x}))
or

Code:
dict(name='div', attrs={'id':['liveblog_heading','liveblog_container','chicklets','sidebar_digg_block']}),
                      dict(name='div', attrs={'class':lambda x and 'reaction_pannel_v3' in x}),
                      dict(name='div', attrs={'class':lambda x and 'facebookvote_reaction' in x}),
What stupid thing am I doing?
Old 01-17-2011, 06:28 PM   #8
kovidgoyal
lambda x: x and

not

lambda x and
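
For anyone else who trips over this: a lambda needs its parameter list, then a colon, then the expression. The leading "x and" acts as a guard for tags that have no class attribute. The sketch below assumes the matcher calls the function with None in that case and otherwise with the raw class string; that is an assumption about the parser, not something stated above. The names has_panel, has_panel_def and samples are illustrative only.

```python
# The predicate from the thread, written with valid lambda syntax.
has_panel = lambda x: x and 'reaction_pannel_v3' in x

# Equivalent named function, for readers not used to lambdas.
def has_panel_def(x):
    # x is assumed to be the raw class string, or None if the tag has no class
    return x and 'reaction_pannel_v3' in x

samples = [
    'reaction_pannel_v3 facebookvote_v2 sports_vertical_bg_link',
    'entry_body_text',
    None,  # tag without a class attribute
]
results = [bool(has_panel(s)) for s in samples]
print(results)  # [True, False, False]
```

Without the guard, the None case would raise a TypeError, because "in" cannot search inside None.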
Old 01-17-2011, 06:44 PM   #9
Tegan
Quote:
Originally Posted by kovidgoyal
lambda x: x and
Ah, my ignorance is showing.

Ok, it partially worked. Some of them still showed up. Going to see if there's a pattern and figure out why. A little trial and error ought to get this thing working right.
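
One possible source of surprises while hunting for a pattern: "in" on a string is a substring test, so a short class name can also match longer, unrelated names. A stricter variant splits the attribute on whitespace first. The factory has_class below is a hypothetical helper, not part of calibre, and it assumes the class attribute arrives as a single string (or None).

```python
def has_class(name):
    """Return a predicate that is true when `name` is one of the
    space-separated class names in the attribute value."""
    def predicate(value):
        # value may be None when the tag has no class attribute
        return bool(value) and name in value.split()
    return predicate

substring = lambda x: x and 'hidden' in x
whole_name = has_class('hidden')

value = 'liveblog_entry unhidden'   # made-up class list for illustration
print(bool(substring(value)), whole_name(value))  # True False
```

In a recipe this could presumably be used as dict(name='div', attrs={'class': has_class('hidden')}), in the same place where the lambdas go.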
Old 01-17-2011, 07:04 PM   #10
Tegan
Got it! Updated Huffington Post recipe, with less junk showing, and more pure news-like substance:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
import re

class HuffingtonPostRecipe(BasicNewsRecipe):
    __license__  = 'GPL v3'
    __author__ = 'kwetal and Archana Raman'
    language = 'en'
    version = 2

    title          = u'__The Huffington Post'
    publisher      = u'huffingtonpost.com'
    category       = u'News, Politics'
    description    = u'Political Blog'

    oldest_article = 1.1
    max_articles_per_feed = 100

    encoding = 'utf-8'
    remove_empty_feeds = True
    no_stylesheets = True
    remove_javascript = True

    # Feeds from: http://www.huffingtonpost.com/syndication/
    feeds = []
    feeds.append((u'Latest News', u'http://feeds.huffingtonpost.com/huffingtonpost/LatestNews'))

    feeds.append((u'Politics', u'http://www.huffingtonpost.com/feeds/verticals/politics/index.xml'))
    #feeds.append((u'Politics: News', u'http://www.huffingtonpost.com/feeds/verticals/politics/news.xml'))
    #feeds.append((u'Politics: Blog', u'http://www.huffingtonpost.com/feeds/verticals/politics/blog.xml'))

    feeds.append((u'Media', u'http://www.huffingtonpost.com/feeds/verticals/media/index.xml'))
    #feeds.append((u'Media: News', u'http://www.huffingtonpost.com/feeds/verticals/media/news.xml'))
    #feeds.append((u'Media: Blog', u'http://www.huffingtonpost.com/feeds/verticals/media/blog.xml'))

    feeds.append((u'Business', u'http://www.huffingtonpost.com/feeds/verticals/business/index.xml'))
    #feeds.append((u'Business: News', u'http://www.huffingtonpost.com/feeds/verticals/business/news.xml'))
    #feeds.append((u'Business: Blogs', u'http://www.huffingtonpost.com/feeds/verticals/business/blog.xml'))

    feeds.append((u'Entertainment', u'http://www.huffingtonpost.com/feeds/verticals/entertainment/index.xml'))
    #feeds.append((u'Entertainment: News', u'http://www.huffingtonpost.com/feeds/verticals/entertainment/news.xml'))
    #feeds.append((u'Entertainment: Blog', u'http://www.huffingtonpost.com/feeds/verticals/entertainment/blog.xml'))

    feeds.append((u'Living', u'http://www.huffingtonpost.com/feeds/verticals/living/index.xml'))
    #feeds.append((u'Living: News', u'http://www.huffingtonpost.com/feeds/verticals/living/news.xml'))
    #feeds.append((u'Living: Blog', u'http://www.huffingtonpost.com/feeds/verticals/living/blog.xml'))

    feeds.append((u'Style', u'http://www.huffingtonpost.com/feeds/verticals/style/index.xml'))
    #feeds.append((u'Style: News', u'http://www.huffingtonpost.com/feeds/verticals/style/news.xml'))
    #feeds.append((u'Style: Blog', u'http://www.huffingtonpost.com/feeds/verticals/style/blog.xml'))

    feeds.append((u'Green', u'http://www.huffingtonpost.com/feeds/verticals/green/index.xml'))
    #feeds.append((u'Green: News', u'http://www.huffingtonpost.com/feeds/verticals/green/news.xml'))
    #feeds.append((u'Green: Blog', u'http://www.huffingtonpost.com/feeds/verticals/green/blog.xml'))

    feeds.append((u'Technology', u'http://www.huffingtonpost.com/feeds/verticals/technology/index.xml'))
    #feeds.append((u'Technology: News', u'http://www.huffingtonpost.com/feeds/verticals/technology/news.xml'))
    #feeds.append((u'Technology: Blog', u'http://www.huffingtonpost.com/feeds/verticals/technology/blog.xml'))

    feeds.append((u'Comedy', u'http://www.huffingtonpost.com/feeds/verticals/comedy/index.xml'))
    #feeds.append((u'Comedy: News', u'http://www.huffingtonpost.com/feeds/verticals/comedy/news.xml'))
    #feeds.append((u'Comedy: Blog', u'http://www.huffingtonpost.com/feeds/verticals/comedy/blog.xml'))

    feeds.append((u'World', u'http://www.huffingtonpost.com/feeds/verticals/world/index.xml'))
    #feeds.append((u'World: News', u'http://www.huffingtonpost.com/feeds/verticals/world/news.xml'))
    #feeds.append((u'World: Blog', u'http://www.huffingtonpost.com/feeds/verticals/world/blog.xml'))

    feeds.append((u'Original Reporting', u'http://www.huffingtonpost.com/tag/huffpolitics/feed'))
    #feeds.append((u'Original Posts', u'http://www.huffingtonpost.com/feeds/original_posts/index.xml'))

    keep_only_tags = [
                      dict(name='div', attrs={'id':['blog_title']}),
                      dict(name='div', attrs={'class':['col entry_right full','col entry_right full wide_format','comments_datetime v05','entry_body_text','float_left fixed_width_author']})]
    remove_tags    = [
                      dict(name='div', attrs={'id':['liveblog_heading','liveblog_container','chicklets','sidebar_digg_block']}),
                      dict(name='div', attrs={'class':lambda x: x and 'facebookvote_v2' in x}),
                      dict(name='div', attrs={'class':lambda x: x and 'reaction_pannel_v3' in x}),
                      dict(name='div', attrs={'class':lambda x: x and 'facebookvote_reaction' in x}),
                      dict(name='div', attrs={'class':['chicklets','chicklets_bar','hidden','liveblog_entry','reaction_pannel_v3','read_more','share_boxes_box_block_b_wraper','sidebar_share_block','sidebarHeader',]}),
                      dict(name='div', attrs={'class':['facebook-like-box float_left','facebookvote_reaction','facebookvote_v2','liveblog_entry hidden','read_more with_verticals',]}),
                      dict(name='span', attrs={'class':['get_huffpo','email_huffpo']}),
                      dict(name='a', attrs={'class':'home_pixie'}),
                      dict(name=['script', 'noscript', 'style'])]

    extra_css = '''
                    h1{font-family :Arial,Helvetica,sans-serif; font-size:large;}
                    h2{font-family :Arial,Helvetica,sans-serif; font-size:medium; color:#000000;}
                    h3{font-family :Arial,Helvetica,sans-serif; font-size:medium; color:#000000;}
                    body{font-family:verdana,arial,helvetica,geneva,sans-serif ;}
                    .date{color:#858585;font-family:"Times New Roman",sans-serif;}
                    .comments_datetime.v05{color:#696969;}
                    .teaser_permalink{font-style:italic;font-size:xx-small;}
                    .blog_posted_date{color:#696969;font-size:xx-small;font-weight: bold;}
                    '''

    def get_article_url(self, article):
        """
            Workaround for Feedparser behaviour. If an item has more than one <link/> element, article.link is empty and
            article.links contains a list of dictionaries.
            Todo: refactor to searching this list to avoid the hardcoded zero-index
        """
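        link = article.get('link')
        # %s formatting avoids a TypeError when link is None
        print("Link: %s" % link)
        if not link:
            links = article.get('links')
            if links:
                link = links[0]['href']
                # Fall back to the second entry only if there is one
                if not link and len(links) > 1:
                    link = links[1]['href']

        return link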
I'm going to run it for the next few weeks and see if it needs any further tweaking, but the tests I did look pretty good overall. Thanks for the help and patience, Kovid.
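
The Todo in the get_article_url docstring (searching the list instead of hardcoding index zero) could be sketched roughly as follows. This is a standalone illustration only: first_href is a made-up helper, and the article is modelled as a plain dict rather than a real feedparser entry.

```python
def first_href(article):
    """Return article['link'] if set, else the first non-empty 'href'
    found in article['links'], else None."""
    link = article.get('link')
    if link:
        return link
    for entry in article.get('links') or []:
        href = entry.get('href')
        if href:
            return href
    return None

# Made-up feed item with an empty 'link' and two <link/> entries.
item = {'link': '', 'links': [{'href': ''}, {'href': 'http://example.com/story'}]}
print(first_href(item))  # http://example.com/story
```

Looping over the list means the function no longer cares how many link entries a feed item has.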
Old 01-17-2011, 07:30 PM   #11
kovidgoyal
You're welcome. Let me know if/when you feel your recipe is ready to replace the builtin one.


All times are GMT -4.