09-22-2010, 01:27 PM #2811
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by TonytheBookworm
Starson17,
Do you mind looking at this when you get a sec and telling me what the heck I'm doing wrong with the CSS, please?
My objective is to change the styling of this header:
Code:
Egypt’s housing market recovers!
It has the tag format <div class="cdmainarticle">Egypt’s housing market recovers!</div>
Based on what I have gathered from other recipes and from you, my code "should reformat it", but it doesn't.

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class GlobalProperty(BasicNewsRecipe):
    title = 'Global Property Guide'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'This is a site for residential property investors who want to buy houses or apartments in other countries'
    publisher = 'GlobalPropertyGuide.com'
    category = 'prices,real-estate'
    oldest_article = 10
    max_articles_per_feed = 100
    no_stylesheets = True
    extra_css = 'div.cdmainarticle{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}'
    
    keep_only_tags    = [
                         dict(name='div', attrs={'class':['cd_mainbody']})
                        ]
    remove_tags = [
                   dict(name='div', attrs={'class':['addthis_toolbox addthis_default_style']}),
                   
                  ]                    
    feeds          = [
                      ('Main Feed', 'http://www.globalpropertyguide.com/rss'),
                      
                    ]
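1) The header has class="cd_mainarticle" (with an underscore), not class="cdmainarticle", so your div.cdmainarticle rule never matches.
2) The header also has an inline style, which takes precedence over extra_css. Strip that first: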
Code:
    def preprocess_html(self, soup):
        # Drop every inline style attribute so the rules in extra_css can apply
        for item in soup.findAll(attrs={'style':True}):
            del item['style']
        return soup
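If you want to see both fixes outside calibre, here is a rough stdlib-only sketch. It uses Python's html.parser, not calibre's BeautifulSoup-based API, and the helper names strip_inline_styles and class_selector_matches are made up for illustration. It drops style attributes the way preprocess_html does, and shows that a CSS class selector only matches an exact, whitespace-separated class token. It assumes fairly simple HTML and silently discards comments and doctypes.

```python
from html import escape
from html.parser import HTMLParser


class StyleStripper(HTMLParser):
    """Re-serialize HTML while dropping every inline style attribute.

    Hypothetical stand-in for the recipe's preprocess_html(); not calibre's API.
    Comments and doctypes are not handled, so they are dropped from the output.
    """

    def __init__(self):
        # Keep entity/char references as-is so they can be written back unchanged
        super().__init__(convert_charrefs=False)
        self.out = []

    def _attrs(self, attrs):
        # Keep every attribute except style; values arrive unescaped, so re-escape them
        kept = [(k, v) for k, v in attrs if k != 'style']
        return ''.join(
            f' {k}="{escape(v, quote=True)}"' if v is not None else f' {k}'
            for k, v in kept
        )

    def handle_starttag(self, tag, attrs):
        self.out.append(f'<{tag}{self._attrs(attrs)}>')

    def handle_startendtag(self, tag, attrs):
        self.out.append(f'<{tag}{self._attrs(attrs)} />')

    def handle_endtag(self, tag):
        self.out.append(f'</{tag}>')

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f'&{name};')

    def handle_charref(self, name):
        self.out.append(f'&#{name};')


def strip_inline_styles(html):
    """Return html with all style="..." attributes removed."""
    parser = StyleStripper()
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


def class_selector_matches(class_attr, selector_class):
    """A .name selector matches only if name equals one of the element's class tokens."""
    return selector_class in class_attr.split()
```

Feeding it the header from this thread, '<div class="cd_mainarticle" style="font-size:10px">Egypt’s housing market recovers!</div>', returns '<div class="cd_mainarticle">Egypt’s housing market recovers!</div>', and class_selector_matches("cd_mainarticle", "cdmainarticle") is False, which is why the original div.cdmainarticle rule never fired.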
Try this one:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class GlobalProperty(BasicNewsRecipe):
    title = 'Global Property Guide'
    language = 'en'
    __author__ = 'TonytheBookworm, with a little help from his friends'
    description = 'This is a site for residential property investors who want to buy houses or apartments in other countries'
    publisher = 'GlobalPropertyGuide.com'
    category = 'prices,real-estate'
    oldest_article = 10
    max_articles_per_feed = 100
    no_stylesheets = True
    # The header's class is cd_mainarticle (with an underscore); color:red only makes the change easy to spot
    extra_css = '''
                 .cd_mainarticle{font-family:Arial,Helvetica,sans-serif; color:red; font-weight:bold;font-size:large;}
                 '''
    keep_only_tags    = [
                         dict(name='div', attrs={'class':['cd_mainbody']})
                        ]
    remove_tags = [
                   dict(name='div', attrs={'class':['addthis_toolbox addthis_default_style']}),
                  ]
    feeds          = [
                      ('Main Feed', 'http://www.globalpropertyguide.com/rss'),
                    ]

    def preprocess_html(self, soup):
        # Remove inline style attributes so they cannot override extra_css
        for item in soup.findAll(attrs={'style':True}):
            del item['style']
        return soup

Edit: I made the header red so the change is easy to spot.

Last edited by Starson17; 09-22-2010 at 01:37 PM.