Thread: RSS and CSS ?
10-28-2009, 09:16 PM   #8
zelda_pinwheel
okay, my recipe is getting pretty close to done i think. i've removed the stuff i don't want and i still have the css formatting. i just have 2 questions.

1. in the css, the main body content div has a width of 520px. is there a way to get rid of just the width rules from the css ? (there are a few others but that one is the most problematic.) i saw in your sample recipe that you can selectively remove certain html tags with postprocessing, but this doesn't seem to work for the css code (i tried). i know i *could* remove the style sheets and then add back all the css without the width, but that seems like a rather clumsy solution.

2. i set the recipe to grab 7 days' worth of articles. however, it seems to have grabbed the most recent article 7 times instead, and none of the earlier ones. can i fix that somehow ?
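
for question 1, one possible approach (a sketch only, not something confirmed to work inside calibre) would be to strip just the `width` declarations out of the css text with a regular expression, rather than rebuilding the whole style sheet by hand. the helper name `strip_width_rules` and the sample css are made up for illustration, and how the cleaned css would be fed back into the recipe is left open here.

```python
import re

# matches a plain `width` declaration such as "width: 520px;" but not
# "max-width", "min-width" or "border-width": the lookbehind rejects a
# preceding hyphen or word character
WIDTH_DECL = re.compile(r'(?<![-\w])width\s*:[^;}]*;?\s*', re.IGNORECASE)

def strip_width_rules(css):
    """return css with every plain `width` declaration removed (hypothetical helper)."""
    return WIDTH_DECL.sub('', css)

# made-up sample css, loosely modelled on the 520px body div described above
sample = "div.bodyblock { width: 520px; margin: 0 auto; } p { max-width: 30em; }"
print(strip_width_rules(sample))
```

a regular expression like this is a blunt tool: it knows nothing about css comments or strings, so a real css parser would be the more robust choice if the style sheets are complicated.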

for the moment the recipe looks like this (i copied the "soup" bits directly out of the sample recipe, i only modified the one in the middle to try to get rid of the "width" styles. i am perfectly happy to also turn all tables into divs if there are any left over) :

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1256774004(BasicNewsRecipe):
    title                 = u'World Wide Words 5'
    oldest_article        = 7
    max_articles_per_feed = 100
    use_embedded_content  = False

    feeds = [(u'Magazine', u'http://www.worldwidewords.org/rss/newsletter.xml')]

    # strip the navigation and logo blocks
    remove_tags = [
        dict(name='div', attrs={'class': 'navbar1'}),
        dict(attrs={'class': ['logo-wide', 'navhead', 'navlink1']}),
    ]

    # keep only the main body content div
    keep_only_tags = [dict(name='div', attrs={'class': 'bodyblock'})]

    def postprocess_html(self, soup, first):
        # remove images with an empty alt text
        for tag in soup.findAll(name='img', alt=""):
            tag.extract()

        # remove html width attributes; this does not touch width rules in the css
        for item in soup.findAll(width=True):
            del item['width']

        # turn any leftover tables into divs
        for tag in soup.findAll(name=['table', 'tr', 'td']):
            tag.name = 'div'

        return soup
and i'm attaching the latest results.
Attached Files
File Type: epub World Wide Words 5 [jeu., 29 oct. 2009] - calibre.epub (254.2 KB, 215 views)