Quote:
Originally Posted by Starson17
Yes, it's that one.
Code:
dict(name='div', attrs={'id':'vxFlashPlayer'})
will remove it.
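calibre does this matching itself on the parsed page. To show what a spec like dict(name='div', attrs={'id':'vxFlashPlayer'}) means, here is a rough standard-library-only sketch (Python 3) that drops every matching element along with everything nested inside it. It is not calibre's code: spec_matches, TagStripper and strip_matching are made-up names, and the sketch only does exact attribute matching, so the real matcher may behave differently (for example with multi-valued class attributes).

```python
from html.parser import HTMLParser


# Toy version of a calibre tag spec: dict(name=..., attrs={...}).
# Exact matching only; this is NOT calibre's own matcher.
def spec_matches(spec, tag, attrs):
    """True if the tag name and every attribute named in the spec match exactly."""
    if spec.get('name') not in (None, tag):
        return False
    attr_map = dict(attrs)
    return all(attr_map.get(k) == v for k, v in spec.get('attrs', {}).items())


class TagStripper(HTMLParser):
    """Re-emits HTML, dropping each matching element together with its children."""

    VOID = {'area', 'br', 'hr', 'img', 'input', 'link', 'meta'}  # no end tag

    def __init__(self, specs):
        super().__init__(convert_charrefs=True)
        self.specs = specs
        self.skip_depth = 0  # > 0 while inside an element being dropped
        self.out = []

    def _matches(self, tag, attrs):
        return any(spec_matches(s, tag, attrs) for s in self.specs)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in self.VOID:
                self.skip_depth += 1  # track nesting inside the dropped element
        elif self._matches(tag, attrs):
            if tag not in self.VOID:
                self.skip_depth = 1  # start dropping from here
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: keep unless dropped or matching.
        if not self.skip_depth and not self._matches(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1
        else:
            self.out.append('</%s>' % tag)

    def handle_data(self, data):
        # Text is written back unescaped, which is fine for this demo.
        if not self.skip_depth:
            self.out.append(data)


def strip_matching(html, specs):
    parser = TagStripper(specs)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


SAMPLE = ('<div class="article"><p>Text</p>'
          '<div id="vxFlashPlayer"><div>video</div></div>'
          '<p>More</p></div>')
print(strip_matching(SAMPLE, [dict(name='div', attrs={'id': 'vxFlashPlayer'})]))
```

The player div and the div nested inside it both disappear, while the surrounding article text stays.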
Sorted that. Also sorted the £ showing up as Ł; the fix was:
Code:
encoding = 'iso-8859-1'
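The £/Ł mix-up is what you would expect if the page's bytes were being decoded with a Central European code page: byte 0xA3 is £ in ISO-8859-1 (Latin-1) but Ł in ISO-8859-2 (Latin-2). That the recipe was actually falling back to Latin-2 is a guess, but the byte mapping itself is easy to check in plain Python 3 (ascii() is used so the output prints safely on any console):

```python
# The same byte means different characters in different Latin code pages.
raw = b'\xa3'
as_latin1 = raw.decode('iso-8859-1')  # Western European: pound sign
as_latin2 = raw.decode('iso-8859-2')  # Central European: L with stroke
print(ascii(as_latin1), ascii(as_latin2))
```

Setting encoding = 'iso-8859-1' tells calibre which of these readings to use.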
Tweaked a few more bits and got the main picture to show up. OK, it shows up at the end, but it's there.
Does the order in which you list the keep_only_tags affect the order they show up in?
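One way to think about the question: if calibre works through keep_only_tags one spec at a time and appends each spec's matches to the new page in turn (an assumption, not checked against calibre's source here), then the list order, not the page order, would decide the output order. The toy model below uses a made-up page layout and assumes, for illustration only, that the main picture sits in the 'text-center' div. keep_only and matches are hypothetical helpers, not calibre functions.

```python
# Hypothetical page: (tag name, attributes) in the order they appear in the HTML.
# Putting the main picture in the 'text-center' div is an assumption.
PAGE = [
    ('div', {'class': 'text-center'}),      # main picture, near the top
    ('div', {'class': 'medium-centered'}),  # headline
    ('div', {'class': 'article'}),          # body text
    ('div', {'class': 'clear-left'}),
]

# Same specs, same order as keep_only_tags in the recipe.
KEEP_ONLY_TAGS = [
    dict(name='div', attrs={'class': 'medium-centered'}),
    dict(name='div', attrs={'class': 'article'}),
    dict(name='div', attrs={'class': 'clear-left'}),
    dict(name='div', attrs={'class': 'text-center'}),
]


def matches(spec, name, attrs):
    """Exact match on tag name and on every attribute listed in the spec."""
    if spec.get('name') not in (None, name):
        return False
    return all(attrs.get(k) == v for k, v in spec.get('attrs', {}).items())


def keep_only(page, specs):
    """Collect matching elements spec by spec (ASSUMED behaviour, see above)."""
    kept, seen = [], set()
    for spec in specs:  # outer loop: spec order
        for index, (name, attrs) in enumerate(page):  # inner loop: page order
            if index not in seen and matches(spec, name, attrs):
                seen.add(index)  # an element is kept only once
                kept.append(attrs['class'])
    return kept


print(keep_only(PAGE, KEEP_ONLY_TAGS))
# Moving the 'text-center' spec to the front moves the picture to the front.
print(keep_only(PAGE, KEEP_ONLY_TAGS[-1:] + KEEP_ONLY_TAGS[:-1]))
```

If calibre does behave this way, moving the picture's spec to the top of keep_only_tags would be the thing to try.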
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1268409464(BasicNewsRecipe):
    title = u'The Sun'
    __author__ = 'Chaz Ralph'
    description = 'News from The Sun'
    oldest_article = 1
    max_articles_per_feed = 100
    no_stylesheets = True
    extra_css = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt }'
    charset = 'iso-8859-1'
    # Decode pages as Latin-1 so the pound sign (£) isn't turned into Ł
    encoding = 'iso-8859-1'
    remove_javascript = True

    keep_only_tags = [
        dict(name='div', attrs={'class':'medium-centered'}),
        dict(name='div', attrs={'class':'article'}),
        dict(name='div', attrs={'class':'clear-left'}),
        dict(name='div', attrs={'class':'text-center'}),
    ]

    remove_tags = [
        dict(name='div', attrs={'class':'slideshow'}),
        dict(name='div', attrs={'class':'float-left'}),
        dict(name='div', attrs={'class':'ltbx-slideshow ltbx-btn-ss'}),
        dict(name='a', attrs={'class':'add_a_comment'}),
        dict(name='div', attrs={'id':'vxFlashPlayerContent'}),
        dict(name='div', attrs={'id':'k1006094r1c1t5w380h529'}),
        dict(name='div', attrs={'id':'tum_login_form_container'}),
        dict(name='div', attrs={'class':'discHeader'}),
        dict(name='div', attrs={'class':'margin-bottom-neg-2'}),
    ]

    feeds = [
        (u'News', u'http://www.thesun.co.uk/sol/homepage/feeds/rss/article312900.ece'),
        (u'Sport', u'http://www.thesun.co.uk/sol/homepage/feeds/rss/article247732.ece'),
        (u'Football', u'http://www.thesun.co.uk/sol/homepage/feeds/rss/article247739.ece'),
        (u'Gizmo', u'http://www.thesun.co.uk/sol/homepage/feeds/rss/article247829.ece'),
        (u'Bizarre', u'http://www.thesun.co.uk/sol/homepage/feeds/rss/article247767.ece'),
    ]

    def print_version(self, url):
        # Feed sections whose '?OTC-RSS&ATTR=<section>' suffix becomes '?print=yes'
        sections = ['News', 'Royals', 'Gizmo', 'Boxing', 'Cricket', 'Football',
                    'Rugby+Union', 'Tv', 'Bizarre', 'Usa', 'Film', 'HomePage']
        for section in sections:
            # str.replace() returns a new string rather than changing url in
            # place, so the result must be assigned back to url.
            url = url.replace('?OTC-RSS&ATTR=' + section, '?print=yes')
        return url
That's the updated recipe.
I've been playing with Firebug, and I've also installed Python 2.6 and have been learning a little of it.
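One thing worth knowing about print_version: Python strings are immutable, so a bare url.replace(...) statement computes a new string and throws it away; the result has to be assigned back (url = url.replace(...)). The sketch below shows the difference on a made-up article URL (the article number is a placeholder, not a real Sun link). It also shows a shorter variant that simply drops everything after the '?' and appends '?print=yes'. That variant assumes every feed link carries the '?OTC-RSS&ATTR=<Section>' query and that the print page needs nothing else, as the replacements in the recipe imply; if that holds, it would also cover sections that aren't listed.

```python
def print_version_buggy(url):
    # str.replace() returns a NEW string; it is discarded here,
    # so the URL comes back unchanged.
    url.replace('?OTC-RSS&ATTR=News', '?print=yes')
    return url


def print_version_fixed(url):
    # Returning (or assigning) the result makes the change stick.
    return url.replace('?OTC-RSS&ATTR=News', '?print=yes')


def print_version_any_section(url):
    # ASSUMPTION: the whole query string can be replaced by '?print=yes'.
    base = url.split('?', 1)[0]
    return base + '?print=yes'


# Hypothetical article URL, for illustration only.
sample = 'http://www.thesun.co.uk/sol/homepage/news/article000000.ece?OTC-RSS&ATTR=News'
print(print_version_buggy(sample))
print(print_version_fixed(sample))
print(print_version_any_section(sample))
```

Inside the recipe, the last variant would just be the body of print_version(self, url), with self added as the first parameter.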