Quote:
Originally Posted by Starson17
I'm a beginner, too. Kovid's been riding herd on my efforts, but I'll see if I can help you.
Your recipe looks pretty good. Minor cleanup: You might want to change the def print_version to this:
Code:
def print_version(self, url):
    # str.replace() returns a new string, so assign the result back to url
    url = url.replace('?OTC-RSS&ATTR=News', '?print=yes')
    url = url.replace('?OTC-RSS&ATTR=Royals', '?print=yes')
    url = url.replace('?OTC-RSS&ATTR=Gizmo', '?print=yes')
    url = url.replace('?OTC-RSS&ATTR=Boxing', '?print=yes')
    url = url.replace('?OTC-RSS&ATTR=Cricket', '?print=yes')
    url = url.replace('?OTC-RSS&ATTR=Football', '?print=yes')
    url = url.replace('?OTC-RSS&ATTR=Rugby+Union', '?print=yes')
    url = url.replace('?OTC-RSS&ATTR=Tv', '?print=yes')
    url = url.replace('?OTC-RSS&ATTR=Bizarre', '?print=yes')
    url = url.replace('?OTC-RSS&ATTR=Usa', '?print=yes')
    url = url.replace('?OTC-RSS&ATTR=Film', '?print=yes')
    url = url.replace('?OTC-RSS&ATTR=HomePage', '?print=yes')
    return url
Each replace() returns a new string rather than changing url in place (Python strings are immutable), so assign the result back to url each time. That way you can do the replacements sequentially in the body and return url at the end, instead of doing a single modification of url in the return line.
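Since every feed URL seems to end in the same ?OTC-RSS&ATTR=<section> pattern, the whole list could probably be collapsed into one regular-expression substitution. This is only a sketch under that assumption. It is written as a plain function so it runs on its own (in the recipe it would be a method taking self first), and the name _RSS_QUERY is made up for illustration.

```python
import re

# Matches the RSS query string regardless of the section name after ATTR=.
# Assumption: every feed URL carries exactly this "?OTC-RSS&ATTR=..." suffix.
_RSS_QUERY = re.compile(r'\?OTC-RSS&ATTR=[^&#]*')


def print_version(url):
    """Return the print-friendly URL for an article URL taken from a feed."""
    # sub() returns a new string; the original url is left untouched.
    return _RSS_QUERY.sub('?print=yes', url, count=1)
```

The upside is that a new feed section needs no extra line; the downside is that it will also rewrite sections you never listed, so the explicit list is safer if some sections have no print page.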
I ran the recipe in test mode, so I only pulled two feeds with two articles each. I didn't see any references to Flash. I did see some text "Advertisement" and some "Add a Comment" links that were left. Can you tell me exactly what feed/article you want help on?
Add this to your remove_tags to kill the "Add a Comment" link:
Code:
,dict(name='a', attrs={'class':'add_a_comment'})
Do you know the best way to find these?
Use Firefox,
install the Firebug add-on,
open the page you're having trouble with,
find the item you want to remove on the original page (CTRL-F),
right click that item and select "Inspect Element"
It tells you the name, and id or class label of the element.
Then just put that into your remove_tags list.
The "Add a Comment" junk was in an <a> tag with id='addComment' and class='add_a_comment'. You could pull it with reference to either the id or the class.
Also, you can condense your 3 removes into one. Here is the line:
Code:
dict(name='div', attrs={'class':['slideshow','float-left','ltbx-slideshow ltbx-btn-ss']})
The 3 keeps can be condensed the same way.
Last comment - I usually add "remove_javascript = True" unless there's some reason not to use it.
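Put together, the settings discussed above might look roughly like this. It is a sketch, not your actual recipe: the names are shown at module level so the snippet runs by itself, whereas in a recipe they would be attributes of the recipe class, and any other entries you already have would stay in the list.

```python
# Hypothetical excerpt of recipe settings (class attributes in a real recipe).
remove_javascript = True

remove_tags = [
    # One entry covers several div classes by listing them together.
    dict(name='div', attrs={'class': ['slideshow', 'float-left',
                                      'ltbx-slideshow ltbx-btn-ss']}),
    # The "Add a Comment" link, matched by class; matching on
    # id='addComment' instead should work just as well.
    dict(name='a', attrs={'class': 'add_a_comment'}),
]
```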
Thanks for that, cleaned it up a fair bit, code looks trim too.
There are a few that come back with this message:
Quote:
You need Flash Player 8 or higher to view video content with the ROO Flash Player. Click here to download and install it.
http://www.thesun.co.uk/sol/homepage...&ATTR=Football
That one, for example.
I think it's
Code:
<div id="vxFlashPlayer"><div id="vxFlashPlayerContent" style="width: 380px; height: 278px;">
that is doing it. I'm going to try removing that one and let it run.
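Going by the tip about matching on id, the entry would presumably look something like the one below. This is untested and assumes the outer div keeps the id vxFlashPlayer on every article; the name flash_player_tag is just for illustration.

```python
# Assumption: the Flash message lives inside the div with id="vxFlashPlayer",
# so removing that outer div also removes vxFlashPlayerContent nested in it.
flash_player_tag = dict(name='div', attrs={'id': 'vxFlashPlayer'})

# In the recipe this entry would be appended to the existing remove_tags list.
remove_tags = [flash_player_tag]
```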
A few are also coming back blank, and the £ is coming up as Ł, so I still have some tweaking to do, but I'm finding it interesting (and very distracting).
How do you run the recipe in test mode? I've been running the thing in calibre and downloading the full feeds, which takes ages each time.