Old 03-14-2010, 02:22 PM   #1600
Starson17
Quote:
Originally Posted by Ekips View Post
This is my first ever attempt at python so excuse the roughness.
I'm a beginner, too. Kovid's been riding herd on my efforts, but I'll see if I can help you.

Your recipe looks pretty good. Minor cleanup: You might want to change the def print_version to this:
Code:
    def print_version(self, url):
        # str.replace() returns a new string, so assign each result back to url
        url = url.replace('?OTC-RSS&ATTR=News', '?print=yes')
        url = url.replace('?OTC-RSS&ATTR=Royals', '?print=yes')
        url = url.replace('?OTC-RSS&ATTR=Gizmo', '?print=yes')
        url = url.replace('?OTC-RSS&ATTR=Boxing', '?print=yes')
        url = url.replace('?OTC-RSS&ATTR=Cricket', '?print=yes')
        url = url.replace('?OTC-RSS&ATTR=Football', '?print=yes')
        url = url.replace('?OTC-RSS&ATTR=Rugby+Union', '?print=yes')
        url = url.replace('?OTC-RSS&ATTR=Tv', '?print=yes')
        url = url.replace('?OTC-RSS&ATTR=Bizarre', '?print=yes')
        url = url.replace('?OTC-RSS&ATTR=Usa', '?print=yes')
        url = url.replace('?OTC-RSS&ATTR=Film', '?print=yes')
        url = url.replace('?OTC-RSS&ATTR=HomePage', '?print=yes')
        return url
Strings are immutable in Python, so replace() returns a new string rather than changing url in place; assign the result back to url each time. That way you can do the replacements sequentially in the body and return url, instead of doing a single modification of url in the return line.
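If you'd rather not list every section by hand, here is a rough sketch of a shorter version using a regular expression. It assumes every feed link carries a suffix of the form '?OTC-RSS&ATTR=<section>'; the example.com address is made up for illustration. It's written as a plain function so you can try it outside calibre; in the recipe it would be a method that takes self as well.

```python
import re

# Assumption: every feed URL carries '?OTC-RSS&ATTR=<section>' and nothing
# else in the query string that needs to be kept.
FEED_SUFFIX = re.compile(r'\?OTC-RSS&ATTR=[^&#]*')

def print_version(url):
    # re.sub() also returns a new string; the original url is left untouched.
    return FEED_SUFFIX.sub('?print=yes', url)

# Hypothetical article URL, for illustration only.
example = 'http://example.com/story.html?OTC-RSS&ATTR=Rugby+Union'
converted = print_version(example)
```

The upside is that a new feed section needs no extra line; the downside is that it will also rewrite any section you didn't intend to touch.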


I ran the recipe in test mode, so I only pulled two feeds with two articles each. I didn't see any references to Flash. I did see some text "Advertisement" and some "Add a Comment" links that were left. Can you tell me exactly what feed/article you want help on?

Add this to your remove_tags to kill the "Add a Comment" link:
Code:
,dict(name='a', attrs={'class':'add_a_comment'})
Do you know the best way to find these?

Use Firefox,
install the Firebug add-on,
open the page you're having trouble with,
find the item you want to remove on the original page (CTRL-F),
right click that item and select "Inspect Element"

It tells you the name, and id or class label of the element.
Then just put that into your remove_tags list.

The "Add a Comment" junk was in an <a> tag with id='addComment' and class='add_a_comment'. You could remove it by referring to either the id or the class.
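To make that concrete, here is a small sketch of the two ways to write the entry. The names by_id and by_class are just labels I made up for the example; only the dict contents matter to the recipe.

```python
# Two ways to target the same <a> tag; either one alone should be enough.
by_id = dict(name='a', attrs={'id': 'addComment'})
by_class = dict(name='a', attrs={'class': 'add_a_comment'})

# Put whichever you prefer into the recipe's remove_tags list.
remove_tags = [by_class]  # or [by_id]
```

I'd guess the id is the safer choice when a page reuses the same class on other links you want to keep, but that depends on the site.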

Also, you can condense your 3 removes into one. Here is the line:
Code:
dict(name='div', attrs={'class':['slideshow','float-left','ltbx-slideshow ltbx-btn-ss']})
The 3 keeps can be condensed the same way.
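Here is a sketch of what the condensing amounts to, assuming your three original entries were one div per class. The condense() helper is hypothetical and only there to show that the two forms carry the same information; in the recipe you would simply type the condensed line.

```python
def condense(entries):
    """Merge single-class entries that share a tag name into one entry per name."""
    merged = {}
    for entry in entries:
        # Group the class values under their tag name, keeping the original order.
        merged.setdefault(entry['name'], []).append(entry['attrs']['class'])
    return [dict(name=name, attrs={'class': classes})
            for name, classes in merged.items()]

# Assumed shape of the three separate removes.
separate = [
    dict(name='div', attrs={'class': 'slideshow'}),
    dict(name='div', attrs={'class': 'float-left'}),
    dict(name='div', attrs={'class': 'ltbx-slideshow ltbx-btn-ss'}),
]

remove_tags = condense(separate)
```

The same list-of-classes form should work for the keeps, with your own class names in place of these.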

Last comment - I usually add "remove_javascript = True" unless there's some reason not to use it.

Last edited by Starson17; 03-14-2010 at 02:24 PM.