10-12-2010, 06:08 PM   #5
TonytheBookworm
I can see a future release of this recipe needing to strip out all the Video:.... links. If someone else finds the time (I'm swamped with work and just chiming in when I'm free), please feel free to modify the code.
Basically, this is what needs to be done:
Spoiler:

Code:
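import re  # goes at the top of the recipe file

def preprocess_html(self, soup):
    '''
    Need to find the structure of the links and what their tags are,
    then pass those tags to findAll.
    '''
    # findAll takes a list of tag names and always returns a list
    # (empty if nothing matches), so no None check is needed.
    weblinks = soup.findAll(['PUT THE TAGS HERE SEPARATED BY COMMA'])
    for link in weblinks:
        # Remove the parent element of any link whose markup contains "Video:"
        if re.search('(Video)(:)', str(link)):
            link.parent.extract()
    return soup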
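If you want to sanity-check the pattern before wiring it into the recipe, here is a rough standalone sketch using only the `re` module. The sample markup strings and the `is_video_link` helper are made up for illustration; I haven't checked them against the actual site, so treat them as assumptions.

```python
import re

# Same pattern as in preprocess_html; the groups are not needed for a yes/no test.
VIDEO_PATTERN = re.compile(r'(Video)(:)')

def is_video_link(markup):
    """Return True if the link's markup contains the literal text 'Video:'."""
    return VIDEO_PATTERN.search(markup) is not None

# Hypothetical sample markup, made up for illustration only.
samples = [
    '<a href="/clip/1">Video: Storm damage downtown</a>',
    '<a href="/story/2">Council approves budget</a>',
    '<a href="/clip/3">video: lowercase is not matched</a>',
]

# Keep only the links that would survive the filter.
kept = [s for s in samples if not is_video_link(s)]
for s in kept:
    print(s)
```

Note the pattern is case-sensitive, so a lowercase "video:" slips through. If the site turns out to be inconsistent about capitalization, passing `re.IGNORECASE` to `re.compile` should catch those as well.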