08-29-2010, 01:38 AM #2552
TonytheBookworm
Find URL where link text = "Read Full Post"

If there is documentation on this, I wasn't able to find it, so could someone please help me out? I want to parse a website that doesn't have an RSS feed, but it has a link under each article that leads to the full article.
Code:
<a href="/blogs/hunting/2010/08/guest-blog-5-reasons-plant-food-plots-now">Read Full Post</a>
So my question is: how can I search for links whose text is "Read Full Post" and get their href?
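For reference, here is a minimal sketch of that search using only Python's standard-library `html.parser`, swapped in for BeautifulSoup so it runs without extra packages. The names `FullPostLinkFinder` and `find_full_post_links` are hypothetical; the sketch compares each anchor's text with surrounding whitespace stripped and skips anchors that have no `href`.

```python
from html.parser import HTMLParser


class FullPostLinkFinder(HTMLParser):
    """Collect the href of every <a> whose text equals link_text."""

    def __init__(self, link_text="Read Full Post"):
        super().__init__()
        self.link_text = link_text
        self.links = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; None if there is no href
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Strip whitespace so "Read Full Post " still matches
            if "".join(self._text).strip() == self.link_text:
                self.links.append(self._href)
            self._href = None


def find_full_post_links(html, link_text="Read Full Post"):
    finder = FullPostLinkFinder(link_text)
    finder.feed(html)
    finder.close()
    return finder.links


page = '<a href="/blogs/hunting/2010/08/guest-blog-5-reasons-plant-food-plots-now">Read Full Post</a>'
print(find_full_post_links(page))
# ['/blogs/hunting/2010/08/guest-blog-5-reasons-plant-food-plots-now']
```

Matching on the exact link text (rather than on every `<a>` with a non-empty href) keeps navigation and sidebar links out of the result.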

My thoughts were something along the lines of:
Code:
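 def find_full_post_links(soup):
     # Return the href of every <a> whose text is "Read Full Post"
     found_links = []
     for link in soup.find_all('a', href=True):
         if link.get_text(strip=True) == 'Read Full Post':
             found_links.append(link['href'])
     return found_links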


So if I have the links
Code:
 <a href="/blogs/test1">Read Full Post </a>
 <a href="/blogs/test2">Read Full Post </a>
then I would only process the links /blogs/test1 and /blogs/test2.
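Because those hrefs are root-relative, they need a site address in front before they can be fetched. A minimal sketch with `urllib.parse.urljoin`, assuming a hypothetical base URL (the post does not name the site) and a repeated `/blogs/test1` added only to show how duplicates can be dropped:

```python
from urllib.parse import urljoin

# Hypothetical base URL: replace with the address of the site being scraped
BASE_URL = "https://www.example.com/"

# hrefs as found in the page; the second "/blogs/test1" is an illustrative repeat
hrefs = ["/blogs/test1", "/blogs/test2", "/blogs/test1"]

# Resolve each root-relative href into an absolute URL that can be fetched
article_urls = [urljoin(BASE_URL, href) for href in hrefs]

# Drop repeats while keeping page order, so each article is processed once
unique_urls = list(dict.fromkeys(article_urls))

print(unique_urls)
# ['https://www.example.com/blogs/test1', 'https://www.example.com/blogs/test2']
```

`urljoin` also leaves hrefs that are already absolute unchanged, so the same line works if the site mixes both forms.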

Thanks for the help!