Quote:
Originally Posted by cnfmsu
Do you have an existing recipe that does this?
I'm working on a recipe at the moment that needs to do this. Here's what I've got:
Code:
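# Pre-process HTML: some links are relative. Change them to absolute.
def preprocess_html(self, soup):
    # href=True skips <a> tags that have no href (e.g. named anchors),
    # which would otherwise raise a KeyError
    for link in soup.findAll('a', href=True):
        if not link['href'].startswith('http'):
            # Link is relative. Make it absolute, avoiding a doubled slash
            link['href'] = 'http://www.example.com/' + link['href'].lstrip('/')
    return soup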
You'll need to change http://www.example.com/ to whatever URL your site uses.
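One thing to watch with the version above: gluing the base URL onto the front also rewrites things like mailto: links, since they don't start with 'http'. A rough sketch of an alternative is to let the standard library's urljoin do the resolving. This assumes Python 3 (on Python 2 the same function lives in the urlparse module); BASE_URL and make_absolute are just names I made up for the example, not part of any recipe API.

```python
from urllib.parse import urljoin

# Hypothetical base URL; replace it with whatever URL your site uses.
BASE_URL = 'http://www.example.com/'


def make_absolute(href, base=BASE_URL):
    """Resolve href against base.

    Links that already carry their own scheme (http:, https:, mailto:)
    are returned unchanged by urljoin.
    """
    return urljoin(base, href)


# Inside a recipe, the hook could then look something like this:
#
#     def preprocess_html(self, soup):
#         for link in soup.findAll('a', href=True):
#             link['href'] = make_absolute(link['href'])
#         return soup

if __name__ == '__main__':
    samples = (
        'news/1.html',
        '/news/1.html',
        'http://other.example.org/x',
        'mailto:someone@example.com',
    )
    for href in samples:
        print(href, '->', make_absolute(href))
```

Because urljoin resolves against the base, this only gives the right answer for page-relative links (no leading slash) when the base matches the directory of the page being processed; for root-relative links the site root is enough.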