Quote:
Originally Posted by TonytheBookworm
What might be wrong with this?
Code:
def preprocess_html(self, soup):
    for article in table.findAll('table'):
My understanding of the above is that it should find all instances of the <table> tag and then look inside each one for the http and https links specified. If it finds either of them, it will extract it from the soup; otherwise it will continue on. It should then return the soup without those links, yet that doesn't happen.
It should be:
Code:
def preprocess_html(self, soup):
    for article in soup.findAll('table'):
Otherwise, you are looking for table tags inside "table" instead of inside the soup that is passed in to preprocess_html.
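For reference, here is a minimal standalone sketch of what the quoted post describes, assuming the `bs4` (BeautifulSoup 4) package, where `find_all` is the newer spelling of `findAll`. The specific links from the original post were not preserved, so `LINK_PATTERN` is a hypothetical placeholder that matches any `http://` or `https://` href. In a calibre recipe this would be the `preprocess_html(self, soup)` method rather than a plain function.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical placeholder: the links targeted in the original post were not
# preserved, so this matches any href that starts with http:// or https://.
LINK_PATTERN = re.compile(r"^https?://")


def preprocess_html(soup):
    # Search the soup that was passed in, not an undefined name like `table`.
    for table in soup.find_all("table"):
        # Look only inside this table for <a> tags whose href matches.
        for link in table.find_all("a", href=LINK_PATTERN):
            link.extract()  # remove the tag (and its text) from the tree
    return soup


html = """
<table><tr><td><a href="http://example.com/a">A</a>
<a href="/local">L</a><a href="https://example.com/b">B</a></td></tr></table>
<p><a href="https://example.com/outside">kept</a></p>
"""
soup = preprocess_html(BeautifulSoup(html, "html.parser"))
print([a["href"] for a in soup.find_all("a")])
# ['/local', 'https://example.com/outside']
```

Only the matching links inside tables are removed; the relative link and the link outside the table survive.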