08-12-2016, 07:08 AM   #80
mcdummy
Regarding the correction of incomplete (relative) references: the current version seems to correct links only in <a> tags, but not in <area> or <link> tags.

The links in these tags could be corrected by adding

    # <area> references
    for area in soup.findAll('area', href=lambda x: x and x.startswith('/')):
        href = area['href']
        if href.startswith('//'):
            area['href'] = 'https:' + href
        elif url_prefix:
            area['href'] = url_prefix + area['href']
    # <link> references
    for link in soup.findAll('link', href=lambda x: x and x.startswith('/')):
        href = link['href']
        if href.startswith('//'):
            link['href'] = 'https:' + href
        elif url_prefix:
            link['href'] = url_prefix + link['href']

in wikipedia.recipe

after the code

    for a in soup.findAll('a', href=lambda x: x and x.startswith('/')):
        href = a['href']
        if href.startswith('//'):
            a['href'] = 'https:' + href
        elif url_prefix:
            a['href'] = url_prefix + a['href']

I guess the code could be simplified, but it is just a quick workaround for the time being.
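
One possible simplification, as a rough sketch only: the three loops differ only in the tag name, so they could be folded into a single loop over the tag names. The helper names fix_href and fix_relative_links are made up for illustration and are not part of the recipe; soup and url_prefix are assumed to be the same objects the existing recipe code already uses.

```python
def fix_href(href, url_prefix):
    """Return a corrected URL for an href that starts with '/'."""
    if href.startswith('//'):
        # Protocol-relative reference: only the scheme is missing.
        return 'https:' + href
    if url_prefix:
        # Root-relative reference: prepend the site prefix.
        return url_prefix + href
    # No prefix available: leave the reference unchanged.
    return href


def fix_relative_links(soup, url_prefix, tags=('a', 'area', 'link')):
    """Apply the same correction to every listed tag type."""
    for name in tags:
        # Same filter as in the recipe: only hrefs that start with '/'.
        for tag in soup.findAll(name, href=lambda x: x and x.startswith('/')):
            tag['href'] = fix_href(tag['href'], url_prefix)
```

In the recipe, a call such as fix_relative_links(soup, url_prefix) would then stand in for the existing <a> loop plus the two loops proposed above. I have not tested this against the actual recipe, so treat it as a starting point.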

McDummy