https://github.com/kovidgoyal/calibr...s/hindu.recipe
in line 80
div = soup.find('section', attrs={'id': 'section_'})
change it to
div = soup.find('section', attrs={'id': 'section_1'})
I changed the id to section_1, and the recipe now fetches the article links from each section.
It loads the whole article text, but it no longer loads images like before. It looks like they changed that part too.
This is the present code that fetches images (line 49):
Code:
def preprocess_html(self, soup):
    img = soup.find('img', attrs={'class': 'lead-img'})
    try:
        for i, source in enumerate(tuple(img.parent.findAll('source', srcset=True))):
            if i == 0:
                img['src'] = source['srcset'].split()[0]
            source.extract()
    except Exception:
        pass
And this is the markup where the image is now:
Code:
<img class="lead-img" src="https://www.thehindu.com/todays-paper/1cezy4/article65041871.ece/alternates/FREE_660/First-ever-wate%2BGSE9G8VFI.3.jpg.jpg"
data-src-template="https://www.thehindu.com/todays-paper/1cezy4/article65041871.ece/alternates/FREE_660/First-ever-wate%2BGSE9G8VFI.3.jpg.jpg"
data-original="https://www.thehindu.com/todays-paper/1cezy4/article65041871.ece/alternates/FREE_660/First-ever-wate%2BGSE9G8VFI.3.jpg.jpg"
alt="Winged visitors making a splash at a lake in city.NAGARA GOPAL"
title="Winged visitors making a splash at a lake in city.NAGARA GOPAL"
data-device-variant="FREE~FREE~FREE~FREE"
width="100%" height="100%">
I replaced srcset with data-original, but it isn't working. It's a bit complex for me.
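For reference, here is a rough sketch of what reading data-original could look like. It is only a guess, not the recipe's actual fix. The existing code looks for <source srcset=...> tags next to the image, but in the markup above data-original is an attribute on the <img> itself, so swapping the attribute name inside that loop would not find anything. The helper name use_data_original is made up for illustration. Also note that in the sample markup src already holds the same URL as data-original, so this may not be the real cause of the missing images.

Code:
```python
def use_data_original(soup):
    """Copy each <img>'s data-original URL into its src attribute.

    Hypothetical helper: assumes `soup` is the BeautifulSoup object that
    calibre passes to preprocess_html.
    """
    # attrs={'data-original': True} matches only <img> tags that have the attribute
    for img in soup.findAll('img', attrs={'data-original': True}):
        url = img.get('data-original')
        if url:
            img['src'] = url
    return soup


# Inside the recipe class, the hook could then be (assumption):
#     def preprocess_html(self, soup):
#         return use_data_original(soup)
```

The helper returns the soup so that preprocess_html can hand it straight back to calibre.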