01-01-2024, 07:28 AM   #1
wurbl
Junior Member
Join Date: Jan 2024
Device: Kindle
The Times and Sunday Times UK

There are several issues with the recipe for scraping this newspaper. I think they come from changes to the way the website works and is structured.

(1) articles are not downloaded in full, (2) the word 'Sponsored' appears in bold at random places, (3) the related articles section should be removed or reformatted, (4) the byline and date are duplicated and wrongly formatted, (5) the byline needs to be separated from the article summary, (6) captions need to be set apart and distinguished in italics

The main issue is that the full article is not included. I think this is because they have changed their login page from 'login.thetimes.co.uk' to 'account.thetimes.co.uk', which makes the site harder to scrape. The other issues are formatting problems that can probably be solved by updating the recipe, but I am not familiar with how to do this. Has anyone made a fix for any of these problems?
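To show the kind of change I have in mind for the formatting issues, here is a rough sketch based on my reading of the calibre recipe options `remove_tags` and `extra_css`. It is not a working fix. The class names `sponsored-label` and `related-articles`, the `.byline` and `figcaption` selectors, the class name `TimesFormattingSketch` and the `LOGIN_HOST` attribute are all my guesses and would need to be checked against the live pages. It does not touch the login problem at all.

```python
try:
    # Inside calibre this import succeeds and the class is a real recipe.
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so the file can be checked outside calibre; not for real use.
    BasicNewsRecipe = object


class TimesFormattingSketch(BasicNewsRecipe):
    title = 'The Times and Sunday Times UK (formatting sketch)'
    needs_subscription = True  # the site requires a login

    # Hypothetical attribute (not a calibre option) recording the new login
    # host mentioned above; the old one was login.thetimes.co.uk.
    LOGIN_HOST = 'account.thetimes.co.uk'

    # Assumed class names: inspect the article HTML and replace them.
    remove_tags = [
        dict(attrs={'class': 'sponsored-label'}),   # issue 2: 'Sponsored' text
        dict(attrs={'class': 'related-articles'}),  # issue 3: related articles
    ]

    # Assumed selectors for the byline and captions (issues 5 and 6).
    extra_css = '''
        .byline { display: block; margin: 0.5em 0; font-size: small; }
        figcaption { font-style: italic; font-size: small; }
    '''
```

If the real class names can be found with the browser's page inspector, they should be the only things that need swapping in for issues 2, 3, 5 and 6. Issues 1 and 4 would still need separate work.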