MobileRead Forums - View Single Post - Fetch South China Morning Post Magazine

dmiming · 09-07-2024, 04:58 AM

I have been using scmp.recipe to scrape the South China Morning Post for several months now, but some issues have recently started to appear. In brief:

Incomplete Content: Articles are not fully retrieved; parts of each one are missing. Checking the source feeds (e.g., https://www.scmp.com/rss/2/feed), it appears that, much like The Economist Espresso several months ago, the feeds no longer include the full article text. I’m not sure whether there is another way to resolve this.
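One way to confirm that a feed carries only summaries is to measure how much text each item actually contains. The sketch below is a standalone diagnostic using only Python's standard library and is not part of scmp.recipe. It parses an RSS 2.0 document passed in as a string, prefers the `content:encoded` element when an item has one, falls back to `description`, and flags items whose tag-stripped text is shorter than a threshold. The 500-character threshold and the function names are hypothetical choices for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical threshold: item text shorter than this many characters is
# treated as a summary rather than a full article body.
TRUNCATION_THRESHOLD = 500

# Namespaced tag used by the RSS content module for full article bodies.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


def strip_tags(html_text):
    """Crudely remove HTML tags and collapse whitespace (quick check only)."""
    text = re.sub(r"<[^>]+>", " ", html_text or "")
    return " ".join(text.split())


def find_truncated_items(rss_xml, threshold=TRUNCATION_THRESHOLD):
    """Return (title, link, text_length) for items whose text looks truncated."""
    root = ET.fromstring(rss_xml)
    flagged = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # Prefer content:encoded if present and non-empty, else description.
        raw = item.findtext(CONTENT_ENCODED) or item.findtext("description", default="")
        length = len(strip_tags(raw))
        if length < threshold:
            flagged.append((title, link, length))
    return flagged


sample = """<rss version="2.0"><channel>
<item><title>Short</title><link>https://example.com/a</link>
<description>Only a teaser.</description></item>
</channel></rss>"""

print(find_truncated_items(sample))
# → [('Short', 'https://example.com/a', 14)]
```

If most items in a real feed are flagged, the feed is probably summary-only, and a recipe would have to fetch the article page at each item's `link` rather than rely on the embedded text. The regex tag stripping is only good enough for a rough length check; real article HTML should be handled with a proper parser.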

Invalid Content: The scraped articles often contain irrelevant entries such as "Advertisement". Is there a way to filter out such content during scraping?
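A minimal sketch of such a filter, assuming the ad placeholders appear as simple elements (`p`, `div`, `span`, or `aside`) whose entire text is "Advertisement". The actual SCMP markup has not been checked here and may differ. The function name `remove_ad_placeholders` is hypothetical. Inside a calibre recipe, similar logic would more naturally operate on the BeautifulSoup object passed to the recipe's `preprocess_html` hook. This standalone version uses a regular expression on a raw HTML string so it runs with the standard library alone.

```python
import re

# Matches a leaf element whose only text is "Advertisement".
# Assumption: this is what the ad placeholders look like; inspect the
# real page source before relying on it.
AD_PLACEHOLDER = re.compile(
    r"<(p|div|span|aside)\b[^>]*>\s*Advertisement\s*</\1\s*>",
    re.IGNORECASE,
)


def remove_ad_placeholders(html):
    """Return html with "Advertisement" placeholder elements removed."""
    return AD_PLACEHOLDER.sub("", html)


article = (
    "<p>First paragraph.</p>"
    '<div class="ad">Advertisement</div>'
    "<p>Second paragraph.</p>"
)
print(remove_ad_placeholders(article))
# → <p>First paragraph.</p><p>Second paragraph.</p>
```

Because the pattern requires "Advertisement" to be the element's entire text, a sentence such as "Advertisement revenue fell." is left untouched.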
