MobileRead Forums - View Single Post

nelson1379 · 02-10-2018, 06:17 AM

Sorry to keep posting, but the non-web_edition scraping mechanism isn't reading today's edition webpage correctly. It correctly puts the first four articles in the "Front Page" section, but then it seems to skip over the rest of the "Front Page" section and puts all of the remaining articles into the "International" section.

I'm not sure what it is in the HTML between the top four articles and the rest that is confusing the script. They're obviously formatted differently visually, but there's no h1 between Front Page and International for the script to pick up. I don't know Python, but I've been staring at it for a little while trying to figure it out... Perhaps it's something about that "rank-template featured-rank-template template-2 issue-template" div, which contains only the first four "Front Page" articles, that's messing it up. Sorry I can't be more helpful.
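To make the guess above concrete, here is a rough sketch of one way a scraper could avoid depending on that wrapper div: file every article link under the most recent h1, whichever div it sits in. This is not the recipe's actual code. The sample HTML is made up (only the featured div's class string comes from the page), and `SectionIndexer` and `index_sections` are hypothetical names. It uses only Python's standard `html.parser`.

```python
from html.parser import HTMLParser

# Made-up stand-in for the edition page; the real markup will differ.
SAMPLE_HTML = """
<h1>Front Page</h1>
<div class="rank-template featured-rank-template template-2 issue-template">
  <a href="/a1">Story 1</a>
  <a href="/a2">Story 2</a>
  <a href="/a3">Story 3</a>
  <a href="/a4">Story 4</a>
</div>
<div class="rank-template issue-template">
  <a href="/a5">Story 5</a>
  <a href="/a6">Story 6</a>
</div>
<h1>International</h1>
<div class="rank-template issue-template">
  <a href="/b1">Story 7</a>
</div>
"""


class SectionIndexer(HTMLParser):
    """Hypothetical helper: groups links under the last h1 seen."""

    def __init__(self):
        super().__init__()
        self.sections = {}    # section title -> list of (title, href)
        self.current = None   # title of the most recent h1
        self._in_h1 = False
        self._href = None     # href of the link currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self._text = []
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Collect text only while inside an h1 or a link with an href.
        if self._in_h1 or self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self.current = "".join(self._text).strip()
            self.sections.setdefault(self.current, [])
            self._in_h1 = False
        elif tag == "a" and self._href is not None:
            # The enclosing div is ignored; only the last h1 matters.
            if self.current is not None:
                title = "".join(self._text).strip()
                self.sections[self.current].append((title, self._href))
            self._href = None


def index_sections(html):
    parser = SectionIndexer()
    parser.feed(html)
    parser.close()
    return parser.sections


if __name__ == "__main__":
    for section, articles in index_sections(SAMPLE_HTML).items():
        print(section, len(articles))
```

With the sample above, the two stories outside the featured div still land under "Front Page", because the grouping never looks at div classes. Whether the real page can be handled this simply depends on markup I haven't verified.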
