Today, 11:17 AM   #5
nickredding
onlinenewsreader.net
Posts: 330
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
This is weird ...

The same issue arises with JavaScript-based retrieval, although the response seen there is "forbidden." The result is also blank, so I suspect Python is simply not reporting the forbidden status.
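One way to check whether the blank result is really a 403 is to fetch the URL outside the recipe and look at the status directly. Below is a minimal sketch using only the standard library's urllib. This is not how calibre itself fetches covers, and the helper name `fetch_status` and the User-Agent value are assumptions for illustration.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return (http_status, body_length) without raising on HTTP errors."""
    # Hypothetical User-Agent; some servers reject the default Python one.
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, len(resp.read())
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the code is still available.
        return err.code, len(err.read())
```

Calling `fetch_status(self.cover_url)` from a scratch script and printing the tuple should show whether the server is answering with an error code and an empty body.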

The weird thing is, if the URL is constructed rather than extracted from the JSON structure, the image is retrieved successfully. I modified the Economist recipe as follows.

Code:
            self.cover_url = (
                safe_dict(data, 'props', 'pageProps', 'content', 'cover', 'url')
                .replace(
                    'economist.com/',
                    'economist.com/cdn-cgi/image/width=960,quality=80,format=auto/',
                )
                .replace('SQ_', '')
            )
            self.log('Got embedded cover:', self.cover_url)
            #from datetime import datetime
            #issueDate = datetime.fromisoformat(safe_dict(data, 'props', 'pageProps', 'content', 'issueDate').replace('Z', '+00:00')).strftime("%Y%m%d")
            #self.cover_url = 'https://www.economist.com/cdn-cgi/image/width=960,quality=80,format=auto/content-assets/images/' + issueDate + '_DE_US.jpg'
            #self.log('Got constructed cover:', self.cover_url)
As expected, the cover image does not load. However, if I uncomment the code to get the constructed URL, it works, even though the URLs appear to be the same. Here is the log output:

Code:
Got embedded cover: https://www.economist.com/cdn-cgi/image/width=960,quality=80,format=auto/content-assets/images/29250920_DE_US.jpg
Got constructed cover: https://www.economist.com/cdn-cgi/image/width=960,quality=80,format=auto/content-assets/images/20250920_DE_US.jpg
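For anyone who wants to try the constructed URL outside the recipe, the commented-out lines can be pulled into a standalone helper. This is only a sketch: the function name `constructed_cover_url` is made up, and it assumes the issue date arrives as an ISO 8601 string ending in 'Z', which is what the `.replace('Z', '+00:00')` in the recipe code suggests.

```python
from datetime import datetime

# Prefix copied from the recipe code above.
COVER_PREFIX = (
    'https://www.economist.com/cdn-cgi/image/'
    'width=960,quality=80,format=auto/content-assets/images/'
)


def constructed_cover_url(issue_date_iso):
    """Build the cover URL from an ISO 8601 issue date string."""
    # Older Python versions do not parse a trailing 'Z', so swap it
    # for an explicit UTC offset before calling fromisoformat().
    parsed = datetime.fromisoformat(issue_date_iso.replace('Z', '+00:00'))
    return COVER_PREFIX + parsed.strftime('%Y%m%d') + '_DE_US.jpg'


if __name__ == '__main__':
    # Hypothetical input value, used only to exercise the helper.
    print(constructed_cover_url('2025-09-20T00:00:00Z'))
```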
I get the same results using JavaScript. The string lengths are the same (so no extra hidden characters are corrupting the embedded URL).
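Equal lengths rule out extra characters but not a substituted one, so a character-by-character comparison is a stricter check. Here is a minimal sketch using the two URLs exactly as copied from the log above (the helper name `char_differences` is made up for illustration):

```python
def char_differences(a, b):
    """Return (index, char_in_a, char_in_b) for every position that differs.

    zip() stops at the shorter string, so compare the lengths separately.
    """
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Copied verbatim from the two log lines.
embedded = (
    'https://www.economist.com/cdn-cgi/image/'
    'width=960,quality=80,format=auto/content-assets/images/29250920_DE_US.jpg'
)
constructed = (
    'https://www.economist.com/cdn-cgi/image/'
    'width=960,quality=80,format=auto/content-assets/images/20250920_DE_US.jpg'
)

print('same length:', len(embedded) == len(constructed))
for index, left, right in char_differences(embedded, constructed):
    print(index, repr(left), repr(right))
```

Any tuple it prints marks a position where the embedded and constructed strings differ, which would be the first thing to rule out before looking for causes on the server side.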

Very strange indeed. If anyone has a theory as to what's happening, let's hear it.