I wondered whether the server might be misreporting the file type (and whether that would lead to the error I'm seeing), so I ran Fiddler and opened the image's URL in my browser. Here are the headers and the beginning of the data:
Code:
HTTP/1.1 200 OK
Eomportal-Instance: 15
Last-Modified: Wed, 28 Mar 2012 04:52:44 GMT
Cache-Control: max-age=86400, must-revalidate
Content-Type: image/jpeg
Content-Length: 133218
Date: Wed, 28 Mar 2012 16:40:51 GMT
Connection: close
Server: BostonGlobe.com Frontend
(binary data) ... JFIF ... Photoshop 3.0 ... 8BIM ...
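The by-eye comparison can be sketched in Python. The sketch reads the declared Content-Type from the header block and checks it against the "magic" bytes at the start of the body. The header text is copied from the capture. The body bytes are an assumption: a typical JFIF start (FF D8 FF E0, then "JFIF"), because the exact bytes can't be recovered from the dump. `declared_type` and `sniff_image_type` are hypothetical helper names.

```python
from email.parser import Parser

# Header block as captured by Fiddler (status line omitted).
RAW_HEADERS = """Eomportal-Instance: 15
Last-Modified: Wed, 28 Mar 2012 04:52:44 GMT
Cache-Control: max-age=86400, must-revalidate
Content-Type: image/jpeg
Content-Length: 133218
Date: Wed, 28 Mar 2012 16:40:51 GMT
Connection: close
Server: BostonGlobe.com Frontend
"""

# ASSUMPTION: first bytes of a typical JFIF file: SOI (FF D8), APP0 marker (FF E0),
# segment length (00 10), then the "JFIF\0" identifier visible in the dump.
BODY_PREFIX = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"


def sniff_image_type(data: bytes):
    """Guess a MIME type from the leading magic bytes, or return None."""
    if data.startswith(b"\xff\xd8\xff"):        # JPEG: SOI marker followed by another marker
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):   # PNG signature
        return "image/png"
    if data[:6] in (b"GIF87a", b"GIF89a"):      # GIF signature
        return "image/gif"
    return None


def declared_type(raw_headers: str) -> str:
    """Return the Content-Type (lowercased, without parameters) from a header block."""
    msg = Parser().parsestr(raw_headers, headersonly=True)
    return msg.get_content_type()


server_says = declared_type(RAW_HEADERS)
content_says = sniff_image_type(BODY_PREFIX)
print(server_says, content_says, server_says == content_says)  # image/jpeg image/jpeg True
```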
The server is correctly identifying the image as a JPEG. So, to sum up what we know so far:
1. This is a valid JPEG file
2. The server correctly identifies its type
3. The file's extension incorrectly identifies its type
4. PIL.Image handles the file correctly as a JPEG
5. Using calibre-debug to execute a script that fetches the file and loads it using PIL.Image succeeds
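Points 1 to 3 come down to the file's extension and its content disagreeing. Below is a minimal sketch of that check. It assumes the failing code path trusts the extension, which is not established here. `mimetypes.guess_type` gives the extension's answer, and a magic-byte test gives the content's. The filename `hypothetical_image.gif` is made up because the post doesn't say what the actual extension is. The body bytes are the same assumed JFIF prefix as before.

```python
import mimetypes


def sniff_image_type(data: bytes):
    """Guess a MIME type from the leading magic bytes, or return None."""
    if data.startswith(b"\xff\xd8\xff"):        # JPEG
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):   # PNG
        return "image/png"
    if data[:6] in (b"GIF87a", b"GIF89a"):      # GIF
        return "image/gif"
    return None


def type_check(filename: str, data: bytes):
    """Return (type implied by the extension, type implied by the content)."""
    by_extension, _encoding = mimetypes.guess_type(filename)
    return by_extension, sniff_image_type(data)


# HYPOTHETICAL filename; the real extension is not given in the post.
ext_type, content_type = type_check("hypothetical_image.gif",
                                    b"\xff\xd8\xff\xe0\x00\x10JFIF\x00")
if ext_type != content_type:
    # prints: extension says image/gif, content says image/jpeg; trust the content
    print(f"extension says {ext_type}, content says {content_type}; trust the content")
```

PIL.Image picks a decoder by inspecting the file's header bytes rather than its name, which fits point 4.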
Incidentally, you don't appear to need a Boston Globe subscription to fetch this file: the get_image script that I posted earlier doesn't log in at all.