Quote:
Originally Posted by Starson17
I looked at the character encoding specified in the HTTP headers for an article I followed. In fact, here it is again for the first article listed in the feed:
Code:
Content-Type: text/html; charset=ISO-8859-1
...
In fact, part of the reason I looked was to see how you might have come up with the answer. I don't often have character encoding problems (mostly I work in English), so I was wondering if you'd found your answer in source or HTTP headers.
I looked at the source of the initial feed used in the recipe,
http://veleno.inter.it/aas/rss/index_full_it.xml. A fuller extract follows:
Code:
<?xml version="1.0" encoding="ISO-8859-15"?>
<rss version="2.0"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
xmlns:atom="http://www.w3.org/2005/Atom"
>
<channel>
<title>INTER.IT - IT full</title>
<link>http://www.inter.it/</link>
<language>it</language>
<description>Le notizie ufficiali di inter.it</description>
<copyright>Copyright 2010 Football Club Internazionale Milano Spa</copyright>
<atom:link href="http://www.inter.it/aas/rss/index_full_it.xml" rel="self" type="application/rss+xml" />
<item>
<dc:date>2011-03-26T00:10:23+01:00</dc:date><title>Inter Channel: "7 su 7" e non solo...</title>
<description><![CDATA[<img src="http://www.inter.it/aas/img/143867.jpg"><br><br><p><strong>APPIANO GENTILE</strong> - Non perdere gli appuntamenti odierni con il canale tematico nerazzurro: si comincia con la <em>Rassegna Stampa</em>, alle ore 13.30, a cura di Nagaja Beccalossi, mentre alle 19.30 l'appuntamento è con <em>Internews</em>, in studio Alessandro Villa.</p>
<p>Inoltre, alle 17 e in replica alle 23, torna "7 su 7", la rubrica a cura della redazione che ci riassume i fatti principali dal 19 marzo ad oggi.</p>
<p> </p><br><br>]]></description>
<link>http://www.inter.it/aas/news/reader?N=52072&L=it</link>
<guid>http://www.inter.it/aas/news/reader?N=52072&L=it</guid>
</item>
<item>
<dc:date>2011-03-25T23:03:34+01:00</dc:date><title>Thiago Motta: "Grazie Italia, così sono felice"</title>
...
However, the Page Info that Firefox shows for this page (via the right-click context menu) reports UTF-8. When I follow the links to the articles, Page Info reports ISO-8859-1, although the article source itself contains no encoding declaration.
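For anyone who wants to compare the two sources of information without a browser, here is a minimal Python sketch. The helper names `declared_xml_encoding` and `header_charset` are made up for illustration, and the sample bytes and header value are simply copied from this thread rather than fetched from the server.

```python
import re
from email.message import Message


def declared_xml_encoding(head: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    match = re.match(rb'\s*<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']', head)
    return match.group(1).decode("ascii") if match else None


def header_charset(content_type: str):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value it returns.
    return msg.get_content_charset()


# First bytes of the feed, as quoted in this post.
feed_head = b'<?xml version="1.0" encoding="ISO-8859-15"?>\n<rss version="2.0">'
# Header value quoted earlier in the thread for an article page.
article_header = "text/html; charset=ISO-8859-1"

print(declared_xml_encoding(feed_head))
print(header_charset(article_header))
```

The regular expression only looks at the very start of the document, which is where an XML declaration has to be, so a page with no declaration simply yields `None`.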
I'm more inclined to trust the explicit ISO-8859-15 declaration in the initial feed, and to assume that the linked pages were produced using the same encoding. The encoding reported in the HTTP headers depends on the server configuration, and may or may not be reliable.
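To see how much the choice between the two encodings matters in practice, a quick sketch like the following lists the bytes that Python's standard codecs decode differently under ISO-8859-1 and ISO-8859-15. The sample string is only an illustration built from accented letters that appear in the feed extract.

```python
# Bytes that decode to different characters under the two encodings.
differing = [
    b for b in range(256)
    if bytes([b]).decode("iso-8859-1") != bytes([b]).decode("iso-8859-15")
]

for b in differing:
    raw = bytes([b])
    print(f"0x{b:02X}: {raw.decode('iso-8859-1')!r} vs {raw.decode('iso-8859-15')!r}")

# Accented Italian letters like those in the feed round-trip identically.
sample = "così è".encode("iso-8859-15")
same = sample.decode("iso-8859-1") == sample.decode("iso-8859-15")
print(same)
```

Only eight byte values differ (the euro sign at 0xA4 is the best known), so for ordinary accented Italian text either encoding gives the same result. The distinction only shows up if one of those eight symbols appears in an article.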